Evaluating ML-Agents Soccer Twos Model #2

Open huypham37 opened 6 days ago

huypham37 commented 6 days ago

Evaluating ML-Agents Soccer Twos Model

Purpose of Evaluation

Evaluation serves to:

  1. Assess the performance of your trained model in realistic gameplay scenarios.
  2. Compare different training runs or algorithms.
  3. Determine if the model is ready for deployment or needs further training.

Setting Up Evaluation Matches

Disable Training Mode

When evaluating, you want to run the model in inference mode, where no learning occurs. In current ML-Agents releases this is done with the --inference flag, combined with --resume so that the model from the existing run ID is loaded:

mlagents-learn config.yaml --run-id=ppo_soccer_twos --env=build/SoccerTwos --resume --inference

Use Multiple Environment Instances

To get more robust results, it's often beneficial to run multiple instances of the environment simultaneously:

mlagents-learn config.yaml --run-id=ppo_soccer_twos --env=build/SoccerTwos --num-envs=10 --resume --inference

This launches 10 environment instances in parallel, providing more data points for evaluation in the same amount of wall-clock time.

Disable Graphics (Optional)

For faster evaluation, especially when running multiple instances, you can disable graphics rendering:

mlagents-learn config.yaml --run-id=ppo_soccer_twos --env=build/SoccerTwos --num-envs=10 --no-graphics --resume --inference

Logging Evaluation Results

Built-in Metrics

ML-Agents automatically logs some metrics during evaluation, such as cumulative reward. These can be viewed in TensorBoard.
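To view them, point TensorBoard at the directory where mlagents-learn writes its summaries (results/<run-id> by default in recent releases):

tensorboard --logdir results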

Custom Metrics

For more specific evaluation metrics (e.g., win rate, goals scored), you'll need to implement custom logging:

a. Modify your Unity environment to track these metrics.
b. Use a Unity SideChannel to send this data to Python (see the sketch below).
c. In your Python script, receive this data and log it using TensorBoard or a custom logging solution.
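For step (b), the Python side of a custom channel is a small SideChannel subclass. The sketch below is a minimal receiver, assuming the Unity side registers a channel with the same UUID and sends one string per stat; the channel ID and the message format are illustrative, not part of the default Soccer Twos scene:

import uuid
from mlagents_envs.side_channel.side_channel import SideChannel, IncomingMessage

class MatchStatsChannel(SideChannel):
    """Receives custom match statistics (goals, possession, passes) sent from Unity."""

    def __init__(self):
        # Must match the UUID used when constructing the channel on the Unity side
        super().__init__(uuid.UUID("621f0a70-4f87-11ea-a6bf-784f4387d1f7"))
        self.messages = []

    def on_message_received(self, msg: IncomingMessage) -> None:
        # Assumes Unity sends one string per stat, e.g. "goals_scored:2"
        self.messages.append(msg.read_string())

An instance of this channel would be passed in the side_channels list when constructing UnityEnvironment. The built-in StatsSideChannel used in the larger example below covers the common case where the scene records stats through Academy.Instance.StatsRecorder.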

Example of an evaluation loop that logs custom metrics. This is a sketch using the low-level mlagents_envs API from recent ML-Agents releases; the stat keys ("Goals Scored", "Goals Conceded") and the random-action placeholder policy are assumptions to be replaced with your own environment's stats and your trained model:

import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
from mlagents_envs.side_channel.stats_side_channel import StatsSideChannel

def evaluate_soccer_twos(env_path, num_episodes=100):
    # Set up side channels: engine settings plus the stats channel that receives
    # anything the Unity scene records via Academy.Instance.StatsRecorder
    engine_configuration_channel = EngineConfigurationChannel()
    stats_channel = StatsSideChannel()

    # Create the environment
    env = UnityEnvironment(
        file_name=env_path,
        side_channels=[engine_configuration_channel, stats_channel],
        no_graphics=True,
        worker_id=0
    )

    # Run the simulation faster than real time during evaluation
    engine_configuration_channel.set_configuration_parameters(time_scale=20.0)

    env.reset()
    behavior_name = list(env.behavior_specs)[0]  # e.g. "SoccerTwos?team=0"
    spec = env.behavior_specs[behavior_name]

    # Initialize metrics
    wins = 0
    total_goals_scored = 0.0
    total_goals_conceded = 0.0

    for _ in range(num_episodes):
        env.reset()
        episode_reward = 0.0
        done = False
        while not done:
            decision_steps, terminal_steps = env.get_steps(behavior_name)
            if len(terminal_steps) > 0:
                # The episode ended (a goal was scored or time ran out)
                episode_reward += float(np.sum(terminal_steps.reward))
                done = True
            else:
                # Placeholder policy: random actions. Replace with the actions
                # produced by your trained model.
                action = spec.action_spec.random_action(len(decision_steps))
                env.set_actions(behavior_name, action)
                env.step()

        # In Soccer Twos the winning team receives a positive terminal reward
        if episode_reward > 0:
            wins += 1

        # Custom stats only arrive here if the Unity scene records them via the
        # StatsRecorder; the keys below are placeholders. Possession time or
        # pass counts would be collected the same way.
        custom_stats = stats_channel.get_and_reset_stats()
        for value, _aggregation in custom_stats.get("Goals Scored", []):
            total_goals_scored += value
        for value, _aggregation in custom_stats.get("Goals Conceded", []):
            total_goals_conceded += value

    env.close()

    # Log the results
    print(f"Win Rate: {wins / num_episodes * 100:.1f}%")
    print(f"Average Goals Scored: {total_goals_scored / num_episodes:.2f}")
    print(f"Average Goals Conceded: {total_goals_conceded / num_episodes:.2f}")

if __name__ == "__main__":
    env_path = "path/to/your/SoccerTwos_build"  # Update this with your actual build path
    evaluate_soccer_twos(env_path, num_episodes=100)

Analyzing Evaluation Results

Quantitative Analysis: compare the logged numbers (win rate, average goals, cumulative reward) across evaluation runs, making sure enough episodes were played for the differences to be meaningful.

Qualitative Analysis: watch a few matches with graphics enabled to spot behaviours the numbers hide, such as agents crowding the ball or neglecting defense.

Comparative Analysis: evaluate against fixed reference opponents (for example earlier checkpoints or a random policy) so that improvements can be attributed to specific training changes.
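For the quantitative side, a rough confidence interval on the observed win rate helps judge whether a difference between two runs is meaningful or just noise. A minimal sketch (the episode counts are illustrative):

import numpy as np

def win_rate_interval(wins, episodes, z=1.96):
    """Normal-approximation 95% confidence interval for an observed win rate."""
    p = wins / episodes
    margin = z * np.sqrt(p * (1 - p) / episodes)
    return max(0.0, p - margin), min(1.0, p + margin)

# e.g. 62 wins in 100 evaluation episodes
low, high = win_rate_interval(62, 100)
print(f"Win rate 95% CI: [{low:.2f}, {high:.2f}]")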

Iterative Improvement

Based on evaluation results:

  1. Identify weaknesses in the model's performance.
  2. Adjust training parameters or reward structure (see the example configuration sketch after this list).
  3. Retrain the model with improvements.
  4. Re-evaluate to measure the impact of changes.
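For step 2, parameter adjustments go into the trainer configuration YAML that the mlagents-learn commands above reference (config.yaml). The sketch below shows roughly what a PPO configuration with self-play for Soccer Twos can look like; the values are illustrative starting points, not tuned recommendations:

behaviors:
  SoccerTwos:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
    network_settings:
      hidden_units: 512
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 50000000
    time_horizon: 1000
    self_play:
      save_steps: 50000
      team_change: 200000
      swap_steps: 2000
      window: 10
      play_against_latest_model_ratio: 0.5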

Remember, evaluation is an iterative process. You may need to go through several cycles of training, evaluation, and adjustment to achieve the desired performance in the Soccer Twos environment.