huawei-noah / SMARTS

Scalable Multi-Agent RL Training School for Autonomous Driving
MIT License

[Help Request] Add success rate metric during training. #2094

Closed. l1xiao closed this issue 10 months ago.

l1xiao commented 10 months ago

High Level Description

Hello, during training of Driving SMARTS 2023.1 (example/e10_drive), I want to track how the model's task completion rate (number of trajectories that reach the goal / total number of trajectories) changes, rather than only monitoring the reward. I have read the related code and found two ways to tell whether a trajectory is successful (reaches the goal): in Reward.py, or by registering a SuccessRateCallback in run.py and checking for success at _on_rollout_end(). But I ran into problems with both attempts:

  1. The problem with implementing this in Reward.py is that although I can see the is_done and reach_goal events at each _on_step(), I cannot pass that information out, and I do not clearly know the number of trajectories (when each one starts and ends).
  2. The problem with using a custom SuccessRateCallback (similar to EvaluationCallback) in run.py is that an episode contains multiple trajectories, and this callback is only invoked at _on_rollout_end(), so the statistics come out wrong.

Running the benchmark does give the overall score, which approximates the completion rate, but that seems less efficient. Is there a way to track goal achievement during training? Where would be the best place to implement this? Or do you monitor the task completion rate during evaluation in your usual training workflow? If so, is there example code in the repository?

Version

v1.4.0

Operating System

Ubuntu 18.04

Problems

No response

Adaickalavan commented 10 months ago

Hi @l1xiao,

  1. The official driving metrics are implemented as an easy-to-use environment wrapper. See smarts.env.gymnasium.wrappers.metric.metrics.Metrics.

  2. As you rightly noted, the metrics of a trained model can be computed by running the benchmark. See how to run the benchmark.

  3. Note that the metrics should only be computed at the end of an episode. They are incomplete, and therefore unusable, when queried in the middle of an episode.

  4. An env under test can be wrapped with the Metrics wrapper. See how the env is wrapped inside the benchmark code. At the end of the episodes, the env can be queried for its raw performance records. See how to query. From the raw performance records, the final weighted scores and per-agent scores can be obtained. Refer here and here. (A sketch of this workflow is included after this list.)

  5. Considering the above points, we cannot compute the metrics during training itself, because training proceeds by number of steps rather than number of episodes.

  6. The metrics can instead be computed during the intermittent evaluations, by using a modified EvalCallback. Here, the env passed as the eval_env parameter should be wrapped with the Metrics wrapper. Inside the modified EvalCallback, the env should be queried for its raw performance records, i.e., env.records(), to compute the metrics at the end of the evaluation episodes. (See the second sketch after this list.)

  7. In the interest of keeping the example RL code simple, easy for newcomers to understand, and easy to maintain, an evaluation callback with embedded metrics is not currently provided in the examples section. Interested users can add the metric computation inside an evaluation callback by following the steps above.

  8. We hope the above is helpful.
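
To make point 4 concrete, here is a minimal sketch, not taken from the repository, of wrapping an evaluation env with the Metrics wrapper and turning env.records() into a success rate. The env id passed to gym.make, the make_eval_env and success_rate helpers, and the counts.goals / counts.episodes field names are illustrative assumptions; the benchmark code linked above shows the authoritative score computation.

```python
import gymnasium as gym

# Class path as given in point 1; depending on the SMARTS version the wrapper
# may take additional arguments (e.g., a scoring formula), so check its signature.
from smarts.env.gymnasium.wrappers.metric.metrics import Metrics


def make_eval_env(scenario: str) -> gym.Env:
    # Hypothetical construction: build the eval env the same way the training
    # script does (the env id and kwargs here are assumptions), then wrap it
    # with the Metrics wrapper so that raw performance records are collected.
    env = gym.make("smarts.env:driving-smarts-v2023", scenario=scenario)
    return Metrics(env)


def success_rate(records) -> float:
    # Illustrative helper: assumes records is a nested {scenario: {agent: record}}
    # mapping whose counts carry `goals` (episodes that reached the goal) and
    # `episodes` (total episodes). Verify these field names against the
    # Record/Counts definitions of the installed SMARTS version.
    goals, episodes = 0, 0
    for agent_records in records.values():
        for record in agent_records.values():
            goals += record.counts.goals
            episodes += record.counts.episodes
    return goals / max(episodes, 1)
```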
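
And for point 6, a sketch of a modified EvalCallback, here a hypothetical SuccessRateEvalCallback built on stable-baselines3, that queries the Metrics-wrapped eval env for its raw performance records after each intermittent evaluation. It reuses make_eval_env and success_rate from the sketch above.

```python
from stable_baselines3.common.callbacks import EvalCallback


class SuccessRateEvalCallback(EvalCallback):
    """EvalCallback variant that also logs a success rate from env.records()."""

    def _on_step(self) -> bool:
        # EvalCallback runs its evaluation episodes inside _on_step() whenever
        # n_calls is a multiple of eval_freq; piggyback on that schedule.
        ran_eval = self.eval_freq > 0 and self.n_calls % self.eval_freq == 0
        continue_training = super()._on_step()
        if ran_eval:
            # EvalCallback wraps eval_env in a VecEnv, so reach the underlying
            # Metrics wrapper via env_method(). Per point 3, records() is only
            # meaningful at episode boundaries, which holds here because the
            # evaluation episodes have just finished. Depending on the SMARTS
            # version, records may accumulate across evaluations, giving a
            # running rather than per-evaluation rate.
            records = self.eval_env.env_method("records")[0]
            self.logger.record("eval/success_rate", success_rate(records))
        return continue_training


# Hypothetical usage with the helpers from the previous sketch:
# eval_env = make_eval_env("path/to/eval_scenario")
# callback = SuccessRateEvalCallback(eval_env, eval_freq=10_000, n_eval_episodes=5)
# model.learn(total_timesteps=1_000_000, callback=callback)
```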

l1xiao commented 10 months ago

Thanks for your reply! I'll try to implement an EvalCallback.