Hi @l1xiao,
The official driving metrics are implemented as an easy-to-use environment wrapper. See `smarts.env.gymnasium.wrappers.metric.metrics.Metrics`.
As you rightly noted, the metrics of a trained model can be computed by running the benchmark. See how to run the benchmark.
Note that the metrics should only be computed at the end of an episode. They are incomplete, and thus unusable, when queried in the middle of an episode.
An env under test can be wrapped with the `Metrics` wrapper. See how the env is wrapped inside the benchmark code. At the end of the episode, the env can be queried for its raw performance records. See how to query. Using the raw performance records, the final weighted scores and per-agent scores can be obtained. Refer here and here.
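For concreteness, here is a minimal sketch of that flow. The env id and rollout loop are illustrative assumptions, and the exact `Metrics` constructor arguments should be taken from the benchmark code linked above; only the wrapping and the end-of-episode `env.records()` query come from the points above.

```python
# A minimal sketch, assuming the Metrics wrapper accepts the env directly;
# see the benchmark code linked above for the exact usage.
import gymnasium as gym
from smarts.env.gymnasium.wrappers.metric.metrics import Metrics

env = gym.make("smarts.env:driving-smarts-v2023")  # hypothetical env id
env = Metrics(env)

obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for the trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    # SMARTS envs are multi-agent, so terminated/truncated may be dicts
    # keyed by agent id; adapt this check accordingly.
    done = terminated or truncated

# Query the raw performance records only after the episode has ended;
# mid-episode records are incomplete and unusable.
records = env.records()
```

The weighted score computation from these raw records then follows the benchmark code referenced above.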
Considering the above points, we are unable to compute the metrics during training itself, because training proceeds by number of steps rather than by number of episodes.
We may compute the metrics during the intermittent evaluation callbacks by using a modified `EvalCallback`. Here, we should wrap the `eval_env` parameter with the `Metrics` wrapper. Inside the modified `EvalCallback`, we should query the env for its raw performance records, i.e., `env.records()`, in order to compute the metrics at the end of the evaluation episodes.
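A minimal sketch of such a modified callback, assuming stable-baselines3 (as used by the example training scripts); `MetricsEvalCallback` is a hypothetical name, and the `.envs[0]` access assumes `EvalCallback`'s default `DummyVecEnv` wrapping of a plain env:

```python
# A minimal sketch, assuming stable-baselines3. Pass in an eval env that
# has already been wrapped with the Metrics wrapper.
from stable_baselines3.common.callbacks import EvalCallback


class MetricsEvalCallback(EvalCallback):
    """EvalCallback that also queries SMARTS driving metrics after each evaluation."""

    def _on_step(self) -> bool:
        ran_eval = self.eval_freq > 0 and self.n_calls % self.eval_freq == 0
        continue_training = super()._on_step()  # runs the usual evaluation episodes
        if ran_eval:
            # EvalCallback wraps a plain env in a DummyVecEnv, so the
            # Metrics wrapper sits one level down; adjust the access if
            # your eval env is already vectorized.
            records = self.eval_env.envs[0].records()
            # Compute/log the weighted scores from the raw records here,
            # following the benchmark's score computation linked above.
            print(records)
        return continue_training
```

It would then be registered like the standard callback, e.g. `model.learn(total_timesteps=..., callback=MetricsEvalCallback(Metrics(eval_env), eval_freq=10_000))`.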
In the interest of keeping the example RL code simple, easy for newcomers to understand, and easy to maintain, an evaluation callback with embedded metrics is not currently provided in the examples section. Interested users may add the metric computation inside an evaluation callback by following the steps above.
We hope the above is helpful.
Thanks for your reply! I'll try to implement an `EvalCallback`.
High Level Description
Hello, during the training process of Driving SMARTS 2023.1 (example/e10_drive), I want to know whether the model is improving in terms of task completion rate (number of trajectories that reach the goal / total number of trajectories), rather than just monitoring the reward. I have read the related code and found two possible approaches: detect whether a trajectory is successful (reaches the goal) in Reward.py, or register a `SuccessRateCallback` in run.py and listen for success at `_on_rollout_end()`.

But I found problems with both attempts:

1. With `is_done` and `reach_goal` at each `_on_step()`, I can't pass the information out, and I don't clearly know the number of trajectories (when each one starts and ends).
2. With a `SuccessRateCallback` (similar to `EvaluationCallback`) in run.py, a single rollout contains multiple trajectories, and the callback is only called at `_on_rollout_end()`, so the statistics are incorrect. (A sketch of this callback idea appears below.)

I found that it is indeed possible to approximate the completion rate from the overall score obtained by running the benchmark, but this seems inefficient. Is there a way to judge the model's main-goal achievement during training? Where would be the best place to implement this? Or do you monitor the task completion rate during evaluation in your usual training workflow? If so, is there example code in the repository?
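A minimal sketch of the `SuccessRateCallback` idea described above, assuming stable-baselines3; the `reached_goal` info key is hypothetical and would need to be exposed through the env's `info` dict (e.g., from whatever Reward.py computes):

```python
# A minimal sketch, assuming stable-baselines3. The "reached_goal" info
# key is hypothetical -- expose the env's success flag however you like.
from stable_baselines3.common.callbacks import BaseCallback


class SuccessRateCallback(BaseCallback):
    """Counts finished trajectories and successes at every _on_step()."""

    def __init__(self, verbose: int = 0):
        super().__init__(verbose)
        self.episodes = 0
        self.successes = 0

    def _on_step(self) -> bool:
        # self.locals holds the variables of the current collection step;
        # "dones" and "infos" mark trajectory boundaries per vectorized env,
        # which resolves the "when does a trajectory end" problem.
        for done, info in zip(self.locals["dones"], self.locals["infos"]):
            if done:
                self.episodes += 1
                self.successes += int(info.get("reached_goal", False))  # hypothetical key
        if self.episodes:
            self.logger.record("rollout/success_rate", self.successes / self.episodes)
        return True
```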
Version
v1.4.0
Operating System
Ubuntu 18.04
Problems
No response