real-stanford / diffusion_policy

[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
https://diffusion-policy.cs.columbia.edu/
MIT License

Policy evaluation during training #22

Closed by AnqiaoLi 8 months ago

AnqiaoLi commented 9 months ago

Thank you for the beautiful code!

During training-time evaluation, I see you log train_action_mse_error, which samples trajectories from the training set and computes the error between the predicted actions and the targets. Is there a specific reason why there isn't a corresponding validation_action_mse_error that computes this error on the validation set?
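Concretely, I had something like the following in mind, mirroring how the train metric appears to be computed (a minimal sketch; I'm assuming predict_action returns a dict with an 'action_pred' tensor, and that validation batches use the same 'obs'/'action' layout as training batches):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def validation_action_mse(policy, val_dataloader):
    # Mirrors train_action_mse_error, but over the validation set:
    # predict an action trajectory for each observation window and
    # compare it against the ground-truth actions from the dataset.
    policy.eval()
    errors = []
    for batch in val_dataloader:                      # batches assumed on device
        gt_action = batch['action']                   # (B, T, Da)
        result = policy.predict_action(batch['obs'])  # assumed interface
        pred_action = result['action_pred']           # (B, T, Da)
        errors.append(F.mse_loss(pred_action, gt_action).item())
    return sum(errors) / len(errors)
```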

cheng-chi commented 9 months ago

Hi @AnqiaoLi, since I was running simulation evaluation periodically alongside training, I didn't need to log additional metrics for most of my experiments. The train_action_mse_error was logged as a sanity check, since it should go down no matter what. Validation error for behavior cloning has been shown to not be very useful in robomimic. However, feel free to implement it yourself! It might be helpful for training real-world policies! You might want to sample multiple actions and log the one with the lowest error, since the actions for each observation could come from multiple modes.
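For example, a minimal min-over-samples variant might look like this (a sketch, not the repo's code; it assumes the same predict_action interface as above, and relies on diffusion sampling being stochastic so repeated calls yield different action samples):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def min_of_n_action_mse(policy, batch, n_samples=8):
    # Because the policy can be multimodal, a prediction far from the
    # dataset action may still be a valid mode. Sampling several action
    # trajectories per observation and keeping the lowest per-sample MSE
    # avoids penalizing valid-but-different modes.
    gt_action = batch['action']  # (B, T, Da)
    best = torch.full((gt_action.shape[0],), float('inf'),
                      device=gt_action.device)
    for _ in range(n_samples):
        pred = policy.predict_action(batch['obs'])['action_pred']
        mse = F.mse_loss(pred, gt_action, reduction='none').mean(dim=(1, 2))
        best = torch.minimum(best, mse)
    return best.mean().item()
```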