octo-models / octo

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
https://octo-models.github.io/
MIT License

Rising validation loss #37

Open lyshuga opened 7 months ago

lyshuga commented 7 months ago

Hi,

Using scripts/finetuning.py, I have been trying to train models on different manually collected datasets for pick/pick_and_place tasks, but I keep running into rising validation MSE loss. To simplify the experiments (mentioned in #29), I collected a rather simple dataset of picking one object from the same location with no object rotation, so most trajectories look more or less the same. Even so, the validation MSE loss keeps rising for no apparent reason. The data is split into 50 training and 7 validation trajectories.

[image: training curves showing the validation MSE loss rising over training]

So, I was wondering if you have any idea what could be the problem here?

Maybe computing the diffusion policy loss (the one used as the train loss) on the validation set would give a better picture?
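
A minimal sketch of that idea, assuming a generic JAX validation loop; `predict_actions` and `diffusion_loss` are hypothetical stand-ins, not Octo's actual API:

```python
import jax.numpy as jnp

def validation_metrics(model, val_iter, num_batches=8):
    """Log both the action MSE and the diffusion training objective
    on the validation split so the two curves can be compared."""
    mse_sum, diff_sum = 0.0, 0.0
    for _ in range(num_batches):
        batch = next(val_iter)
        # Deterministic action prediction (e.g. running the full reverse
        # diffusion process), compared against ground-truth actions.
        pred = model.predict_actions(batch["observation"], batch["task"])
        mse_sum += jnp.mean((pred - batch["action"]) ** 2)
        # The same denoising loss that is optimized at training time,
        # here evaluated on held-out trajectories.
        diff_sum += model.diffusion_loss(batch)
    return {"val/action_mse": mse_sum / num_batches,
            "val/diffusion_loss": diff_sum / num_batches}
```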

kpertsch commented 7 months ago

Thanks for your post! Have you actually tried running the policy on your robot? Validation MSE is unfortunately usually not a great indicator of rollout performance, and it's quite common for policies to show increasing validation MSE while still improving in real-world rollout performance. It may also be worth trying a few checkpoints along the way to see whether their behavior gets better or worse over training.
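
A minimal sketch of this checkpoint-comparison idea; `load_checkpoint` and `run_rollout` are hypothetical stand-ins for whatever evaluation harness (real robot or simulator) is available:

```python
def evaluate_checkpoints(checkpoint_steps, load_checkpoint, run_rollout,
                         num_rollouts=10):
    """Return the rollout success rate for each checkpoint step,
    rather than picking the checkpoint with the lowest validation MSE."""
    success_rates = {}
    for step in checkpoint_steps:
        policy = load_checkpoint(step)
        # run_rollout is assumed to return True on task success.
        successes = sum(run_rollout(policy) for _ in range(num_rollouts))
        success_rates[step] = successes / num_rollouts
    return success_rates
```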

zwbx commented 4 months ago

Hello, is there any material or potential explanation regarding why Validation is not a good indicator?

kpertsch commented 4 months ago

It's an empirical observation we've made over and over (not just for Octo). It seems that policies showing classic signs of overfitting (i.e., increasing validation loss) can nonetheless work better when rolled out on the robot. One explanation is that the validation loss metric does not capture accumulating errors, i.e., the fact that the robot policy can quickly go out of the distribution of the training data when rolled out on the real robot.
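
A toy numpy illustration of the accumulating-error point (not from the Octo codebase): of two policies with the per-step errors below, the one with the worse single-step MSE drifts far less from the expert trajectory in closed loop.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200  # rollout length; the expert stays at state 0

# Policy A: small but *biased* per-step action error.
bias_err = np.full(T, 0.05)
# Policy B: larger but zero-mean per-step action error.
noise_err = rng.normal(0.0, 0.15, T)

# Single-step (validation-style) MSE: B looks ~9x worse than A.
print("single-step MSE A:", np.mean(bias_err ** 2))   # 0.0025
print("single-step MSE B:", np.mean(noise_err ** 2))  # ~0.0225

# Closed loop, errors accumulate in the state: A drifts linearly
# (0.05 * 200 = 10), while B's drift only grows like sqrt(T) (~2).
print("final drift A:", abs(np.cumsum(bias_err)[-1]))
print("final drift B:", abs(np.cumsum(noise_err)[-1]))
```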

zwbx commented 4 months ago

That is a good explanation. Thank you.

> It's an empirical observation we've made over and over (not just for Octo)

By this, do you mean that the observation also holds for other imitation learning methods? I am wondering whether anyone has studied this.

kpertsch commented 4 months ago

We are working on an upcoming project in which we will empirically show that MSE is a bad metric for choosing checkpoints, but for now I am not sure whether any other publication explicitly states this. We have observed it with the RT models at Google, with Octo, and in some ongoing work.

bdelhaisse commented 2 months ago

@zwbx @kpertsch There is this paper: "What Matters in Learning from Offline Human Demonstrations for Robot Manipulation" (2021); see (C4) in Section 2 and Figure 5(a).