octo-models / octo

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
https://octo-models.github.io/
MIT License

Initializing action heads in finetune_config file #95

Closed karthikm-0 closed 4 months ago

karthikm-0 commented 4 months ago

Thanks for the great model! I am trying to fine-tune Octo on a small Franka Panda dataset. I've had some success modifying the Aloha example with my own data. I noticed, though, that its hyperparameters differ from those in the finetune_config provided for the advanced finetuning approach (which seems to use the params from the paper). Since my action space is different, I'd like to reinitialize the action head, but I don't see such an option in the config file. Could someone suggest a way to include this in the config? Thanks!
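To make the request concrete, the kind of override I'd like to express might look like this (a purely hypothetical sketch with plain dicts; the field names are illustrative and not the actual Octo config schema):

```python
# Hypothetical finetune-config fragment: change the action head's output
# dimensionality and ask for the head to be re-initialized. The
# "reinit_action_head" flag does not exist -- it is the option this issue
# is asking about.
config = dict(
    update_config=dict(
        model=dict(
            heads=dict(
                action=dict(kwargs=dict(action_dim=8)),  # my Franka action space
            )
        )
    ),
    reinit_action_head=True,  # <- the missing option
)
print(config["update_config"]["model"]["heads"]["action"]["kwargs"]["action_dim"])
```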

kpertsch commented 4 months ago

If you change the action dim in the action head, the corresponding new params (e.g., the output layer of the action head) will automatically be initialized from scratch. You can check the function that copies params from the pre-trained checkpoint into the finetuning init here: https://github.com/octo-models/octo/blob/main/octo/utils/train_utils.py#L382
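In simplified form, shape-aware copying works like this (a toy sketch, not the linked function, which operates on flax param trees and handles more cases; the tree keys below are made up):

```python
import numpy as np

def merge_shape_matching(pretrained, target):
    """Copy pre-trained leaves into the freshly initialized target tree
    wherever the key exists and the shapes agree; otherwise keep the
    fresh init (simplified sketch of the merge behavior described above)."""
    merged = {}
    for key, value in target.items():
        if isinstance(value, dict):
            merged[key] = merge_shape_matching(pretrained.get(key, {}), value)
        elif key in pretrained and pretrained[key].shape == value.shape:
            merged[key] = pretrained[key]  # shapes match: reuse pre-trained weights
        else:
            merged[key] = value  # shape changed (e.g. new action dim): keep fresh init
    return merged

# Toy trees: the action head output grows from 7 to 8 dims, the backbone is unchanged.
pretrained = {
    "backbone": {"kernel": np.ones((4, 4))},
    "action_head": {"kernel": np.ones((16, 7))},
}
fresh = {
    "backbone": {"kernel": np.zeros((4, 4))},
    "action_head": {"kernel": np.zeros((16, 8))},
}
merged = merge_shape_matching(pretrained, fresh)
print(merged["backbone"]["kernel"][0, 0])     # 1.0 -> copied from pre-trained
print(merged["action_head"]["kernel"].shape)  # (16, 8) -> kept fresh init
```

So the re-shaped output layer falls through to the fresh init while everything else is restored from the checkpoint.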

If you want to re-init the whole action head, not just the params whose shapes change, you can either rename the action head or hack the function above to take a "skip_keys" argument that excludes keys matching a given pattern from keys_to_update, similar to how we implement frozen_keys in the optimizer: https://github.com/octo-models/octo/blob/cab7f94b4db2dd93063d9c7f3482360743e22ec7/octo/utils/train_utils.py#L237
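The skip_keys hack could be as simple as a regex filter over the flattened key list (a sketch; skip_keys is a hypothetical argument, and the key strings below are illustrative rather than real Octo param paths):

```python
import re

def filter_keys_to_update(keys_to_update, skip_keys):
    """Drop any flattened param key that matches one of the skip patterns
    (hypothetical 'skip_keys' argument, mirroring the frozen_keys regex
    matching in the optimizer)."""
    return [
        key for key in keys_to_update
        if not any(re.search(pattern, key) for pattern in skip_keys)
    ]

keys = [
    "octo_transformer/block_0/kernel",
    "heads_action/output_layer/kernel",
    "heads_action/output_layer/bias",
]
kept = filter_keys_to_update(keys, skip_keys=[r"heads_action.*"])
print(kept)  # only the transformer key survives; the action head re-initializes
```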

karthikm-0 commented 4 months ago

Thanks, Karl! On a slightly different note, I have a question about learning resets. My training trajectories currently include a pick, a place, and then a retreat (slightly different each time, since it comes from a VR controller). Is this a reasonable way to learn the end of an episode?

kpertsch commented 4 months ago

If you train with goal image conditioning, that's no problem. If you train language-conditioned, it may be hard for the model to guess which retreat trajectory it should predict, but it should still work fine as long as the retreat doesn't need to be very precise. I would just go ahead and try it with your current setup.

karthikm-0 commented 4 months ago

Excellent! I'm using goal image conditioning. Thanks!