octo-models / octo

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
https://octo-models.github.io/
MIT License
865 stars, 165 forks

Question about action space #63

Closed (zwbx closed 7 months ago)

zwbx commented 8 months ago

Octo's action space comprises end-effector velocities, representing changes in ['x', 'y', 'z', 'yaw', 'pitch', 'roll', 'grasp']. I intend to assess the model's zero-shot capability in a simulator. I understand that the domain gap is significant, but my goal is only to verify that the pipeline runs without errors. I'm using RLBench, where the action space is defined by joint angles, which differs from Octo's.

My questions:

  1. Regarding zero-shot: should I compute the corresponding joint angles from Octo's outputs using inverse kinematics, so that the model's actions match the environment's action space?

  2. Regarding few-shot: should the action space of the demonstrations be end-effector velocities rather than joint angles? I understand joint angles are directly observable, whereas end-effector velocities have to be computed from the recorded data.

Thanks for your great work!

kpertsch commented 8 months ago

  1. Yes, inverse kinematics would be the way to go (though note that we do not condition Octo on the action space definition, so zero-shot performance will likely be poor, since the model is not familiar with the action space definition in RLBench)
  2. Either way works -- you can finetune Octo with end-effector velocities, which is closer to the training data, but would require you to compute the target velocities for your training data; or you can reinitialize the action head during finetuning and then directly finetune to predict joint angle actions.
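
For readers following along, the two conversions mentioned above can be sketched roughly as follows. This is a minimal NumPy sketch, not code from the Octo or RLBench repositories: the function names `ee_delta_actions` and `dls_joint_step` are hypothetical, orientation changes are approximated by differencing Euler angles (reasonable only for small steps between frames), and the Jacobian is assumed to come from your simulator or robot model.

```python
import numpy as np


def ee_delta_actions(positions, eulers, gripper):
    """Convert recorded end-effector poses into per-step delta actions.

    positions: (T, 3) end-effector positions
    eulers:    (T, 3) end-effector Euler angles in radians (assumed convention)
    gripper:   (T,)   gripper command at each step
    Returns an array of shape (T - 1, 7):
    3 position deltas, 3 angle deltas, and the gripper value of the next step.
    """
    positions = np.asarray(positions, dtype=float)
    eulers = np.asarray(eulers, dtype=float)
    gripper = np.asarray(gripper, dtype=float)

    d_pos = np.diff(positions, axis=0)
    d_rot = np.diff(eulers, axis=0)
    # Wrap angle differences into [-pi, pi) so a jump across +/-pi stays small.
    d_rot = (d_rot + np.pi) % (2.0 * np.pi) - np.pi

    return np.concatenate([d_pos, d_rot, gripper[1:, None]], axis=1)


def dls_joint_step(jacobian, ee_delta, damping=1e-2):
    """One damped-least-squares inverse kinematics step.

    jacobian: (6, n_joints) Jacobian at the current joint configuration
              (assumed to be provided by the simulator or robot model)
    ee_delta: (6,) desired end-effector change (translation, then rotation)
    Returns the (n_joints,) joint-angle change that approximately realizes it.
    """
    J = np.asarray(jacobian, dtype=float)
    ee_delta = np.asarray(ee_delta, dtype=float)
    # dq = J^T (J J^T + lambda^2 I)^-1 dx
    JJt = J @ J.T + (damping ** 2) * np.eye(J.shape[0])
    return J.T @ np.linalg.solve(JJt, ee_delta)
```

For zero-shot evaluation, a joint-angle target could then be formed as `q + dls_joint_step(J, action[:6])`, with the seventh action entry passed to the gripper. For fine-tuning on end-effector actions, `ee_delta_actions` would be applied to each demonstration's recorded poses. The damping term trades accuracy for stability near singular configurations, so its value is something to tune rather than a recommended default.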

zwbx commented 7 months ago

Thanks!