Closed: WaseemDR closed this issue 6 months ago.

Hi,

Thanks for sharing your work.

I'm wondering about a conceptual point: you process data from different setups, namely multiple camera configurations with varying locations relative to the robot arms and multiple viewpoints. To share information across data from different setups, shouldn't those camera streams be calibrated or transformed with respect to the arms (e.g. using extrinsic information)?

For example, suppose we have the same task and the same setup, but in one case the camera is to the right of the arm and in the other it is to the left. Wouldn't the camera inputs then "hint" the policy in opposite directions?

Thanks
Yeah, currently we are not doing anything special to account for this -- if camera views change often for a single robot setup, the model needs to learn to implicitly register the current camera view with the robot base. That being said, most datasets in our training mix have only a single camera view or a small number of views, and their scenes are quite visually distinct, so it shouldn't be a problem for the model to figure out which scene + viewpoint a given training example comes from.
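To make the geometry in this exchange concrete, here is a minimal numpy sketch; the frame conventions, camera poses, and helper names are illustrative assumptions, not from the codebase or the thread. It shows that the same base-frame end-effector motion appears with opposite image-x signs under two mirrored camera placements, and that calibrated extrinsics would map both observations back into the shared base frame, i.e. the explicit registration the question asks about.

```python
# Minimal sketch (illustrative, not from the repo): mirrored camera
# placements "hint" in opposite directions, and known extrinsics undo that.
import numpy as np

def look_at(cam_pos, target, up=np.array([0.0, 0.0, 1.0])):
    """Rotation whose columns are the camera axes (OpenCV convention:
    x = image right, y = image down, z = optical axis) in the base frame."""
    z = target - cam_pos
    z = z / np.linalg.norm(z)
    x = np.cross(z, up)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1)  # R_base_from_cam

target = np.zeros(3)                                    # workspace center
R_right = look_at(np.array([0.0, -1.0, 0.5]), target)   # camera right of arm
R_left  = look_at(np.array([0.0,  1.0, 0.5]), target)   # mirrored, left of arm

d_base = np.array([0.1, 0.0, 0.0])   # end-effector moves "forward" in base frame

# The same motion, expressed in each camera's frame:
d_cam_right = R_right.T @ d_base
d_cam_left  = R_left.T  @ d_base
print(d_cam_right[0], d_cam_left[0])  # opposite signs along the image-x axis

# With calibrated extrinsics, both register back to the same base-frame motion:
assert np.allclose(R_right @ d_cam_right, d_base)
assert np.allclose(R_left  @ d_cam_left,  d_base)
```

Without that transform, the sign of the visual motion cue depends on which side the camera sits, which is exactly the ambiguity the model is left to resolve implicitly from visual context, as described in the answer above.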
Got you, thanks Karl!