Image Tokenizers process the image stacked on itself?

octo-models / octo

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.

https://octo-models.github.io/

MIT License


Image Tokenizers process the image stacked on itself? #90

Status: Open. andrearosasco opened this issue 6 months ago.

andrearosasco commented 6 months ago

As you can see in the picture posted in #42, the task_stack_keys field of the observation tokenizers appears to be the same as obs_stack_keys. This results in the model stacking the image onto itself before processing it. Why is this happening?

kpertsch commented 6 months ago

It's not the same image: the observation is the image at the current time step, while the "task" is the goal image, i.e., the image from a randomly sampled future time step.
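To illustrate the answer above, here is a minimal sketch, assuming (H, W, C) image arrays and channel-wise concatenation, of how a current observation and a goal image drawn from a later time step of the same trajectory could be stacked. The helpers sample_goal_index and stack_obs_and_goal are hypothetical names for illustration only; this is not Octo's actual tokenizer or data-pipeline code.

```python
import numpy as np


def sample_goal_index(t: int, traj_len: int, rng: np.random.Generator) -> int:
    """Hypothetical helper: pick a goal step uniformly from the future steps t+1 .. traj_len-1."""
    if t >= traj_len - 1:
        raise ValueError("no future time step left to sample a goal from")
    # Generator.integers excludes the upper bound, so the result lies in [t + 1, traj_len - 1]
    return int(rng.integers(t + 1, traj_len))


def stack_obs_and_goal(obs_image: np.ndarray, goal_image: np.ndarray) -> np.ndarray:
    """Concatenate observation and goal images along the channel axis.

    Both images are assumed to be (H, W, C) arrays with the same H and W;
    the result has shape (H, W, C_obs + C_goal).
    """
    if obs_image.shape[:2] != goal_image.shape[:2]:
        raise ValueError("observation and goal images must have the same spatial size")
    return np.concatenate([obs_image, goal_image], axis=-1)


# Toy trajectory: 10 RGB frames of 64x64, each frame filled with its own time index
traj = np.stack([np.full((64, 64, 3), i, dtype=np.uint8) for i in range(10)])
rng = np.random.default_rng(0)

t = 3                                     # current time step
g = sample_goal_index(t, len(traj), rng)  # randomly sampled future time step
stacked = stack_obs_and_goal(traj[t], traj[g])
print(stacked.shape)  # (64, 64, 6)
```

Because each toy frame is filled with its own time index, the first three channels of stacked hold the current frame and the last three hold a different, later frame, so the stacked input is not the same image duplicated onto itself.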

zwbx commented 5 months ago

Hi, I noticed that image augmentation is applied during training. Is the goal image augmented the same way as the observation? If not, the two images will not be spatially aligned.
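To make the alignment concern concrete, here is a minimal sketch, assuming random cropping as the augmentation, of sampling the crop parameters once and applying them to both the observation and the goal image so they stay spatially aligned. The helpers random_crop_params, crop, and augment_pair are hypothetical; the sketch only illustrates the question and does not describe what Octo's training pipeline actually does.

```python
import numpy as np


def random_crop_params(h: int, w: int, crop_h: int, crop_w: int, rng: np.random.Generator):
    """Hypothetical helper: sample the top-left corner of a crop_h x crop_w window."""
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    return top, left


def crop(image: np.ndarray, top: int, left: int, crop_h: int, crop_w: int) -> np.ndarray:
    """Cut a crop_h x crop_w window out of an (H, W, C) image."""
    return image[top:top + crop_h, left:left + crop_w]


def augment_pair(obs: np.ndarray, goal: np.ndarray, crop_h: int, crop_w: int,
                 rng: np.random.Generator):
    """Hypothetical helper: apply ONE shared random crop to both images."""
    h, w = obs.shape[:2]
    top, left = random_crop_params(h, w, crop_h, crop_w, rng)
    return crop(obs, top, left, crop_h, crop_w), crop(goal, top, left, crop_h, crop_w)


# Usage: mark the same pixel in both images and check it lands in the same place after cropping
obs = np.zeros((64, 64, 3), dtype=np.uint8)
goal = np.zeros((64, 64, 3), dtype=np.uint8)
obs[32, 32] = 255
goal[32, 32] = 255

rng = np.random.default_rng(1)
obs_aug, goal_aug = augment_pair(obs, goal, 48, 48, rng)
print(np.array_equal(obs_aug, goal_aug))  # True
```

Sampling independent crop parameters for each image would instead shift one image relative to the other, which is exactly the spatial mismatch the question raises.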