octo-models / octo

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
https://octo-models.github.io/
MIT License

Lang-visual cross attention #83

Closed: mees closed this 4 months ago

mees commented 4 months ago

Adds language-visual cross attention by feeding the tokenized language instruction into an observation (per-time-step) tokenizer. Language tokens and visual tokens can then attend to each other, both within the same time step and to tokens from prior time steps.
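
To illustrate the attention pattern described above, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `build_block_causal_mask` and the token layout (language tokens followed by visual tokens inside every time step) are assumptions made for illustration only.

```python
def build_block_causal_mask(num_timesteps, num_lang_tokens, num_visual_tokens):
    """Return mask[q][k] == True when query token q may attend to key token k.

    Hypothetical layout (an assumption, not taken from the PR): each time step
    holds `num_lang_tokens` language tokens followed by `num_visual_tokens`
    visual tokens, and all time steps are flattened into one sequence.
    """
    tokens_per_step = num_lang_tokens + num_visual_tokens
    # Time step index of every token in the flattened sequence.
    timestep_of = [t for t in range(num_timesteps) for _ in range(tokens_per_step)]
    total = len(timestep_of)
    # A query may see any key from the same or an earlier time step,
    # regardless of whether the key is a language or a visual token.
    return [
        [timestep_of[k] <= timestep_of[q] for k in range(total)]
        for q in range(total)
    ]


# Two time steps, each with 2 language tokens and 3 visual tokens.
mask = build_block_causal_mask(num_timesteps=2, num_lang_tokens=2, num_visual_tokens=3)
lang_t0, visual_t0, lang_t1 = 0, 2, 5  # example token positions in the flat sequence
print(mask[lang_t0][visual_t0], mask[visual_t0][lang_t0])  # same step, both directions
print(mask[lang_t1][visual_t0], mask[lang_t0][lang_t1])    # prior step vs. future step
```

Under these assumptions, language and visual tokens in the same time step see each other in both directions, tokens in a later time step see everything in earlier ones, and no token sees a future time step.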