Closed mees closed 4 months ago
Adds language-visual cross-attention by feeding tokenized language into the observation (time-step) tokenizer. This allows the language tokens to attend back and forth with the visual tokens at the same or prior time steps.
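The attention pattern described above (bidirectional within a time step, causal across time steps) can be sketched as a block-causal mask. This is a minimal NumPy illustration, not the PR's actual implementation; the function name and the per-step token counts are hypothetical.

```python
import numpy as np

def timestep_attention_mask(n_steps: int, vis_per_step: int, lang_per_step: int) -> np.ndarray:
    """Block-causal mask over interleaved visual + language tokens.

    Hypothetical sketch: each time step contributes `vis_per_step` visual
    tokens followed by `lang_per_step` language tokens. A token at step t
    may attend to every token (visual or language) at steps <= t, so
    language and visual tokens attend bidirectionally within a step and
    causally across steps.
    """
    per_step = vis_per_step + lang_per_step
    # Time-step index of every token in the flattened sequence.
    step_idx = np.repeat(np.arange(n_steps), per_step)
    # mask[i, j] is True when token i is allowed to attend to token j.
    return step_idx[:, None] >= step_idx[None, :]

# Two time steps, each with 2 visual tokens and 1 language token.
mask = timestep_attention_mask(n_steps=2, vis_per_step=2, lang_per_step=1)
```

Within step 0, the language token (index 2) and the visual tokens (indices 0, 1) can attend to each other, while no step-0 token can see the step-1 tokens (indices 3 to 5).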