Cherryjingyao opened this issue 3 months ago
I guess you are referring to "vis" and "lang". We use the pytorch-lightning CombinedLoader, which combines the two individual datasets: https://github.com/mees/calvin/blob/main/calvin_models/calvin_agent/datasets/calvin_data_module.py#L121C32-L121C46
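To illustrate the idea, here is a minimal pure-Python sketch of what combining two loaders per training step looks like; it mimics CombinedLoader's cycling behavior but is not the actual pytorch-lightning API, and the names `vis_loader`/`lang_loader` are placeholders:

```python
# Sketch of combining two datasets per step: every training step yields
# one batch from the vision-only loader and one from the (much smaller)
# language-annotated loader, cycling the shorter one.
from itertools import cycle, islice

def combined_batches(vis_loader, lang_loader, steps):
    """Yield dicts with one 'vis' and one 'lang' batch per step.

    Hypothetical stand-in for pytorch-lightning's CombinedLoader:
    both iterables are cycled so neither is exhausted early.
    """
    vis_it = cycle(vis_loader)
    lang_it = cycle(lang_loader)
    for _ in range(steps):
        yield {"vis": next(vis_it), "lang": next(lang_it)}

# toy "batches": 4 vision-only batches, 2 language-annotated batches
vis_batches = [f"vis_batch_{i}" for i in range(4)]
lang_batches = [f"lang_batch_{i}" for i in range(2)]

steps = list(islice(combined_batches(vis_batches, lang_batches, 4), 4))
```

This is why a batch in training carries both a "vis" part and a "lang" part: they come from two loaders that are iterated in lockstep.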
Hulc builds upon Calvin and thus uses the Calvin dataloader and dataset classes.
To understand why we train with two datasets, I suggest reading our papers and also the paper by Lynch et al., "Language Conditioned Imitation Learning over Unstructured Data", which introduces multi-context imitation learning. In short, we label only 1% of the sequences with language instructions and train the network with both visual and language goals.
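A small sketch of that data split may help. This is a hypothetical illustration (the function name and seed are made up, not code from the repo): roughly 1% of the sequences get a language goal, while every sequence can still be trained with a visual (e.g. final-frame) goal:

```python
import random

def assign_goals(num_sequences, lang_fraction=0.01, seed=0):
    """Split sequences for multi-context imitation learning (sketch).

    ~lang_fraction of the sequences are paired with a language
    instruction; all sequences remain usable with visual goals.
    """
    rng = random.Random(seed)
    ids = list(range(num_sequences))
    n_lang = max(1, int(lang_fraction * num_sequences))
    lang_ids = rng.sample(ids, n_lang)
    return {
        "lang": sorted(lang_ids),  # trained with language goals
        "vis": ids,                # all sequences also get visual goals
    }

split = assign_goals(1000)
```

The key point is that the two subsets are not disjoint tasks: the same policy is conditioned on either goal type, so the sparse language labels suffice.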
In the Calvin README we explain how to generate different language embeddings: https://github.com/mees/calvin?tab=readme-ov-file#speech_balloon-relabeling-raw-language-annotations
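Conceptually, swapping in another language encoder means re-embedding the raw instruction strings and replacing the stored embeddings. The sketch below assumes the annotations look like CALVIN's `auto_lang_ann.npy` payload (a dict with `ann["language"]["ann"]` holding the raw sentences and `ann["language"]["emb"]` their embeddings) — verify this against your copy of the dataset, and note that `toy_encoder` is a made-up stand-in, not a real model:

```python
def relabel_annotations(ann, encode):
    """Replace precomputed language embeddings with a new encoder's output.

    `ann` is assumed to mirror the structure of CALVIN's
    auto_lang_ann.npy dict; this is a sketch, not the official
    relabeling script linked above.
    """
    sentences = ann["language"]["ann"]
    ann["language"]["emb"] = [encode(s) for s in sentences]
    return ann

# hypothetical stand-in encoder: a 3-dim hand-crafted "embedding"
def toy_encoder(sentence):
    return [len(sentence), sentence.count(" ") + 1, ord(sentence[0])]

data = {"language": {"ann": ["lift the red block", "open the drawer"],
                     "emb": [None, None]}}
data = relabel_annotations(data, toy_encoder)
```

In practice you would plug in your encoder of choice where `toy_encoder` sits and rerun the relabeling step from the README, so the dataloader picks up the new embeddings without any further code changes.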
I found that part of the batch data is "vis" and part of it is "lang". Why is this setup? Where is the data processing code? If I want to use a language embedding from a different language encoder, how can I change the code? Thanks for your answer.