dennisushi closed this issue 3 months ago.
Hi,
On RLBench, each variation can be described in several different ways. The embeddings field holds a tensor of shape (num_descriptions, text_len, channel_dim). If you would like to know how the embeddings are computed, see this script for details. During training and testing, we randomly sample one description for each variation.
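A minimal sketch of that sampling step, assuming the embeddings for one variation are loaded as a NumPy array of shape (num_descriptions, text_len, channel_dim). The helper name `sample_description` and the sizes below are hypothetical and are not taken from the repo.

```python
import numpy as np

def sample_description(embeddings, rng):
    """Pick one description's token features from a stacked tensor.

    embeddings: array of shape (num_descriptions, text_len, channel_dim),
    one row per paraphrase of the same variation.
    Returns an array of shape (text_len, channel_dim).
    """
    idx = rng.integers(embeddings.shape[0])  # uniform over descriptions
    return embeddings[idx]

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
variation_embeddings = rng.standard_normal((3, 53, 512)).astype(np.float32)
instr = sample_description(variation_embeddings, rng)
print(instr.shape)  # (53, 512)
```

Drawing a fresh index each time an episode is loaded means the model sees every paraphrase of a variation over the course of training.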
On CALVIN, we do not use the embeddings provided by the official repo; instead, we compute the embeddings in the same manner as for RLBench. The official repo appears to apply global pooling to the features of each description, resulting in a tensor of shape (num_descriptions, 1, channel_dim). We do not apply global pooling and instead produce a tensor of shape (num_descriptions, text_len, channel_dim). On CALVIN, only one description is given for each task during both training and testing.
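The two layouts differ only in whether the token axis is reduced. A minimal NumPy sketch: the reply does not say which pooling the official CALVIN repo uses, so the mean pooling here is an assumption, and the sizes are hypothetical.

```python
import numpy as np

# Hypothetical per-token text features for one task:
# shape (num_descriptions, text_len, channel_dim).
num_descriptions, text_len, channel_dim = 1, 16, 512
rng = np.random.default_rng(0)
token_feats = rng.standard_normal((num_descriptions, text_len, channel_dim))

# Globally pooled variant (assumed here to be mean pooling over tokens):
# collapses the token axis to a single vector per description.
pooled = token_feats.mean(axis=1, keepdims=True)

# Un-pooled variant, as described above: keep every token's features.
unpooled = token_feats

print(pooled.shape)    # (1, 1, 512)
print(unpooled.shape)  # (1, 16, 512)
```

Keeping the token axis preserves per-token information that pooling averages away, at the cost of a larger tensor per description.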
Thanks
Hi, I see that the CALVIN instruction format differs from the RLBench one.
CALVIN has 'text' and 'embeddings' fields, whereas the RLBench instructions are Dict[Taskname, Dict[Number, Embeddings]].
I am guessing that Number corresponds either to a particular variation or to one possible embedding; can you confirm which of the two it is? Or does it define a sequence of instructions?
If a task has 5 possible captions, would those embeddings be separate entries within the dict, or would they be stacked in the final embeddings tensor?
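Based on the reply above, the inner key appears to be the variation index, and the captions of a variation appear to be stacked along the first axis rather than stored as separate dict entries. A minimal sketch of that assumed layout; the task name, sizes, caption counts, and the contents of the CALVIN `text` field are hypothetical, and this is not a loader for the actual files.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
text_len, channel_dim = 16, 512
rng = np.random.default_rng(0)

def fake_caption_embedding():
    """Stand-in for the per-token features of one encoded caption."""
    return rng.standard_normal((text_len, channel_dim)).astype(np.float32)

# Assumed RLBench-style layout: task name -> variation index -> stacked
# embeddings of every caption for that variation.
captions_per_variation = {0: 5, 1: 3}
rlbench_instructions = {
    "open_drawer": {
        var: np.stack([fake_caption_embedding() for _ in range(n)])
        for var, n in captions_per_variation.items()
    }
}

# Assumed CALVIN-style entry: one caption per task, kept un-pooled.
calvin_instruction = {
    "text": ["open the drawer"],
    "embeddings": fake_caption_embedding()[None],  # (1, text_len, channel_dim)
}

var0 = rlbench_instructions["open_drawer"][0]
print(var0.shape)                              # (5, 16, 512)
print(calvin_instruction["embeddings"].shape)  # (1, 16, 512)
```

Under this assumption, a variation with 5 captions is a single (5, text_len, channel_dim) entry, and the CALVIN case is the same shape with num_descriptions equal to 1.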