nickgkan / 3d_diffuser_actor

Code for the paper "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations"
https://3d-diffuser-actor.github.io/
MIT License
159 stars, 16 forks

Instructions format #14

Closed (dennisushi closed this issue 3 months ago)

dennisushi commented 3 months ago

Hi, I see that the CALVIN instruction format differs from the RLBench one.

CALVIN has 'text' and 'embeddings' fields, whereas the RLBench instructions are stored as Dict[Taskname, Dict[Number, Embeddings]].

I am guessing that Number corresponds either to a particular variation or to one of several possible embeddings. Can you confirm which of the two it is? Or does it define a sequence of instructions?

If a task has 5 possible captions, would those embeddings be separate entries within the dict, or would they be stacked in the final embeddings?

twke18 commented 3 months ago

Hi,

On RLBench, each variation can be described in different ways. The embeddings for a variation are a tensor of shape (num_descriptions, text_len, channel_dim). If you would like to know how the embeddings are computed, you can see this script for details. During training and testing, we randomly sample one description for each variation.
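The layout described above can be sketched as follows. This is a minimal illustration, not the repository's code: the task name, the sizes, and the helpers `fake_embeddings` and `sample_description` are hypothetical, and plain nested lists stand in for tensors so the sketch has no dependencies. It assumes that all descriptions of a variation are stacked along the first axis of a single entry, and that one is drawn at random.

```python
import random

# Hypothetical sizes, chosen only for illustration.
TEXT_LEN, CHANNEL_DIM = 4, 3

def fake_embeddings(num_descriptions):
    """Nested lists standing in for a (num_descriptions, text_len, channel_dim) tensor."""
    return [
        [[float(d)] * CHANNEL_DIM for _ in range(TEXT_LEN)]
        for d in range(num_descriptions)
    ]

# Assumed layout: Dict[task name, Dict[variation index, embeddings]].
# Each variation holds ALL of its descriptions stacked along the first axis.
instructions = {
    "close_jar": {0: fake_embeddings(5), 1: fake_embeddings(3)},
}

def sample_description(instructions, task, variation, rng=random):
    """Pick one description at random; the result has shape (text_len, channel_dim)."""
    embeddings = instructions[task][variation]
    index = rng.randrange(len(embeddings))
    return embeddings[index]

sampled = sample_description(instructions, "close_jar", 0, random.Random(0))
print(len(sampled), len(sampled[0]))  # text_len, channel_dim
```

Under this assumption, a variation with 5 possible captions would be a single entry whose first dimension is 5, rather than 5 separate dictionary entries.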

On CALVIN, we do not use the embeddings provided by the official repo, but compute them in a similar manner as for RLBench. The official repo seems to apply global pooling to the features of each description, resulting in a tensor of shape (num_descriptions, 1, channel_dim). We do not apply global pooling, so we generate a tensor of shape (num_descriptions, text_len, channel_dim). Only one description is given for each task during training and testing on CALVIN.
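To make the shape difference concrete, here is a small sketch comparing per-token features with globally pooled ones. It assumes mean pooling over the token axis purely for illustration; the thread does not say which pooling the official CALVIN repo uses. The helpers `shape` and `global_mean_pool` are hypothetical, and nested lists again stand in for tensors.

```python
def shape(x):
    """Return the dimensions of a regular nested list, e.g. (1, 3, 2)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

def global_mean_pool(embeddings):
    """Average over the token axis: (N, text_len, C) -> (N, 1, C).

    Mean pooling is an assumption made for this example only.
    """
    pooled = []
    for tokens in embeddings:
        text_len = len(tokens)
        # zip(*tokens) iterates over channels; average each across tokens.
        mean = [sum(channel) / text_len for channel in zip(*tokens)]
        pooled.append([mean])
    return pooled

# One description, three tokens, two channels: shape (1, 3, 2).
per_token = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]
pooled = global_mean_pool(per_token)

print(shape(per_token))  # unpooled, as described in the reply above
print(shape(pooled))     # pooled, token axis collapsed to length 1
```

Keeping the token axis preserves one feature vector per token of the instruction, whereas pooling collapses each description to a single vector.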

dennisushi commented 3 months ago

Thanks