pliang279 / MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
MIT License

What's the meaning of modalities in MUJOCO PUSH dataset? #20

Open mrbeann opened 2 years ago

mrbeann commented 2 years ago

Hi, I recently tried the MUJOCO PUSH dataset, but I cannot figure out what each modality concretely means. The paper says:

The multimodal inputs are gray-scaled images (1 × 32 × 32) from an RGB camera, forces (and binary contact information) from a force/torque sensor, and the 3D position of the robot end-effector.

I found that the modalities in the dataset are "control", "image", "sensor", and "pos". How do these correspond to the modalities described in the paper? (i.e., what does each of them mean?)

arav-agarwal2 commented 2 years ago

Someone else can confirm, but here's how I think of things:
-> The "image" modality refers to the gray-scale images.
-> The "pos" modality refers to the 3D position of the end-effector.
-> The "sensor" modality refers to the forces/binary contact information.
-> The "control" modality refers to what the controller is sending to the arm itself. (This is the one I'm least sure about.)
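
The mapping suggested above can be written down as a small lookup table. This is only a sketch of the interpretation given in this comment, not something confirmed by the maintainers or the paper. The names `MODALITY_MEANING` and `describe` are made up for illustration and are not part of MultiBench.

```python
# Tentative reading of the MUJOCO PUSH modality keys, as proposed in this thread.
# Assumption: not confirmed by the maintainers; "control" is the least certain.
MODALITY_MEANING = {
    "image": "gray-scale camera image (1 x 32 x 32)",
    "pos": "3D position of the robot end-effector",
    "sensor": "force readings and binary contact information",
    "control": "what the controller sends to the arm (least certain)",
}


def describe(keys):
    """Return a description for each modality key, or 'unknown' if unmapped."""
    return {key: MODALITY_MEANING.get(key, "unknown") for key in keys}


if __name__ == "__main__":
    # The four keys reported in the original question.
    for key, meaning in describe(["control", "image", "sensor", "pos"]).items():
        print(f"{key}: {meaning}")
```

Any key not covered by this reading comes back as "unknown", which makes it easy to spot modalities that still need an explanation.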

mrbeann commented 2 years ago

I agree with your interpretation, but it does not seem to match the paper. See Figure 8, for example.