robfiras / loco-mujoco

Imitation learning benchmark focusing on complex locomotion tasks using MuJoCo.
MIT License

Need help in understanding the datasets! #33

Closed SidPad closed 3 months ago

SidPad commented 3 months ago

Hi, great job on this work, and thank you for providing the code! Currently, to train my model, I am using your datasets that contain simulation data (observations and actions).

I would like to ask about the datasets provided for the humanoids. For instance, I checked the contents of the dataset provided for the walking task for these humanoids: atlas, humanoid_muscle, humanoid_torque, talos, and unitree_h1 (e.g. loco-mujoco/loco-mujoco/loco_mujoco/datasets/humanoids/perfect/atlas_walk/perfect_expert_dataset_stoch.npz).

  1. The actions in these datasets fall outside the [-1.0, 1.0] range and do not appear to be normalized to [-1.0, 1.0]. How were these actions generated? Are they the direct outputs of the expert policy, recorded before normalization/clipping and before being sent to the environment, which would explain why they are out of bounds? Or perhaps my understanding of the action and observation spaces is wrong...

  2. Similarly, I would like to know how the observations were obtained for these datasets.

Edit: It would be great if you could tell me how the observations and actions are pre-processed in the environment.

Thank you for your help in advance! Sid

SidPad commented 3 months ago

Sorry, I did not update this earlier: I found that the dataset contains observations and actions that are not pre-processed, i.e. raw simulation data. It also seems that the actions are normalized to [-1, 1] during training. However, when I check the environment action space for each robot (which is the unnormalized action space), the numbers do not make sense.

For instance, the raw action data ranges from -0.5 to 4.0 for the Unitree H1, but the action space is [-0.95, 0.95]. Is there any explanation for this?
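For reference, ranges like these can be checked directly with NumPy. The sketch below builds a tiny in-memory stand-in for the `.npz` file so it runs anywhere; the key names `actions` and `observations` are assumptions here — inspect `data.files` on the real file (e.g. `perfect_expert_dataset_stoch.npz`) to see the actual keys:

```python
import io
import numpy as np

# Stand-in for np.load("<path>/perfect_expert_dataset_stoch.npz");
# the real file is loaded the same way.
buf = io.BytesIO()
np.savez(buf,
         actions=np.array([[-0.5, 4.0], [0.1, -0.2]]),
         observations=np.zeros((2, 3)))
buf.seek(0)

data = np.load(buf)
print(sorted(data.files))                  # arrays stored in the archive
acts = data["actions"]
print(acts.min(axis=0), acts.max(axis=0))  # per-dimension action range
```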

robfiras commented 3 months ago

Hi!

The actions in these datasets fall outside the [-1.0, 1.0] range and do not seem that they are normalized to [-1.0, 1.0]. I would like to know how these actions were generated? I was wondering if they are direct outputs from the expert policy before normalization/clipping and sent to the environment, that's why they are out of bounds perhaps? Or probably my understanding of the action and observation spaces are wrong...

Right now, there are two different types of datasets: real and perfect. Real datasets consist of motion capture data mapped to the respective robot embodiment (I guess that was already clear to you). Perfect datasets are generated by one of our imitation learning baselines trained on the real datasets (the configuration for each environment can be found in the examples). All actions for all environments are normalized to [-1, 1] and clipped by MuJoCo afterwards. Unitree set the control limits of the H1 to 0.95 instead; we might change this in the future. We just recorded the actions of the policy, which can be out of bounds. We might think about clipping them in the future; in any case, it has no effect on the environment side.
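To illustrate why out-of-bounds recorded actions are harmless on the environment side, here is a minimal sketch (not the library's actual code; the helper name is made up, and the ±0.95 H1 limit comes from the discussion above). The policy's normalized action is scaled to the control range and then clipped, so anything beyond the limits saturates:

```python
import numpy as np

def unnormalize_action(a_norm, ctrl_low, ctrl_high):
    """Map a normalized action from [-1, 1] to the control range, then clip
    (MuJoCo clips actuator controls to ctrlrange on its side anyway)."""
    mean = (ctrl_high + ctrl_low) / 2.0
    half_range = (ctrl_high - ctrl_low) / 2.0
    return np.clip(a_norm * half_range + mean, ctrl_low, ctrl_high)

# A recorded policy output of 4.0 (out of bounds) simply saturates
# at the H1 limit of 0.95 after scaling and clipping.
print(unnormalize_action(4.0, -0.95, 0.95))   # → 0.95
print(unnormalize_action(0.0, -0.95, 0.95))   # → 0.0
```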

Similarly, I would like to know how the observations are obtained for these datasets?

The observations for the real datasets are just motion capture data mapped to the respective humanoid. For the perfect datasets, they are again recorded from environment interaction. There is no special preprocessing.

I hope this answers your questions! If not, let me know.

SidPad commented 3 months ago

Thanks a lot for this clarification!