robfiras / loco-mujoco

Imitation learning benchmark focusing on complex locomotion tasks using MuJoCo.
MIT License
475 stars, 38 forks

General errors in Unitree A1 env. #32

Open Danfoa opened 4 weeks ago

Danfoa commented 4 weeks ago

Dear @robfiras,

Thank you very much for your efforts in building this library. I wanted to point out several issues I found with the Unitree A1 environment, which might also be present in other environments.

  1. There is a mismatch between dataset observations and environment observations. Specifically, in the Unitree A1 environment, the heading orientation is returned with a -π/2 bias, which the dataset values do not have. This makes imitation learning impossible.
  2. The desired velocity is set at initialization to be the mean of the recorded trajectory motion velocity. However, the observations returned by the environment do not reflect this average value.
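To check point 1, you can compare the yaw of a dataset sample with the yaw the environment reports after being reset to that same sample. The sketch below assumes both yaw sequences (in radians) have already been extracted. How to read them from loco-mujoco is not shown, and `wrap_angle` and `heading_offset` are hypothetical helpers. It takes a circular mean of the wrapped differences, so a constant bias is still found when angles cross ±π.

```python
import math


def wrap_angle(a: float) -> float:
    """Wrap an angle (radians) to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def heading_offset(dataset_yaw: list[float], env_yaw: list[float]) -> float:
    """Estimate a constant yaw offset (env - dataset) from paired samples.

    Uses the circular mean of the wrapped differences, so it stays correct
    when individual angles wrap around +-pi.
    """
    if len(dataset_yaw) != len(env_yaw) or not dataset_yaw:
        raise ValueError("need two equally long, non-empty sequences")
    diffs = [wrap_angle(e - d) for d, e in zip(dataset_yaw, env_yaw)]
    s = sum(math.sin(x) for x in diffs)
    c = sum(math.cos(x) for x in diffs)
    return math.atan2(s, c)


# Example: the environment reports every yaw shifted by -pi/2.
data = [0.0, 0.3, 3.0]
env = [y - math.pi / 2 for y in data]
print(heading_offset(data, env))  # approximately -pi/2
```

A result close to -π/2 would match the bias described in point 1. A result close to 0 means the two sources agree.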

Additionally, as a new user, I found it confusing that the observation spec does not match the actual observation space. In other libraries, the observation spec typically serves as a data class that describes the dimensionality and semantic meaning of each dimension of the MDP state. In your code, it appears to serve as a container for all physical observables of the system. Without documentation of this custom use, the codebase is hard to follow. I suggest simplifying or reducing the pre-processing and post-processing of observations to prevent the issues mentioned above.
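For illustration, here is a minimal sketch of the kind of observation spec described in the paragraph above. It is a data class that records the name and dimensionality of each block of the flat observation vector. `ObsEntry`, `ObsSpec` and the entry names are hypothetical and are not part of loco-mujoco.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ObsEntry:
    name: str  # semantic meaning of this block, e.g. "base_yaw"
    dim: int   # how many consecutive entries it occupies


@dataclass(frozen=True)
class ObsSpec:
    entries: tuple[ObsEntry, ...]

    @property
    def size(self) -> int:
        """Total dimensionality of the observation vector."""
        return sum(e.dim for e in self.entries)

    def slices(self) -> dict[str, slice]:
        """Map each entry name to its slice in the flat observation vector."""
        out: dict[str, slice] = {}
        start = 0
        for e in self.entries:
            out[e.name] = slice(start, start + e.dim)
            start += e.dim
        return out

    def split(self, obs: Sequence[float]) -> dict[str, list[float]]:
        """Split a flat observation into named parts, checking its length first."""
        if len(obs) != self.size:
            raise ValueError(f"expected {self.size} entries, got {len(obs)}")
        return {name: list(obs[s]) for name, s in self.slices().items()}


spec = ObsSpec((
    ObsEntry("base_height", 1),
    ObsEntry("base_yaw", 1),
    ObsEntry("joint_pos", 12),
    ObsEntry("goal_vel", 1),
))
print(spec.size)                   # 15
print(spec.slices()["joint_pos"])  # slice(2, 14, None)
```

With a spec like this, the length check in `split` catches a mismatch between the declared spec and the vector the environment actually returns.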

Thank you for your attention to these matters.

robfiras commented 4 weeks ago

Hi @Danfoa,

Thanks a lot for the valuable feedback! I agree that the A1 environment did not get as much love as the humanoids...

Here are some comments:

  1. The _modify_observation_callback is called both when creating the observation and when creating the dataset, so both should be rotated. Training with the imitation learning scripts also worked for us.
  2. That's a good point. As of now, the goal speed is set to 0.5 by default (roughly the mean velocity of the trajectory). I will update this in the next release.
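The argument in point 1 depends on both code paths going through the same callback. A small regression check can test that directly. It feeds the same raw state through an "environment" path and a "dataset" path and reports every entry where they differ. This is a sketch only. `rotate_yaw` and `mismatched_indices` are hypothetical helpers, and `rotate_yaw` only mimics the -π/2 yaw shift from the original report. It is not the actual _modify_observation_callback.

```python
import math
from typing import Callable, Sequence

Obs = list[float]


def rotate_yaw(raw: Sequence[float], yaw_index: int = 1,
               offset: float = -math.pi / 2) -> Obs:
    """Hypothetical stand-in for a shared observation callback: shifts the yaw entry."""
    obs = list(raw)
    obs[yaw_index] = obs[yaw_index] + offset
    return obs


def mismatched_indices(raw: Sequence[float],
                       env_path: Callable[[Sequence[float]], Obs],
                       dataset_path: Callable[[Sequence[float]], Obs],
                       tol: float = 1e-6) -> dict[int, float]:
    """Return {index: env - dataset} for every entry where the two paths disagree."""
    env_obs, data_obs = env_path(raw), dataset_path(raw)
    if len(env_obs) != len(data_obs):
        raise ValueError("observation lengths differ")
    return {i: e - d for i, (e, d) in enumerate(zip(env_obs, data_obs))
            if abs(e - d) > tol}


raw = [0.3, 0.0, 0.5]
print(mismatched_indices(raw, rotate_yaw, rotate_yaw))  # {} : both paths share the callback
print(mismatched_indices(raw, rotate_yaw, list))        # {1: -1.57...} : dataset path skips it
```

If the real environment and dataset pipelines show the second pattern, then one of them is not applying the callback.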

I can understand the confusion about the observation spec. It is mainly used to access information in the MuJoCo data structure, but not everything you want in the observation is in that data structure. Often you want to add custom information, such as the goal or custom foot forces. I tried to make this clearer in the documentation; did you find that helpful? In any case, there will be a major release soon, in which I will try to make the observation space clearer in the code as well.
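The comment above splits the observation into two parts: quantities read from the MuJoCo data structure, and custom entries such as the goal or foot forces. The sketch below shows that split. It assumes the simulator quantities have already been read into a plain dictionary, and it does not show how the actual MuJoCo fields are read. `build_observation` and the key names are hypothetical.

```python
from typing import Mapping, Sequence


def build_observation(sim_observables: Mapping[str, Sequence[float]],
                      keys: Sequence[str],
                      goal_velocity: float,
                      foot_forces: Sequence[float]) -> list[float]:
    """Concatenate selected simulator quantities, then append task-specific entries."""
    obs: list[float] = []
    for k in keys:
        obs.extend(float(x) for x in sim_observables[k])  # read from the simulator state
    obs.append(float(goal_velocity))                      # custom: not in the simulator state
    obs.extend(float(f) for f in foot_forces)             # custom: derived quantity
    return obs


sim = {"base_height": [0.28], "base_yaw": [0.1], "joint_pos": [0.0] * 12}
print(build_observation(sim, ["base_height", "base_yaw"], 0.5, [1.0, 2.0, 3.0, 4.0]))
# [0.28, 0.1, 0.5, 1.0, 2.0, 3.0, 4.0]
```

Because the custom entries are appended after the simulator entries, the final vector is longer than any spec that covers only the simulator fields. This is one way the observation spec and the observation space can drift apart.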

Danfoa commented 4 weeks ago

@robfiras

I am running the tests now and can confirm that the angle error is still present. I raised the issue because I can see a difference between the state sampled from the dataset and the state returned by the environment after a reset to the dataset's initial state.

Also, the velocity of the quadruped never exceeds 0.2 m/s; the average values are always below 0.2 m/s.
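To quantify the speed observation above, you can log the base position during a rollout and compute the average speed. This is a sketch that assumes planar base positions in meters, sampled every `dt` seconds. `mean_net_speed` is a hypothetical helper, not a loco-mujoco function.

```python
import math
from typing import Sequence


def mean_net_speed(xy: Sequence[tuple[float, float]], dt: float) -> float:
    """Average speed from the net planar displacement of the base over a rollout.

    xy: base (x, y) positions in meters, one sample every dt seconds.
    """
    if len(xy) < 2 or dt <= 0:
        raise ValueError("need at least two samples and dt > 0")
    elapsed = dt * (len(xy) - 1)
    return math.dist(xy[0], xy[-1]) / elapsed


# 100 steps of 1 cm at dt = 0.05 s: 1 m in 5 s.
positions = [(0.01 * i, 0.0) for i in range(101)]
print(mean_net_speed(positions, 0.05))  # 0.2
```

Net displacement ignores back-and-forth sway, which is what matters when comparing against a goal speed. Summing the per-step distances instead would also count lateral oscillation.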

robfiras commented 4 weeks ago

Alright, I will take a closer look. Which environment are you running, "simple" or "hard"?

Danfoa commented 4 weeks ago

@robfiras

Hard. Commenting out this -π/2 bias solves the issue with the angle.

For the target velocity, it is unclear to me why you use the mean velocity instead of the velocity error or the actual target speed.
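The alternative suggested above conditions on the velocity error relative to a target, rather than on a fixed mean velocity. Below is a sketch of one common formulation in locomotion RL: a Gaussian-shaped tracking term on that error. The function name, the default `sigma = 0.25` and the exponential shape are all assumptions for illustration. This is not what loco-mujoco implements.

```python
import math


def velocity_tracking_reward(v_actual: float, v_target: float,
                             sigma: float = 0.25) -> float:
    """exp(-((v - v_target) / sigma)^2): 1 at perfect tracking, decays with the error."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    err = v_actual - v_target  # the velocity error could also be exposed in the observation
    return math.exp(-(err / sigma) ** 2)


print(velocity_tracking_reward(0.5, 0.5))  # 1.0
print(velocity_tracking_reward(0.2, 0.5))  # about 0.237
```

Shaping the term on the error means a policy that stalls near 0.2 m/s while the target is 0.5 receives a visibly reduced reward.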

Danfoa commented 3 weeks ago

Hi @robfiras

I was wondering whether the expert policies could be made public, so that users can re-create a dataset and evaluate the performance of the policies under mildly different environment conditions.

robfiras commented 3 weeks ago

Alright, I will check what's going on with that bias.

Yeah, publishing the policies as well (instead of just providing the training script) is on my to-do list, though collecting all the policies might take a while.