robfiras / loco-mujoco

Imitation learning benchmark focusing on complex locomotion tasks using MuJoCo.
MIT License

Replay dataset using actions #38

Closed OliEfr closed 2 months ago

OliEfr commented 2 months ago

Hi, and thank you for this library - I hope to use it for research!

One question: is it possible to replay a dataset from its initial states and recorded actions? (From the source code, I assume it's not possible by default. How could this be implemented then?)

Best, Oliver

robfiras commented 2 months ago

Hi Oliver, thanks for your interest! There is a way to do something like this; you need to specify the initial state while creating the environment. Here is an example for the G1:

import numpy as np

from loco_mujoco import LocoEnv


def experiment(seed=0):
    np.random.seed(seed)

    # here we set the initial state to be the first state of the first trajectory
    mdp = LocoEnv.make("UnitreeG1.run", random_start=False, init_step_no=0)

    mdp.play_trajectory_from_velocity(n_episodes=3, n_steps_per_episode=500)


if __name__ == '__main__':
    experiment()

This will start the replay from the initial state of the first trajectory. I guess what you would like is to go through each trajectory one by one instead. This is not possible yet. There is going to be an update soon; I will keep this in mind.

OliEfr commented 2 months ago

Thanks a lot for your swift reply - I will try that. Two follow-ups:

1) Does this also work for the A1, as _init_sim_from_obs is not defined for this env? (I can also test this later for myself.)

2) Is there a specific reason for not including the actions in the Trajectory class? If they were included, it would be easy to retrieve the corresponding actions during replay.

EDIT: I can answer 1) myself after checking the code: _init_step_no is independent of _init_sim_from_obs.

Best, Oliver

robfiras commented 2 months ago

Regarding 2): I did not see a use case for adding actions to the trajectory class. The trajectory class is mainly used for handling motion capture data (e.g., for interpolation) and for replaying the kinematics. What would be the use case for you?

OliEfr commented 2 months ago

> This will start the replay from the initial state of the first trajectory. I guess what you would like is to go through each trajectory one by one instead. This is not possible yet. There is going to be an update soon; I will keep this in mind.

Are the trajectories shuffled during replay, or are they always played in order? If they are not shuffled, I could just replay the trajectory and look at env.create_dataset()["actions"] to get the corresponding actions.

Regarding 2): Basically, I want to apply a transformation to the recorded actions and then replay using those actions to see if the agent still moves well. Also, I don't see a disadvantage in including the actions in the trajectory class; the definition of a trajectory under an MDP also includes the actions: https://en.wikipedia.org/wiki/Markov_decision_process#Definition. I will edit my clone to include the actions and can create a PR if you want.
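
For illustration, a minimal, hypothetical sketch of what a trajectory container that also stores actions could look like (the class name and fields are made up and not part of loco-mujoco):

from dataclasses import dataclass
import numpy as np


@dataclass
class TrajectoryWithActions:
    """Hypothetical container: states plus the actions taken from them."""
    states: np.ndarray   # shape (T, obs_dim)
    actions: np.ndarray  # shape (T, action_dim)

    def __post_init__(self):
        # one recorded action per state
        assert len(self.states) == len(self.actions)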

robfiras commented 2 months ago

The trajectory choice is random when random_start is set to True, so yes, they are shuffled.

One important thing to note is that the replay is done without actions, though. The replay only sets the kinematics of the trajectory on the robot's joints, without respecting dynamics and physics. This is done to cope with both real and perfect datasets; the former only includes states from motion capture, without any actions.

So if you really want to replay by sending actions, you need to do it on your own. It is quite easy though: just manually set the state you want and send actions.
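
As a rough sketch of that idea in plain MuJoCo (independent of loco-mujoco's internals; the XML path, state arrays, and actions are placeholders):

import mujoco
import numpy as np

# placeholder model; in practice this would be the robot's MJCF file
model = mujoco.MjModel.from_xml_path("robot.xml")
data = mujoco.MjData(model)

# 1) manually set the state to the first sample of the trajectory
data.qpos[:] = np.zeros(model.nq)   # placeholder for the recorded joint positions
data.qvel[:] = np.zeros(model.nv)   # placeholder for the recorded joint velocities
mujoco.mj_forward(model, data)      # recompute derived quantities for the new state

# 2) then step the physics with the recorded actions
for action in np.zeros((100, model.nu)):  # placeholder for the recorded actions
    data.ctrl[:] = action
    mujoco.mj_step(model, data)

Note that loco-mujoco may apply its own action scaling on top of raw MuJoCo, so an actual replay is best done through the environment's step function (as in the script later in this thread).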

OliEfr commented 2 months ago

> One important thing to note is that the replay is done without actions, though. The replay only sets the kinematics of the trajectory on the robot, without respecting dynamics and physics.

I noticed. I understand the issue with the actions. I hope I find a solution for my (possibly very specific) case soon. It should be doable.

FYI: after working through the code, I find it hard to follow sometimes, mostly due to the observation reshaping, indexing, and so on. Maybe it would be more readable to work with dictionaries in the trajectories in the future.

Thanks for your help.

robfiras commented 2 months ago

> FYI: after working through the code, I find it hard to follow sometimes, mostly due to the observation reshaping, indexing, and so on. Maybe it would be more readable to work with dictionaries in the trajectories in the future.

Yeah, I agree. We will push a big update soon, which will hopefully resolve this issue. It is especially a problem with the A1, which has gotten the least support from us, tbh.

OliEfr commented 2 months ago

self._init_step_no evaluates to False when self._init_step_no = 0, so the branch is never entered for step 0. I think this is a bug. LoC

robfiras commented 2 months ago

Good point, needs to be fixed.

Edit: elif self._init_step_no: needs to be changed to elif self._init_step_no is not None:. I can push the change tomorrow. Alternatively, you can also do a small PR, which I can merge.
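
A standalone illustration of the pitfall: 0 is falsy in Python, while an explicit None check keeps it:

init_step_no = 0

if init_step_no:              # False for 0, so the branch is silently skipped
    print("truthiness check: taken")

if init_step_no is not None:  # True for 0; only None is excluded
    print("explicit None check: taken")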

OliEfr commented 2 months ago

Created PR #39

OliEfr commented 2 months ago

I still don't manage to reproduce the locomotion using only the recorded actions. Does the script below seem feasible?


import gymnasium as gym
import numpy as np

from moviepy.editor import ImageSequenceClip

from loco_mujoco import LocoEnv
import loco_mujoco  # needed so that the "LocoMujoco" gym environment is registered

np.random.seed(0)

# start every episode from the first state of the first trajectory
env = gym.make("LocoMujoco", env_name="UnitreeA1.simple.perfect", render_mode="rgb_array", random_start=False, init_step_no=0)

expert_dataset = env.create_dataset()
expert_actions = expert_dataset["actions"]

imgs = []
env.reset()
imgs.append(env.render().transpose(1, 0, 2))  # swap height/width of the rendered frame
terminated = False
i = 0  # index into the expert actions, reset on every episode
j = 0  # total number of environment steps

while j < 1000:
    if i == 1000 or terminated:
        env.reset()
        i = 0
    action = expert_actions[i, :]
    nstate, reward, terminated, truncated, info = env.step(action)

    imgs.append(env.render().transpose(1, 0, 2))
    i += 1
    j += 1
    print(j)

clip = ImageSequenceClip(imgs, fps=50)
clip.write_videofile("test.mp4", fps=50)

I would expect that, given the initial state and the recorded actions, the movement should look similar to env.unwrapped.play_trajectory(n_steps_per_episode=250, n_episodes=1, render=True, record=True). But it doesn't. Am I missing something?

EDIT: updated the script; fixed a typo; included rendering.

robfiras commented 2 months ago

That is actually the way I would have done it. Does it actually produce some wobbly gait, or does it fall right away? If it is the former, it could be because of a difference in the MuJoCo version (the one the dataset was recorded with vs. the one you have); there is a similar issue in #34. If it falls right away, it is probably some other issue.

Thanks for the PR, I will check and merge soon!

OliEfr commented 2 months ago

It produced some gait. I can see that the actions correspond to some cyclic movement. Then, after a couple of seconds (I would say between 1 and 5), the agent falls. It feels like something could be wrong in the initial state initialization, but it could also be something else. pip show mujoco gives me v2.3.7. That's the same as in the project's requirements.txt - is it the correct one to use?

robfiras commented 2 months ago

That's the one you should use, but the question is which version the data was collected with. There is definitely a problem with MuJoCo >= 3.1.0. But it sounds like a problem with the initial state, tbh. I need to take a look at it.
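
For reference, the installed version can also be checked from Python directly (the 2.3.7 pin below just reflects the version discussed here):

import mujoco

# the dataset in this thread replays correctly with MuJoCo 2.3.7;
# MuJoCo >= 3.1.0 reportedly behaves differently (see above)
print(mujoco.__version__)
assert mujoco.__version__.startswith("2.3.7"), "unexpected MuJoCo version"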

OliEfr commented 2 months ago

Did you have a chance to look at it, or do you have any suggestions on how I could replay a dataset using actions?

robfiras commented 2 months ago

Hi! Sorry for the delay. So, I tried your script, and it works as expected. Here is a video:

https://github.com/user-attachments/assets/44eaeea3-b9e6-4a8c-9896-6cf3f5546cc1

It works with MuJoCo v2.3.7. How does it look when you replay it? With MuJoCo v3.1.6, it falls after a few steps.

OliEfr commented 2 months ago

I also ran the script (and fixed the typo and updated it, see above). The agent falls for the first three trials, while it looks stable on the fourth. I am not sure if this line action = expert_actions[i, :] is actually correct though, because I reset i=0 every time the agent falls. Did it walk on the first trial for you?

https://github.com/user-attachments/assets/768cc0e4-cbe3-457b-8e93-963da929f1a9

I assume that the agent walking on the fifth trial is rather a coincidence.

I checked pip show mujoco again and also get Version: 2.3.7.

OliEfr commented 2 months ago

Ah, are the correct datasets loaded? I have a feeling there might be a problem. Could you check whether your first ten actions match the ones below? (Again, using the script from above.)

print(expert_actions[:10])
array([[-0.11583757, -0.1605123 , -0.35786319,  0.04662561, -0.1451481 ,
         0.15527056, -0.07530443,  0.32525921,  0.51078105, -0.00260023,
        -0.05506159, -0.31670323],
       [-0.15076435, -0.13208956,  0.09038497, -0.09274339,  0.02133676,
         0.61076272,  0.10276344, -0.29225916,  0.61971581, -0.08077223,
        -0.2337113 , -0.01778385],
       [ 0.06037064, -0.36887422,  0.49953216, -0.00411046, -0.11493847,
         0.29007921, -0.00768534,  0.32712239,  0.47472784, -0.1221546 ,
        -0.2491878 ,  0.79734898],
       [ 0.04039627, -0.26430207,  0.3696959 , -0.10749903, -0.13194457,
         0.21554233, -0.03897424,  0.11035404,  0.07246473,  0.15576583,
        -0.32692078,  0.26622015],
       [-0.05259046, -0.19199839,  0.10475966, -0.12667641,  0.03968424,
         0.3608025 ,  0.06636741,  0.20039254,  0.39187399,  0.12702423,
        -0.08592802,  0.05238208],
       [-0.07197727, -0.05834942,  0.04420971, -0.11682422,  0.06008813,
         0.27044499,  0.02943962,  0.11740026,  0.31945133,  0.08024826,
        -0.06734565, -0.0183993 ],
       [-0.06263307, -0.05503454,  0.05221147, -0.08684146,  0.1023927 ,
         0.22788878,  0.06434353,  0.16745374,  0.3019456 ,  0.0021114 ,
        -0.09344653,  0.0289441 ],
       [-0.04804756, -0.07149791,  0.1122716 , -0.08323682,  0.07512225,
         0.18152034,  0.04562529,  0.1184231 ,  0.24178079, -0.00855026,
        -0.11994903,  0.10050282],
       [-0.03144503, -0.09150877,  0.17908812, -0.07251257,  0.06091456,
         0.20749609,  0.04530283,  0.10379429,  0.21179095, -0.03043696,
        -0.14734201,  0.18226424],
       [-0.03822419, -0.11518904,  0.22891074, -0.0720027 ,  0.03882153,
         0.20504069,  0.06411954,  0.07417977,  0.17670411, -0.04068176,
        -0.1608438 ,  0.24442369]])
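
For a programmatic comparison instead of eyeballing the printout, the actions from the script above could be saved on one machine and compared on the other (the file name is arbitrary):

import numpy as np

# on one machine: save the reference actions (expert_actions comes from the script above)
np.save("first_ten_actions.npy", expert_actions[:10])

# on the other machine: load the reference and compare numerically
reference = np.load("first_ten_actions.npy")
print(np.allclose(expert_actions[:10], reference))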

robfiras commented 2 months ago

I have the feeling your resetting is not working correctly. Could you double-check that your code actually enters this condition with the corrections from your PR, and that the sampled state is always the same?

> I am not sure if this line action = expert_actions[i, :] is actually correct though, because I reset i=0 every time the agent falls. Did it walk on the first trial for you?

That is actually correct. You are always replaying just the first trajectory as mentioned above.

The dataset looks the same:

[[-0.11583757 -0.1605123  -0.35786319  0.04662561 -0.1451481   0.15527056 -0.07530443  0.32525921  0.51078105 -0.00260023 -0.05506159 -0.31670323]
 [-0.15076435 -0.13208956  0.09038497 -0.09274339  0.02133676  0.61076272  0.10276344 -0.29225916  0.61971581 -0.08077223 -0.2337113  -0.01778385]
 [ 0.06037064 -0.36887422  0.49953216 -0.00411046 -0.11493847  0.29007921 -0.00768534  0.32712239  0.47472784 -0.1221546  -0.2491878   0.79734898]
 [ 0.04039627 -0.26430207  0.3696959  -0.10749903 -0.13194457  0.21554233 -0.03897424  0.11035404  0.07246473  0.15576583 -0.32692078  0.26622015]
 [-0.05259046 -0.19199839  0.10475966 -0.12667641  0.03968424  0.3608025   0.06636741  0.20039254  0.39187399  0.12702423 -0.08592802  0.05238208]
 [-0.07197727 -0.05834942  0.04420971 -0.11682422  0.06008813  0.27044499  0.02943962  0.11740026  0.31945133  0.08024826 -0.06734565 -0.0183993 ]
 [-0.06263307 -0.05503454  0.05221147 -0.08684146  0.1023927   0.22788878  0.06434353  0.16745374  0.3019456   0.0021114  -0.09344653  0.0289441 ]
 [-0.04804756 -0.07149791  0.1122716  -0.08323682  0.07512225  0.18152034  0.04562529  0.1184231   0.24178079 -0.00855026 -0.11994903  0.10050282]
 [-0.03144503 -0.09150877  0.17908812 -0.07251257  0.06091456  0.20749609  0.04530283  0.10379429  0.21179095 -0.03043696 -0.14734201  0.18226424]
 [-0.03822419 -0.11518904  0.22891074 -0.0720027   0.03882153  0.20504069  0.06411954  0.07417977  0.17670411 -0.04068176 -0.1608438   0.24442369]]

OliEfr commented 2 months ago

It didn't enter the condition. I forgot to change branches to the one with the fix (self.trajectories is not None:).

Thanks. Works now.

There is a typo in the subsequent line: https://github.com/robfiras/loco-mujoco/blob/136d79a3b571da93056b670a7e73c63a5c247b85/loco_mujoco/environments/quadrupeds/unitreeA1.py#L275 There is one "n" too many in n_traj = self.trajectories.nnumber_of_trajectories.

I changed it (locally) to n_traj = self.trajectories.number_of_trajectories and it works.

robfiras commented 2 months ago

Awesome! Yeah, I noticed that typo. Would you mind adding this to your PR?

OliEfr commented 2 months ago

I added the "nnumber" fix to my PR #39.

I created an additional PR, #41, that adds the script from above (replaying a dataset using actions) as an example. I wasn't sure where to add it in the docs, so you could do that.

robfiras commented 2 months ago

Great, many thanks for your contribution. Very much appreciated!

OliEfr commented 2 months ago

Many thanks for your help!