robfiras / loco-mujoco

Imitation learning benchmark focusing on complex locomotion tasks using MuJoCo.

Cannot replay actions from HumanoidTorque4Ages dataset #44

Open SidPad opened 4 weeks ago

SidPad commented 4 weeks ago

Hi, I replayed the actions from the perfect datasets for the humanoids Atlas, Humanoid Muscle, Humanoid Torque (adult), Talos, and Unitree H1, and all of these humanoids can walk. However, replaying the actions from the dataset for HumanoidTorque4Ages does not seem to work for any of the four humanoids.

To replay the actions from the dataset, I load the datasets, take the actions, and replay them step by step in the LocoEnv, for instance:

import numpy as np
from loco_mujoco import LocoEnv

# Dataset whose actions are replayed
DATASET_PATH = '/home/spyd66/loco-mujoco/loco-mujoco/loco_mujoco/datasets/humanoids/perfect/humanoid4ages_torque_walk/HumanoidTorque4Ages_walk_stochastic_dataset_4.npz'

def get_prompt(counter, init_i):
    # Returns the action at `counter`, the state at `init_i`, and the recorded state at `counter`
    raw_data = np.load(DATASET_PATH)
    state = raw_data['states']
    action = raw_data['actions']
    return action[counter], state[init_i], state[counter]

env = LocoEnv.make("HumanoidTorque4Ages.walk.1.perfect")
initial_i = 0  # np.random.randint(0, 1000)
action_dim = env.info.action_space.shape[0]
_, init_st, _ = get_prompt(initial_i, initial_i)
env.reset(init_st)
env.render()
absorbing = False
i = 0  # step counter (was never initialized before the loop)
while i < 500:
    act, _, rec_st = get_prompt(initial_i + i, initial_i)
    obs, _, absorbing, _ = env.step(act)
    env.render()
    i += 1

Am I calling the wrong environment or dataset? I am not sure whether there is a mismatch between the environment and the dataset. I tried all possible environment/dataset combinations across the four humanoids, but the replay still failed.
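One quick way to look for an environment/dataset mismatch is to compare array shapes before replaying anything. The following is only a minimal sketch: the helper names `summarize_dataset` and `action_dim_matches` are hypothetical (they are not part of loco-mujoco), and it assumes the `.npz` file stores its arrays under the `states` and `actions` keys used in the snippet above.

```python
import numpy as np

def summarize_dataset(path):
    """Return {key: shape} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {key: data[key].shape for key in data.files}

def action_dim_matches(path, env_action_dim, key="actions"):
    """Check that the last axis of the stored actions equals the env's action dimension."""
    with np.load(path) as data:
        return data[key].shape[-1] == env_action_dim

# Possible usage with the environment from the snippet above:
#   action_dim_matches(DATASET_PATH, env.info.action_space.shape[0])
```

A matching action dimension does not prove that the dataset belongs to a given humanoid size, but a mismatch would rule the combination out immediately.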

I would also like to ask: do the numbers 1-4 in these file names (HumanoidTorque4Ages_walk_stochastic_dataset_xx.npz) denote the humanoid size? Thanks in advance for your help!

SidPad commented 3 weeks ago

Hi, I made it work by making a few modifications to the code in base_humanoid_4_ages.py:

def setup(self, obs):
        """
        Function to setup the initial state of the simulation. Initialization can be done either
        randomly, from a certain initial, or from the default initial state of the model. If random
        is chosen, a trajectory is sampled based on the current model.

        Args:
            obs (np.array): Observation to initialize the environment from;

        """

        self._reward_function.reset_state()

        # if obs is not None:
        #     raise TypeError("Initializing the environment from an observation is "
        #                     "not allowed in this environment.")
        # else:
        if not self.trajectories and self._random_start:
            raise ValueError("Random start not possible without trajectory data.")
        elif not self.trajectories and self._init_step_no is not None:
            raise ValueError("Setting an initial step is not possible without trajectory data.")
        elif self._init_step_no is not None and self._random_start:
            raise ValueError("Either use a random start or set an initial step, not both.")

        # Modification: I re-initialized these two variables
        self._random_start = 0
        self._init_step_no = 1

        if self.trajectories is not None:
            if self._random_start:
                if self._scaling_trajectory_map:
                    curr_model = self._current_model_idx
                    valid_traj_range = self._scaling_trajectory_map[curr_model]
                    traj_no = np.random.randint(valid_traj_range[0], valid_traj_range[1])
                    sample = self.trajectories.reset_trajectory(traj_no=traj_no)
                else:
                    sample = self.trajectories.reset_trajectory()
            elif self._init_step_no:
                traj_len = self.trajectories.trajectory_length
                n_traj = self.trajectories.number_of_trajectories
                assert self._init_step_no <= traj_len * n_traj
                substep_no = int(self._init_step_no % traj_len)
                traj_no = int(self._init_step_no / traj_len)
                sample = self.trajectories.reset_trajectory(substep_no, traj_no)
            self.set_sim_state(sample)
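The index arithmetic in the `elif self._init_step_no:` branch maps one flat step number onto a trajectory index and a step within that trajectory. As a small illustration (the helper name `locate_step` and the trajectory length of 1000 are assumptions made for the example, not values taken from the library):

```python
def locate_step(init_step_no, traj_len):
    """Map a flat step index to (traj_no, substep_no), mirroring the code above."""
    traj_no, substep_no = divmod(int(init_step_no), int(traj_len))
    return traj_no, substep_no

# With trajectories of 1000 steps each (an illustrative length):
print(locate_step(1, 1000))     # (0, 1)
print(locate_step(2500, 1000))  # (2, 500)
```

With `self._init_step_no = 1`, the simulation therefore starts from step 1 of trajectory 0, whichever model is currently loaded.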

However, for the environments 'HumanoidTorque4Ages.walk.3.perfect' and 'HumanoidTorque4Ages.walk.4.perfect', the humanoid falls after a couple of steps. The MuJoCo version used is 2.3.7.
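To narrow down where the replay starts to drift, the recorded states could be compared with what the simulation produces at each step. This is a rough sketch under a strong assumption: that the observations collected from `env.step` have the same layout as the rows of the dataset's `states` array, which I have not verified. The function name `first_divergence` and the tolerance are hypothetical.

```python
import numpy as np

def first_divergence(recorded, replayed, tol=1e-3):
    """Return the first step index where any state entry differs by more than tol, else None."""
    recorded = np.asarray(recorded, dtype=float)
    replayed = np.asarray(replayed, dtype=float)
    n = min(len(recorded), len(replayed))
    # Largest absolute per-step deviation across all state entries
    err = np.max(np.abs(recorded[:n] - replayed[:n]), axis=1)
    above = np.nonzero(err > tol)[0]
    return int(above[0]) if above.size else None
```

If the first divergent step is very early, the initial state or the model/dataset pairing is the more likely culprit; a late divergence would point more towards accumulated simulation error.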