robfiras / loco-mujoco

Imitation learning benchmark focusing on complex locomotion tasks using MuJoCo.
MIT License

Custom reward RL #15

Closed · lambdavi closed this 6 months ago

lambdavi commented 7 months ago

Hello. Great work on the simulator. However, I am running into a problem when using an environment with a custom reward function. I installed loco_mujoco through pip.

from stable_baselines3 import PPO, DDPG, SAC
from stable_baselines3.common.env_util import make_vec_env
import numpy as np
from loco_mujoco import LocoEnv
import gymnasium as gym
import torch

# define whatever reward function you want
def my_reward_function(state, action, next_state):
    return -np.mean(action)  # here we just return the negative mean of the action

def make_env():
    return gym.make("LocoMujoco", env_name="UnitreeA1.simple", reward_type="custom",
                    reward_params=dict(reward_callback=my_reward_function))

Following your documentation I encounter this error:

TypeError: loco_mujoco.environments.quadrupeds.unitreeA1.UnitreeA1() got multiple values for keyword argument 'reward_type' was raised from the environment creator for LocoMujoco with kwargs ({'env_name': 'UnitreeA1.simple', 'reward_type': 'custom', 'reward_params': {'reward_callback': <function my_reward_function at 0x104e27d90>}})

It looks like I can't override the reward function. Do you have any suggestions, or have you run into this problem yourself and know of an easy fix?

robfiras commented 7 months ago

Hi @lambdavi, there is a known issue with the Unitree A1 environment: it does not allow custom reward functions yet. I will add this feature today and push it to the master branch. Until the next release, which is coming soon, I'd ask you to switch to an editable installation. Sorry for the inconvenience, and thanks for reporting this! I will notify you once it's added.
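For reference, an editable install from the master branch could look like this (repository URL inferred from the project page; exact steps may differ from the official install docs):

```shell
# uninstall the pip release, then install the repo in editable mode
pip uninstall -y loco-mujoco
git clone https://github.com/robfiras/loco-mujoco.git
cd loco-mujoco
pip install -e .
```

With an editable install, pulling the latest master (`git pull`) immediately picks up fixes without reinstalling.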

lambdavi commented 7 months ago

You probably know this already, but the same behaviour extends to other envs, such as the Unitree H1. I will switch to the editable version. I am also going to look more closely at the simulation code, so if you need help developing some non-critical feature, just let me know.

Thanks for the prompt reply and help!

robfiras commented 7 months ago

Yeah, in the latest release there was a problem with custom rewards for other robots as well, but that has already been fixed; it just isn't released yet. The only one still missing is the Unitree A1, which requires a few more changes. Thanks for being willing to help out!

robfiras commented 6 months ago

Sorry for the delay, this has been fixed now!
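With the fix on master, the original snippet should work as intended. A minimal sketch for future readers (the `"LocoMujoco"` gym id, `env_name`, `reward_type`, and `reward_callback` parameters are taken from the snippet above; the Stable-Baselines3 training lines are standard SB3 usage, shown commented out and not verified against this repo):

```python
import numpy as np


def my_reward_function(state, action, next_state):
    """Custom reward: the negative mean of the action vector."""
    return -np.mean(action)


def make_env():
    # lazy imports, so the reward callback above stays importable on its own
    import gymnasium as gym
    import loco_mujoco  # noqa: F401  (importing it registers the "LocoMujoco" gym id)

    return gym.make(
        "LocoMujoco",
        env_name="UnitreeA1.simple",
        reward_type="custom",
        reward_params=dict(reward_callback=my_reward_function),
    )


# To train with Stable-Baselines3 (standard SB3 API, hypothetical usage):
# from stable_baselines3 import PPO
# from stable_baselines3.common.env_util import make_vec_env
# vec_env = make_vec_env(make_env, n_envs=4)
# PPO("MlpPolicy", vec_env, verbose=1).learn(total_timesteps=10_000)
```

The callback receives the transition `(state, action, next_state)` and returns a scalar, so any shaping term computable from those arrays can be plugged in the same way.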