Hi @yj-Tang,
thanks for pointing out this issue.
Could you provide me with an example script?
First of all, you can turn off the limit checking by setting the constructor argument observation_checking to False:
https://github.com/maxspahn/gym_envs_urdf/blob/4b80c3146624cafc56e01ed1158c5f5ab167bd54/urdfenvs/urdf_common/urdf_env.py#L64C1-L65C1
The velocity you pass as the action is a reference velocity that the physics engine's PID controller tries to track, with torques as the underlying input, so the measured velocity will never exactly match the commanded action.
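To illustrate both points, here is a minimal sketch. It assumes the UrdfEnv constructor accepts observation_checking as in the linked line, that set_spaces() can be called without adding goals or sensors, and that the joint state exposes a 'velocity' entry next to the 'position' entry:

```python
import numpy as np
from urdfenvs.urdf_common.urdf_env import UrdfEnv
from urdfenvs.robots.generic_urdf import GenericUrdfReacher

# Same point-robot setup as in the examples, but with observation checking
# disabled so a velocity that slightly overshoots the limits does not abort.
robots = [GenericUrdfReacher(urdf="pointRobot.urdf", mode="vel")]
env = UrdfEnv(dt=0.01, robots=robots, render=False, observation_checking=False)
env.set_spaces()

ob, info = env.reset(pos=np.array([0.0, 0.1, 0.0]))
action = env.action_space.high  # command the maximum admissible velocity
ob, reward, terminated, truncated, info = env.step(action)

# The reported joint velocity is whatever the PID controller achieved via
# torques, so it will generally not equal the commanded reference exactly.
# (Assumes a 'velocity' entry under joint_state; adjust to your observation layout.)
print("commanded velocity:", action)
print("measured velocity :", ob['robot_0']['joint_state']['velocity'])
env.close()
```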
Best, Max.
Thanks for your reply. Here is the script, revised from https://github.com/maxspahn/gym_envs_urdf/blob/develop/examples/point_robot_reward.py:
import numpy as np
from urdfenvs.urdf_common.urdf_env import UrdfEnv
from urdfenvs.robots.generic_urdf import GenericUrdfReacher
from urdfenvs.sensors.full_sensor import FullSensor
from urdfenvs.scene_examples.goal import goal1
from urdfenvs.urdf_common.reward import Reward

N_STEPS = 1000


class InverseDistanceDenseReward(Reward):
    def calculate_reward(self, observation: dict) -> float:
        goal = observation['robot_0']['FullSensor']['goals'][1]['position']
        position = observation['robot_0']['joint_state']['position']
        return 1.0 / np.linalg.norm(goal - position)


robots = [
    GenericUrdfReacher(urdf="pointRobot.urdf", mode="vel"),
]
env = UrdfEnv(
    dt=0.01,
    robots=robots,
    render=True,
)
env.add_goal(goal1)

# %%
sensor = FullSensor(['position'], ['position', 'size'], variance=0.0)
env.add_sensor(sensor, [0])
env.set_spaces()
env.set_reward_calculator(InverseDistanceDenseReward())
defaultAction = np.array([4.5, -4.0, 4.0])
pos0 = np.array([0.0, 0.1, 0.0])
vel0 = np.array([1.0, 0.0, 0.0])

# %%
ob, info = env.reset(pos=pos0, vel=vel0)
env.shuffle_goals()
print(f"Initial observation : {ob}")
gain_action = 1
history = []
for i in range(N_STEPS):
    # Simple p-controller for goal reaching
    # goal = ob['robot_0']['FullSensor']['goals'][1]['position']
    # robot_position = ob['robot_0']['joint_state']['position']
    # action = gain_action * (goal - robot_position)
    clipped_action = np.clip(defaultAction, env.action_space.low, env.action_space.high)
    ob, reward, terminated, truncated, info = env.step(clipped_action)
    print(f"Reward : {reward}")
    # In observations, information about obstacles is stored in ob['obstacleSensor']
    history.append(ob)
    print(f"Step {i} : {ob}")
    if terminated or truncated:
        break
env.close()
Thanks for pointing out that the "physics engine's PID controller will try to match [the reference velocity] using torques as the underlying input". My solution now is to clip the planned action to a slightly smaller range than the action limits.
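Concretely, something like the following sketch; the margin value is my own choice, and the limits here are stand-ins for env.action_space.low / env.action_space.high from the script above:

```python
import numpy as np

margin = 0.05  # shrink the admissible range a bit so the tracked velocity stays inside the limits
planned_action = np.array([4.5, -4.0, 4.0])

# Stand-in limits; in the script these come from env.action_space.low / env.action_space.high.
action_low = np.array([-10.0, -10.0, -10.0])
action_high = np.array([10.0, 10.0, 10.0])

clipped_action = np.clip(planned_action, action_low + margin, action_high - margin)
print(clipped_action)
```

The margin only needs to cover the tracking overshoot of the controller, so a small value relative to the limits should be enough.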
That seems like a reasonable approach. Otherwise, you can also rely on not checking the observation limits, or at least on not breaking when a violation is encountered.
Can you close the issue if you are fine with my response?
Problem is solved, thank you!
Hi Max, I'm currently working on training a reinforcement learning policy with urdfenvs and have run into unexpected behavior. Here's what I've applied and observed:
I would expect the observed velocity to match the clipped input velocity. Is this the intended behavior? Or do you have any suggestions on how to keep the next observation within the observation limits? Looking forward to your reply. Thanks in advance.
Best regards, Yujie