maxspahn / gym_envs_urdf

URDF environments for gym
https://maxspahn.github.io/gym_envs_urdf/
GNU General Public License v3.0

Position is a 9 parameter array #201

Closed behradkhadem closed 1 year ago

behradkhadem commented 1 year ago

Hello @maxspahn. I'm trying to use the panda robot to perform a pick and place. I need the position of the end-effector to create the reward function. The problem is that the "position" should be a 3 by 1 array, but what we get from this package is a 9 by 1 array. Why is that? At first, I thought the first 3 parameters stood for the position and the remaining six represented the orientation. But when I test it, the data I get from the first three numbers in the array doesn't correspond to the movement of the robot. Take this code as an example:

from urdfenvs.robots.generic_urdf import GenericUrdfReacher
from urdfenvs.sensors.full_sensor import FullSensor
from urdfenvs.urdf_common import UrdfEnv
from urdfenvs.urdf_common.reward import Reward
from mpscenes.obstacles.box_obstacle import BoxObstacle

import numpy as np
import os

class DistanceReward(Reward):
    def calculate_reward(self, observation: dict) -> float:

        endeffector_position = observation['robot_0']['joint_state']['position'][:3] # first three params of the position array.
        print(f"🦾 End-effector Position: {endeffector_position}")
        obstacle_position = observation["robot_0"]["FullSensor"]["obstacles"][1]["position"]
        print(f'⬛ Obstacle Position: {obstacle_position}')
        reward = 0.0
        return reward

movable_obstacle_dict = {
    'type': 'box',
    'geometry': {
        'position': [0.5, 0.5, 0.0],
        'width': 0.04,
        'height': 0.04,
        'length': 0.1,
    },
    'movable': True,
    'high': {
        'position': [5.0, 5.0, 1.0],
        'width': 0.35,
        'height': 0.35,
        'length': 0.35,
    },
    'low': {
        'position': [0.0, 0.0, 0.5],
        'width': 0.2,
        'height': 0.2,
        'length': 0.2,
    },
}
movable_obstacle = BoxObstacle(name="movable_box", content_dict=movable_obstacle_dict)

robots = [
    GenericUrdfReacher(urdf="panda_with_gripper.urdf", mode="vel"),
]

env = UrdfEnv(render=True, robots=robots)

sensor = FullSensor(['position'], ['position', 'size'], variance=0.0)
env.add_sensor(sensor, [0])
env.add_obstacle(movable_obstacle)
env.set_reward_calculator(DistanceReward())

env.set_spaces()
ob, *_ = env.reset()

# You can interact inside the physics simulator window with the robot and move the arm itself. 
env.reset()
while True:
    action = env.action_space.sample()
    obs, reward, done, truncated, info = env.step(action)

And also, there is a minuscule movement in the position of the obstacle even when we don't interact with it. Why does this happen?

maxspahn commented 1 year ago

The problem is that the "position" is a 3 by 1 array, but what we have in this package is a 9 by 1 array.

The joint state position is the configuration of the robot: 7 degrees of freedom plus 2 fingers. You need to use forward kinematics to compute the end-effector position. I have a package for doing that.
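For illustration, a minimal sketch (assuming the observation layout from the script above) of how that 9-element vector splits up:

import numpy as np

# q would be observation['robot_0']['joint_state']['position'] -- the
# configuration, not a Cartesian pose. Placeholder values for the sketch:
q = np.zeros(9)
arm_joints = q[:7]      # 7 revolute arm joint angles of the panda
finger_joints = q[7:]   # 2 prismatic gripper finger joint positions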

maxspahn commented 1 year ago

And also, there is a minuscule movement in the position of the obstacle when we don't interact with it. Why this happens?

This is most likely coming from numerical oscillations in PyBullet itself.

behradkhadem commented 1 year ago

Ummm, honestly the forwardkinematics package is weird. 😅 I wrestled with it a bit, and I'm not sure whether I did things correctly or not. Can you help me extract the position of the arm? This is the code that I used:

from urdfenvs.robots.generic_urdf import GenericUrdfReacher
from urdfenvs.sensors.full_sensor import FullSensor
from urdfenvs.urdf_common import UrdfEnv
from urdfenvs.urdf_common.reward import Reward
from mpscenes.obstacles.box_obstacle import BoxObstacle

import numpy as np
import os

import casadi as ca
from forwardkinematics.urdfFks.pandaFk import PandaFk

class DistanceReward(Reward):
    def calculate_reward(self, observation: dict) -> float:

        endeffector_position_raw = observation['robot_0']['joint_state']['position'][:7]
        fk_panda = PandaFk()
        fk_numpy = fk_panda.fk(endeffector_position_raw, 2, positionOnly=False)
        endeffector_position = np.array([fk_numpy[0][3], fk_numpy[1][3], fk_numpy[2][3]])
        print(f"""🦾 End-effector Position: {endeffector_position}
              fk_numpy: {fk_numpy}
              """)
        obstacle_position = observation["robot_0"]["FullSensor"]["obstacles"][1]["position"]
        print(f'⬛ Obstacle Position: {obstacle_position}')
        reward = 0.0
        return reward

movable_obstacle_dict = {
    'type': 'box',
    'geometry': {
        'position': [0.5, 0.5, 0.0],
        'width': 0.04,
        'height': 0.04,
        'length': 0.1,
    },
    'movable': True,
    'high': {
        'position': [5.0, 5.0, 1.0],
        'width': 0.35,
        'height': 0.35,
        'length': 0.35,
    },
    'low': {
        'position': [0.0, 0.0, 0.5],
        'width': 0.2,
        'height': 0.2,
        'length': 0.2,
    },
}
movable_obstacle = BoxObstacle(name="movable_box", content_dict=movable_obstacle_dict)

robots = [
    GenericUrdfReacher(urdf="panda_with_gripper.urdf", mode="vel"),
]

env = UrdfEnv(render=True, robots=robots)

sensor = FullSensor(['position'], ['position', 'size'], variance=0.0)
env.add_sensor(sensor, [0])
env.add_obstacle(movable_obstacle)
env.set_reward_calculator(DistanceReward())

env.set_spaces()
ob, *_ = env.reset()

env.reset()
while True:
    action = env.action_space.sample()
    obs, reward, done, truncated, info = env.step(action)

Is endeffector_position correctly returning the value? I thought fk_numpy was a Denavit-Hartenberg matrix and calculated the position based on that.

@maxspahn

behradkhadem commented 1 year ago

Dear @maxspahn, I'm still waiting to hear back from you. I'm looking forward to hearing your thoughts and comments.

Thanks.

maxspahn commented 1 year ago

Hi @behradkhadem ,

I was on holiday / at a conference. Your script seems fine. Why are you doubting it?

I don't know what you mean by "Denavit-Hartenberg matrix".

BTW: If you set the argument positionOnly to True, it only returns the position.

behradkhadem commented 1 year ago

Why are you doubting it?

It doesn't seem right to me. When I hold the end-effector and move it around, the output position values don't seem right; they look like randomly generated numbers. Did you test the code above? I'm hesitant about these lines:

    endeffector_position_raw = observation['robot_0']['joint_state']['position'][:7]
    fk_panda = PandaFk()
    fk_numpy = fk_panda.fk(endeffector_position_raw, 2, positionOnly=False)
    endeffector_position = np.array([fk_numpy[0][3], fk_numpy[1][3], fk_numpy[2][3]])
    print(f"""🦾 End-effector Position: {endeffector_position}
          fk_numpy: {fk_numpy}
          """)

BTW: If you set the argument positionOnly to True, it only returns the position.

Where?

Thanks for your help.

maxspahn commented 1 year ago

This segment is indeed slightly wrong. The second argument of fk needs to be the link you want to compute the forward kinematics for; for the panda, it is 7. You can also see the change to the argument positionOnly. Note that the observation you get is the robot's configuration, the robot state.

# The joint positions (configuration), not a Cartesian pose:
joint_position = observation['robot_0']['joint_state']['position'][:7]
fk_panda = PandaFk()
# Link 7 is the end-effector link of the panda; positionOnly=True
# returns just the 3D position instead of the full transform.
ee_position = fk_panda.fk(joint_position, 7, positionOnly=True)
print(f"End-effector Position: {ee_position}")

Let me know if this helps.

behradkhadem commented 1 year ago

Let me know if this helps.

Thanks, I think I'm getting the correct values now!

Note that the observation you get is the robot's configuration, the robot state.

I didn't understand what you meant. I thought configuration means the properties (like mass, shape, etc.) while state means variables that change over time (like position, velocity, etc.). Which one is it?

And also, what is the control mode? I see that there are 3 different control modes (tor, acc, vel, which I think correspond to torque, acceleration, and velocity). When I change the control mode from vel to tor, the robot seems to move better. Do these change the way the input is applied to the robot actuators (in PyBullet), or do they have different use cases? What happens if I use tor instead of vel?

Many thanks.

maxspahn commented 1 year ago

I didn't understand what you meant. Configuration means the properties (like mass, shape, etc.) but state means variables that change overtime (like position, velocity etc.). Which one is it?

Usually, the robot configuration refers to the current positions of the joints. Mass, shape, geometry, etc. are usually referred to as robot parameters, or system parameters. The state of the robot is defined as the current joint positions and joint velocities, hence the name 'joint_state' in the observation dictionary.
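As a small sketch, assuming a 'velocity' entry exists alongside 'position' in 'joint_state' (the scripts above only read 'position'):

import numpy as np

# Placeholder observation for the sketch; the 'velocity' key is an
# assumption made by analogy with the 'position' key used above.
observation = {'robot_0': {'joint_state': {
    'position': np.zeros(9),   # configuration q
    'velocity': np.zeros(9),   # joint velocities q_dot
}}}

q = observation['robot_0']['joint_state']['position']      # configuration
q_dot = observation['robot_0']['joint_state']['velocity']  # velocities
state = np.concatenate([q, q_dot])                         # state = (q, q_dot)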

The different control modes are, exactly as you understood it, the different commands you can send to the actuators. So, tor will send torques to the individual joints.
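A quick sketch of how the mode is chosen, reusing the constructor call from the scripts above; the action vector passed to env.step is then interpreted accordingly:

from urdfenvs.robots.generic_urdf import GenericUrdfReacher

# mode="vel": actions are joint velocities
# mode="acc": actions are joint accelerations
# mode="tor": actions are joint torques sent to the individual joints
robot = GenericUrdfReacher(urdf="panda_with_gripper.urdf", mode="tor")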