FetchReach: from what does observation exactly consist?

openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.

https://www.gymlibrary.dev

Other

34.53k stars 8.59k forks source link

FetchReach: from what does observation exactly consist? #1503

Closed TomasMerva closed 5 years ago

TomasMerva commented 5 years ago

I would like to know what exactly observation of Fetch environments represents (position of what, etc) and also what types of actions are possible. How could I find these information in others robotics environments?

Thank you for your help.

Santosh-16k commented 5 years ago

To find what the observation consists of you can go to the code where the environment is described. As in your example, it is fetch robot. So, information related to fetch env can be found at: https://github.com/openai/gym/blob/master/gym/envs/robotics/fetch_env.py

When you go through the code you'll find on line 112,

obs = np.concatenate([ grip_pos, object_pos.ravel(), object_rel_pos.ravel(), gripper_state, object_rot.ravel(), object_velp.ravel(), object_velr.ravel(), grip_velp, gripper_vel, ])

You can see what observation consist of. Although, it would be better if you read the entire method _get_obs(), starting from line 87. For FetchReach, following is contained in observation.

grip_pos - Position of the gripper given in 3 positional elements and 4 rotational elements
object_pos.ravel - Not applicable for reach task. It is an array of zeros for reach task
object_rel_pos.ravel - Not applicable for reach task. It is an array of zeros for reach task
gripper_state - The quantity to measure the opening of gripper
object_rot.ravel - Not applicable for reach task. It is an array of zeros for reach task
object_velp.ravel - Not applicable for reach task. It is an array of zeros for reach task
object_velr.ravel - Not applicable for reach task. It is an array of zeros for reach task
grip_velp - The velocity of gripper moving
gripper_vel - The velocity of gripper opening/closing

For any other environment, the information for observation can be found at https://github.com/openai/gym/blob/master/gym/envs you have to understand the _get_obs() function.

TomasMerva commented 5 years ago

Thank you for your help. However, I am still quite confused. Despite the fact that the task "Reach" does not use for example _objectpos.ravel what physical parameter should it represent? I am asking about the meaning of those shortcuts of parameters.

What is the difference between this types of object position? object_pos.ravel - ??? object_rel_pos.ravel - ???

gripper_state - Is this like 0/1 signal which represents closed/opened gripper?
object_rot.ravel - I suggest that this is rotation of the object, right?

I do not know what that index "p" and "r" should mean object_velp.ravel - ??? object_velr.ravel - ???

PS: is ".ravel" only some function, which meaning I do not need to know?

Thank you a lot

Santosh-16k commented 5 years ago

Those parameters are relevant when you have an object in the environment. E.g: Pick and place task. So, when the object is in the environment, following is what the parameters mean:

object_pos - Position of the object with respect to the world frame
object_rel_pos - Position of the object relative to the gripper as it can be seen in line 101 of fetch_env.py object_rel_pos = object_pos - grip_pos
gripper_state - No. It's not 0/1 signal, it is a float value and varies from 0 to 0.2 for fetch robot. This varied gripper state helps in grasping different sized object with different strengths.
object_rot - Yes. That is the orientation of the object with respect to world frame.

The index 'p' and 'r' means positional and rotational respectively

object_velp - Positional velocity of the object with respect to the world frame
object_velr - Rotational velocity of the object with respect to the world frame

The function ravel is numpy function. It converts numpy arrays in 1D array. More information about ravel can be found here

asdint commented 5 years ago

nice！！！

RyanRizzo96 commented 4 years ago

Thanks @Santosh-16k, but this still doesn't make complete sense to me. I must be missing something.

In FetchReach-v1, why do we need gripper_vel which is the velocity of the gripper opening/closing? The gripper doesn't even need to open/close.

Why do we needgripper_statewhich is the quantity to measure the opening of the gripper. I would think that these values are important in environments like FetchPickAndPlace.

I present a summary of observations for FetchReach-v1 below. The number in brackets is how many values are returned.

Cartesian position of gripper: grip_pos (3) The quantity to measure the opening of gripper: gripper_state (2) The velocity of gripper moving: grip_velp (3) The velocity of gripper opening/closing: gripper_vel (2)

Also, I don't understand why we get 3 printed values for grip_velp and only 2 for gripper_vel

corcasta commented 3 years ago

I'm new to RL, i implemented DDPG+HER for FetchReach and it worked, i just want to know what is the meaning of my agent's output (the action in other words) as i understood from the paper it represents the desired gripper movement in Cartesian Coordinates + the opening/closing of the gripper.

Why the environment needs only this data as input?
Does the environment has a model inside that translates "gripper movement" into torque for the actuators or something else? 2-b. How it knows which joints to move correctly to satisfy the goal?
What is the difference between implementing RL rather than Inverse Kinematics? (i know this gets overly complicated and RL is kind of a general formula)

I will truly appreciate for the help

TomasMerva commented 3 years ago

@RyanRizzo96 In case of the FetchReach environment, gripper_vel is always zero so just ignore it. As you said, you do not need it and because its value is not changing it does not have any effect on a neural network/learning performance. grip_velp is 3D because it consists of a linear velocity of the gripper in each axis (so v_x, v_y, v_z)

@ocortina Inverse kinematics is a method that only computes positions of each joint of a robotic arm based on [X,Y,Z] + orientation of the end-effector (gripper). It wont compute a trajectory for getting from the point A to the point B, only position of joints based on specific gripper's pose. For moving from one point to another you need to use some kind of motion planner.

Regarding motion planning, one of the possibility is to use RL alongside other standard robotics methods, as it is the case of the Fetch environments, or to use "end-to-end policy".

In terms of the Fetch environments, the RL serves as a high-level control that is ONLY added to the standard robotics control pipeline. The RL agent only guides the gripper's position (its orientation is fixed) by generating a new [X,Y,Z] point each 40ms (at least in the original paper) based on a current state of the environment. Thereafter this point is sent to the classical robotics motion planner that takes care of planning a trajectory from current position to the desired one (new point), trajectory execution and then a IK solver then low-level control of motors etc...

I do not want to propagate my work but I implemented SAC+HER for solving my custom robotics environments, here is a link where it is maybe better explained. In my case the RL part was also only added to the classical robotics pipeline. https://youtu.be/GN2U0PE8QBk

corcasta commented 3 years ago

@TomasMerva Thanks a lot, your comment solved many of my doubts. I'm an undergraduate so this may sounds dumb again.

According to the following definitions:

Task planning — Designing a set of high-level goals, such as “go pick up the object in front of you”, requires generating the shortest path for the robot from a given starting point to a target point while satisfying the spatial constraints.
Path planning — Generating a feasible path from a start point to a goal point. A path usually consists of a set of connected waypoints. Requires generating the shortest path for the robot from a given starting point to a target point while satisfying the spatial constraints.
Trajectory planning — Generating a time schedule for how to follow a path given constraints such as position, velocity, and acceleration.
Trajectory following — Once the entire trajectory is planned, there needs to be a control system that can execute the trajectory in a sufficiently accurate manner.

Is the agent (DDPG+HER) seen through the robotics context as a Path Planning algorithm?

Thanks again