incognite-lab / myGym

myGym enables fast prototyping of RL in the area of robotic manipulation and navigation. You can train different robots, in several environments, on various tasks. There is an automatic evaluation and benchmarking tool. From version 2.1 there is support for multi-step tasks, multi-reward training and multi-network architectures.
https://mygym.readthedocs.io/en/latest/
MIT License

Discrepancy between action vector and `joints_angles` observation #46

Closed Mcibula closed 6 months ago

Mcibula commented 7 months ago

Hello!

I hope it is okay to post a somewhat longer question here.

I am testing the integrated motor-babbling control with the -ct random argument for the panda robot, with the following relevant config:

  "robot": "panda",
  "robot_action": "joints",
  "robot_init": [-0.4, 0.6, 0.5],
  "max_velocity": 10,
  "max_force": 100,
  "action_repeat": 1,

  "task_type": "reach",
  "task_objects": [
    {
      "init": {
        "obj_name": "cube_holes",
        "fixed": 0,
        "rand_rot": 0,
        "sampling_area": [-0.3, 0.3, 0.4, 0.6, 0.1, 0.1]
      },
      "goal": {
        "obj_name": "cube_holes",
        "fixed": 1,
        "rand_rot": 1,
        "sampling_area": [5, 5, 5, 5, 5, 5]
      }
    }
  ],
  "observation": {
    "actual_state": "obj_xyz",
    "goal_state": "obj_xyz",
    "additional_obs": ["joints_angles"]
  },

All I am trying to do is print out each performed random action and the subsequent observation from the environment in the timestep loop of the test_env() function in test.py:

print(f"Action:{action}")
observation, reward, done, info = env.step(action)

print(f'Observation: {observation}')

If I understand it correctly, a sampled action in this case should be a new joint configuration, so the joints_angles segment of the subsequent observation vector should be almost the same as the action vector. It seems to work like this when using slider control:

Observation: [-1.10144916e-01  4.91167079e-01  7.49899258e-02  4.99990013e+00
  4.99999489e+00  5.00000000e+00  4.37333765e-01  9.72781516e-01
  1.87621376e-01 -1.22585735e-01  1.31177055e+00  1.44094002e+00
  2.36656491e-07  0.00000000e+00]
Action:[0.4373335838317871, 0.9727801084518433, 0.1876215934753418, -0.12258744239807129, 1.3117706775665283, 1.4409408569335938, 0.0]
Observation: [-1.10144915e-01  4.91167079e-01  7.49899302e-02  4.99990013e+00
  4.99999489e+00  5.00000000e+00  4.37333765e-01  9.72781516e-01
  1.87621376e-01 -1.22585735e-01  1.31177055e+00  1.44094002e+00
  2.36656491e-07  0.00000000e+00]

However, when using random control, those values do not seem to correspond:

Action:[-0.51481533  1.6324749   2.1036975  -0.27616423  0.8946579   3.0306137
 -2.3106954 ]
Observation: [-0.08972217  0.40780999  0.09965184  5.00003808  4.99990754  5.
  0.41807567  0.9652326   0.22699737 -0.16194521  0.21542613  1.48197385
 -0.04166675  0.        ]
Action:[-2.3407836   0.99039567  2.1191826  -1.9024386  -1.879574    3.793361
 -1.682037  ]
Observation: [-0.08972217  0.40780999  0.0992242   5.00003808  4.99990754  5.
  0.39251881  0.96309263  0.27012038 -0.20360899  0.17760391  1.52364036
 -0.08333349  0.        ]
Action:[ 2.4431515 -1.5354003 -1.1559559 -1.2074044  2.5261626  3.121788
  1.876207 ]
Observation: [-0.08972217  0.40780999  0.09862862  5.00003808  4.99990754  5.
  0.37897825  0.95914598  0.22845184 -0.24527874  0.21926947  1.56530642
 -0.04166699  0.        ]
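To make the mismatch concrete, one can compare a sampled action against the joints_angles slice of the next observation (indices 6:13, assuming the observation layout implied by the config above):

```python
# First action/observation pair from the random-control log above.
action = [-0.51481533, 1.6324749, 2.1036975, -0.27616423,
          0.8946579, 3.0306137, -2.3106954]
observation = [-0.08972217, 0.40780999, 0.09965184, 5.00003808, 4.99990754, 5.0,
               0.41807567, 0.9652326, 0.22699737, -0.16194521, 0.21542613,
               1.48197385, -0.04166675, 0.0]

# joints_angles slice assumed to be indices 6:13 for this config.
joints = observation[6:13]
diffs = [abs(a - j) for a, j in zip(action, joints)]
print(max(diffs))  # well over 1 rad: the arm did not reach the commanded pose
```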

So I wanted to ask, whether I misunderstood what the action vector represents, or whether there is a problem causing this discrepancy.

Thank you very much.

michalvavrecka commented 7 months ago

Hi Miro. The problem is the randomness of the sampled actions. We use this mode only to test whether all joints are moving. If you use it to control the robot, you will see the difference between PLAN and REALITY. The random parameter samples an arbitrary target for each joint, but since the action is executed within one simulation step, the arm only moves toward that target as far as its speed and force allow. It cannot reach distant positions within one step (a physical limitation). You can use the random parameter, but you need to increase speed and force in the config: "max_velocity": 30, "max_force": 500
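The velocity limit is actually visible in the logs above: assuming PyBullet's default timestep of 1/240 s (an assumption; myGym may override it), max_velocity = 10 rad/s allows at most about 0.0417 rad of joint travel per step, which matches the per-step change of the last joint (-0.04166675, -0.08333349, ...). A rough sanity check:

```python
# With velocity-limited position control, the largest joint displacement
# achievable in one simulation step is max_velocity * dt.
dt = 1.0 / 240.0          # ASSUMED PyBullet default timestep
max_velocity = 10.0       # from the user's config
max_delta_per_step = max_velocity * dt
print(max_delta_per_step)  # ~0.0417 rad, matching the logged joint change

# Raising max_velocity to 30 triples the reachable displacement per step.
print(30.0 * dt)
```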

If you want to reach even distant positions in one action, you have to increase a third parameter: "action_repeat": 20. This will run 20 simulation steps between actions, so even distant goals can be reached.
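Under the same assumed 1/240 s timestep, the two suggested config changes multiply together: each action then spans 20 steps at the higher velocity cap, so the reachable joint displacement per action grows accordingly.

```python
# Reachable joint displacement per action = max_velocity * dt * action_repeat.
dt = 1.0 / 240.0       # ASSUMED PyBullet default timestep
max_velocity = 30.0    # suggested config value
action_repeat = 20     # suggested config value
reach_per_action = max_velocity * dt * action_repeat
print(reach_per_action)  # 2.5 rad per action, enough for most joint moves
```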

But as I said, random actions are not a good way to control the robot. If you want motor babbling, run an untrained network in the step robot_control mode; it guarantees small increments of the end-effector position that are reachable within one simulation step.

Mcibula commented 7 months ago

Oh, I understand now; thank you.

gabinsane commented 6 months ago

Closing this issue; let us know if you need any more help.