jr-robotics / robo-gym

An open source toolkit for Distributed Deep Reinforcement Learning on real and simulated robots.
https://sites.google.com/view/robo-gym
MIT License

Issue running td3_script.py for UR robot using stable_baseline3 #58

Closed f4rh4ng closed 1 year ago

f4rh4ng commented 2 years ago

Hello, first of all, thanks for making this awesome toolkit. I am trying to run the td3_script.py example on my system using the following code:

import gym
import robo_gym
from robo_gym.wrappers.exception_handling import ExceptionHandling
from stable_baselines3 import TD3
from stable_baselines3.td3.policies import MlpPolicy

target_machine_ip = '127.0.0.1'  # or other xxx.xxx.xxx.xxx (machine running the robot server)

# simulated UR end-effector positioning environment
env = gym.make('EndEffectorPositioningURSim-v0', ip=target_machine_ip, gui=True)

# robo-gym wrapper that handles robot-server exceptions
env = ExceptionHandling(env)

model = TD3(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=15000)

model.save('td3_ur_basic')

It works perfectly fine with the MiR robot, but when running it with the UR robot I get the following error in the first training iteration:

Traceback (most recent call last):
  File "/home/mypc/robo-gym/docs/examples/stable-baselines/td3_script.py", line 17, in <module>
    model.learn(total_timesteps=15000)
  File "/home/mypc/.local/lib/python3.8/site-packages/stable_baselines3/td3/td3.py", line 205, in learn
    return super(TD3, self).learn(
  File "/home/mypc/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    rollout = self.collect_rollouts(
  File "/home/mypc/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 580, in collect_rollouts
    new_obs, rewards, dones, infos = env.step(actions)
  File "/home/mypc/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 162, in step
    return self.step_wait()
  File "/home/mypc/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 51, in step_wait
    return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), deepcopy(self.buf_infos))
  File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.8/copy.py", line 205, in _deepcopy_list
    append(deepcopy(a, memo))
  File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.8/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'google.protobuf.pyext._message.ScalarMapContainer' object

Do you have any idea how I can fix the issue? Could it be because of stable_baselines3?

f4rh4ng commented 2 years ago

SOLVED: In case anybody else comes across this problem, I resolved it by replacing deepcopy(self.buf_infos) in "/home/mypc/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py" with a list containing an empty dictionary, [{}].

The info buffer returns a dictionary with additional (debug) information about the environment (here, about the joint states). There is a problem with deep-copying it, probably because of the format of the received message. Since it doesn't affect the training process, it can be ignored.
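For reference, the change amounts to editing the return statement of step_wait() in "dummy_vec_env.py" (the same line that appears in the traceback above). A rough sketch of the workaround:

# stable_baselines3/common/vec_env/dummy_vec_env.py, inside step_wait()
# original line (see the traceback above):
#   return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), deepcopy(self.buf_infos))
# workaround: return an empty info dict per environment instead of deep-copying the buffer
return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), [{}])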

kmh8667 commented 2 years ago

Hi, I ran into the same problem and solved it with your solution.

But I have one question: after replacing the dictionary with an empty one, the console doesn't show 'ep_len_mean' and 'ep_rew_mean',

and there is no reward record in the log file, so I wonder how to check the reward.

I want to plot the reward graph with TensorBoard, but there is no reward record when the dictionary is empty.

[Screenshot: console output after replacing the dictionary with an empty one]

[Screenshot: console output before the replacement]

f4rh4ng commented 2 years ago

Hi, you are right. Back then I was unfamiliar with the code, and removing the info helped me get some runs! :)

To make the info log work correctly, you can first revert the changes I mentioned above. Then, in the "robo_gym/envs/ur/ur_base_env.py" file, set "rs_state_to_info" to False. After that it will work correctly.

When "rs_state_to_info" is set to True, the environment tries to add a copy of self.rs_state to the "info" log. It looks something like this:

[{'rs_state': {'wrist_3_joint_velocity': 0.0007305238395929337, 'base_joint_velocity': 0.0003346684679854661, 'wrist_1_joint_velocity': 8.450639143120497e-05, 'ee_to_ref_translation_x': -0.09074004739522934, 'wrist_2_joint_velocity': 1.6347812561434694e-05, 'elbow_joint_position': 1.5003688335418701, 'shoulder_joint_velocity': 0.0001309403742197901, 'ee_to_ref_translation_y': 0.12343478202819824, 'wrist_2_joint_position': -1.3962695598602295, 'ee_to_ref_rotation_y': -0.6927606463432312, 'ee_to_ref_rotation_x': 0.6576900482177734, 'wrist_1_joint_position': 0.002039114013314247, 'in_collision': 0.0, 'elbow_joint_velocity': -0.00011047261068597436, 'ee_to_ref_translation_z': 0.554492712020874, 'ee_to_ref_rotation_z': -0.2597506642341614, 'wrist_3_joint_position': 1.8221845721200225e-06, 'shoulder_joint_position': -2.495415449142456, 'ee_to_ref_rotation_w': 0.14161935448646545, 'base_joint_position': 6.575046427315101e-05}}]

The problem is with the type of self.rs_state, which cannot be handled by the deepcopy operation.
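As a schematic illustration only (not the actual robo-gym code), this is roughly what the flag controls in the step method:

# Toy sketch, not the robo-gym implementation: the flag decides whether a
# copy of the robot-server state is attached to the step info dict.
class ToyURSimEnv:
    def __init__(self, rs_state_to_info=True):
        self.rs_state_to_info = rs_state_to_info
        self.rs_state = {'base_joint_position': 0.0, 'in_collision': 0.0}  # placeholder state

    def step(self, action):
        observation, reward, done = [0.0], 0.0, False  # dummy values
        info = {'rs_state': dict(self.rs_state)} if self.rs_state_to_info else {}
        return observation, reward, done, info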

However, if you don't need these logs, you can do as described above.

By the way, if you manage to fix the issue with the type of rs_state, it would be nice if you could share it here. :)

kmh8667 commented 2 years ago

Thank you for your explanation :+1:

I reverted the changes and then set "rs_state_to_info" to False.

But the same type error occurred.

As you said, rs_state_to_info causes the problem, so I removed "self.rs_state_to_info" in 'def step'.

Then it works very well :)

If I find a solution to the issue with rs_state, I'll comment here.

[Screenshots from 2022-06-30 21-25-31 and 21-31-28]

jr-b-reiterer commented 7 months ago

Thanks y'all for the research on this so far. Although the issue is closed, I'm leaving this here for future reference:

The root problem is that the state_dict returned via gRPC is in fact stored in a protobuf ScalarMap. The deepcopy stumbles over some extra fields of this type. The clean fix should be something like rs_state = dict(rs_state). See also this ScalarMap issue.
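A minimal sketch of that conversion (here scalar_map is faked with an ordinary mapping so the snippet runs anywhere; in robo-gym it would be the ScalarMapContainer coming back from the robot server):

from copy import deepcopy

# stand-in for the protobuf ScalarMapContainer returned over gRPC
scalar_map = {'base_joint_position': 6.58e-05, 'in_collision': 0.0}

rs_state = dict(scalar_map)       # ScalarMapContainer -> plain Python dict
info = [{'rs_state': rs_state}]   # shape of the vec-env info buffer
deepcopy(info)                    # succeeds once rs_state is a plain dict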

Alternatively, if you want to work around it by excluding the rs_state from the info, you don't need to alter the step method. There is a flag that you can add to the gym.make arguments: rs_state_to_info=False
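For example, based on the script at the top of this issue:

import gym
import robo_gym

# exclude rs_state from the step info at creation time; no edits to
# stable-baselines3 or to the environment's step method are needed
env = gym.make('EndEffectorPositioningURSim-v0', ip='127.0.0.1', gui=True, rs_state_to_info=False)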

MiR environments are not affected because they use the state array instead of the state_dict.