I don't know why, but I didn't get a notification for your comment, and I only saw it now. Sorry for the delay in my answer. 🙏 So, to be clear, you want me to change the unit tests accordingly?
Yes, but this will be a breaking change, so make sure to update the version; the second number in the version number should be bumped.
Good idea. I'll try to make it work for Stable Baselines3, and then we'll do the version bumping.
Hello @maxspahn.
I worked on the package and tried to troubleshoot stuff to make this package useful in reinforcement learning. And I think I made good progress today, but there is a lot more to do.
Would you please run `point_robot_rl.py` and see whether it works for you or not? I get this weird error and can't understand what causes it or what I can do about it. It seems to me that SB3 is the problem, not our package.
File "/home/behradx/projects/gym_envs_urdf/examples/To Be Deleted/point_robot_rl.py", line 79, in <module>
model.learn(total_timesteps=TIMESTEPS,
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/td3/td3.py", line 216, in learn
return super().learn(
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 330, in learn
self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/td3/td3.py", line 169, in train
next_q_values = th.cat(self.critic_target(replay_data.next_observations, next_actions), dim=1)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 934, in forward
return tuple(q_net(qvalue_input) for q_net in self.q_networks)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 934, in <genexpr>
return tuple(q_net(qvalue_input) for q_net in self.q_networks)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype
And I know that the tests failed; I will fix them (and even write new tests) after I get this working for RL purposes.
@behradkhadem,
Some questions first: Which version of stable baselines is used? Could you add it to the pyproject.toml file as an optional dependency? Why do we need to downgrade the gym version?
In the meantime, I have tried to figure it out a bit, but the installation of stable_baselines3 is a pain.
I'm using version 1.8.0 of Stable Baselines3. Version 2.0.0 and higher use `gymnasium` instead of `gym`.
I downgraded because newer versions of `gym` had breaking changes. For example, the `step` method returns 5 values instead of 4, and this was causing problems. I'll get to it after fixing the issues related to SB3.
And installing Stable Baselines3 was easy, as far as I remember. I recommend using a fresh environment for SB3 (I prefer conda) and installing it with extras: `pip install stable-baselines3[extra]`.
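For illustration, a minimal sketch of that API difference; the environment id used here is a placeholder, not necessarily one registered by this package:

```python
import gym

env = gym.make("pointRobotUrdf-v0")  # hypothetical id, for illustration only
obs = env.reset()
action = env.action_space.sample()

# gym <= 0.21: step returns 4 values
obs, reward, done, info = env.step(action)

# gym >= 0.26: step returns 5 values and reset returns (obs, info)
# obs, reward, terminated, truncated, info = env.step(action)
# done = terminated or truncated
```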
It seems like the environments are fine and the problem lies within Stable Baselines3. These are the things that I checked:

- `check_env` runs without a hassle.
- The `dtype` of the observation and action spaces.
- Using `gym.spaces.Dict` instead of a python (vanilla) `dict`.

Honestly, everything seems fine. I could run the RL example using the PPO algorithm, but we get the aforementioned error while using DDPG, TD3 and SAC. I'll try using RLlib instead of Stable Baselines3 tomorrow.
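For reference, the `check_env` run mentioned above looks roughly like this, assuming SB3 1.8.0; the environment id is again a placeholder:

```python
import gym
from stable_baselines3.common.env_checker import check_env

env = gym.make("pointRobotUrdf-v0")  # placeholder id
# Warns or raises if the observation/action spaces or the step/reset
# signatures don't match what SB3 expects.
check_env(env, warn=True)
```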
Update: After a tedious installation of Ray RLlib, I couldn't run it successfully either. I can't pinpoint the source of the issue. I'll continue working on it.
Hello @maxspahn, it's me again! Sorry for the delay, I couldn't work on the package due to some health issues.
I've been working on the package in the past week and here's a summary of what I've understood:

- The environments have no issues and the `env_checker` method runs without errors, but only for version v0.21 and not for v0.26. The reason is changes in the `step` and `reset` methods. Even the example environments don't run the way they should, and I don't know why the tests didn't fail.
- The error in Stable Baselines3 is not because of the package, it originates from Stable Baselines3 itself (source)! For some reason, not using `float32` for the action space causes errors inside the package (see the sketch at the end of this comment).
- `gym` is deprecated. If you want to make this package future-proof (not just for RL), you have to migrate from OpenAI gym to Gymnasium. There are some backward compatibility features inside gymnasium (for all versions of gym), but they are not perfect. On the other hand, migrating from gym to gymnasium is not as easy as it seems; it requires changing data types from `Dict` etc. to `gymnasium.spaces.Space` and so on. There are many breaking changes, but the output at the end is 🤌.

Whether you merge this PR or not is your call, but I'll continue working on the package and running it for RL purposes. Right now, my first priority is using this package (as it is) inside SB3 (using gymnasium's backward compatibility features). But if you want, I can assist you in migrating to gymnasium and so on.

PS: I think the tests failed because I reverted the version of gym. I couldn't update the `poetry.lock` file.
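For reference, a minimal sketch of the `float32` point from the second bullet above; the bounds and dimension are made up for illustration:

```python
import numpy as np
import gym

# An action space declared as float64 can lead to the
# "mat1 and mat2 must have the same dtype" error in DDPG/TD3/SAC,
# because the network weights are float32 while the sampled actions
# stay float64.
bad_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float64)

# Declaring the action (and observation) spaces as float32 avoids the mismatch.
good_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
```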
@behradkhadem,
Great to hear about your progress and all the insights you got!
> The environments have no issues and the `env_checker` method runs without errors, but only for version v0.21 and not for v0.26. The reason is changes in the `step` and `reset` methods. Even the example environments don't run the way they should, and I don't know why the tests didn't fail.
I am not a big fan of downgrading dependencies, so I prefer not to merge this PR, but rather to create a specific branch for that on the upstream repository.
> The error in Stable Baselines3 is not because of the package, it originates from Stable Baselines3 itself (source)! For some reason, not using `float32` for the action space causes errors inside the package.
I am surprised, but great that you found that out.
> `gym` is deprecated. If you want to make this package future-proof (not just for RL), you have to migrate from OpenAI gym to Gymnasium. There are some backward compatibility features inside gymnasium (for all versions of gym), but they are not perfect. On the other hand, migrating from gym to gymnasium is not as easy as it seems; it requires changing data types from `Dict` etc. to `gymnasium.spaces.Space` and so on. There are many breaking changes, but the output at the end is 🤌.
I knew this day would come rather soon when I saw that the guys from gym moved away from OpenAI, but here we are. I created a new issue for that, see #192. I hope I'll find some time during the summer season for it. Maybe it is less work than expected. Also, I am open to help on that one.
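To give a rough idea of the scope of that migration, a minimal sketch of gymnasium-style space definitions; the observation keys and shapes are illustrative only, not taken from the package:

```python
import numpy as np
from gymnasium import spaces

# gymnasium keeps the same space classes, but they now live under
# gymnasium.spaces, and reset(seed=..., options=...) must return (obs, info)
# while step() returns (obs, reward, terminated, truncated, info).
observation_space = spaces.Dict(
    {
        "joint_state": spaces.Box(low=-np.inf, high=np.inf, shape=(6,), dtype=np.float32),
        "goal": spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
    }
)
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
```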
> I am surprised, but great that you found that out.
It worked! It finally worked! I swear, I had tears in my eyes when that error was gone. The funny part was that I was able to make the envs work in gym `v0.21`! I'll work on upgrading gym to `v0.26`.
For now, I think we can maneuver on reward shaping and algorithms. I'd be happy if you tested the package and the wrapper yourself.
Update: Apparently we can't use gym `v0.26` in Stable Baselines3. We should update to gymnasium. (source)
@maxspahn Could you test it?
Closed in favor of https://github.com/maxspahn/gym_envs_urdf/pull/196.
Hello Max, it's me again!
In reinforcement learning libraries, both the observation and the info data must be returned from the `reset` method of the environment. Literally just changed one line. 😅 (And 9 tests failed 😂)
But there are further issues (with using the environments in RL) that I couldn't solve yet. I'll submit them as issues.
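The one-line change amounts to something like the following; this is a toy stand-in, not the actual environment class from the package:

```python
import numpy as np


class MinimalEnv:
    """Toy stand-in for the URDF environment, only to show the changed return value."""

    def reset(self, seed=None, options=None):
        observation = np.zeros(2, dtype=np.float32)
        info = {}  # e.g. debugging data, goal positions, ...
        # Before: `return observation`
        # After: return the observation AND the info dict, as RL libraries expect.
        return observation, info
```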