I don't know why, but I didn't get a notification for your comment, and I only saw it now. Sorry for the delay in my answer. 🙏 So, to be clear, you want me to change the unit tests accordingly?
Yes, but this will be a breaking change, so make sure to update the version; the second number in the version number should be bumped.
Good idea. I'll try to make it work for Stable Baselines3, and then we'll do the version bumping.
Hello @maxspahn.
I worked on the package and tried to troubleshoot stuff to make this package useful in reinforcement learning. And I think I made good progress today, but there is a lot more to do.
Would you please run `point_robot_rl.py` and see whether it works for you or not? I get this weird error and can't understand what causes it or what I can do about it. It seems to me that SB3 is the problem, not our package.
File "/home/behradx/projects/gym_envs_urdf/examples/To Be Deleted/point_robot_rl.py", line 79, in <module>
model.learn(total_timesteps=TIMESTEPS,
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/td3/td3.py", line 216, in learn
return super().learn(
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 330, in learn
self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/td3/td3.py", line 169, in train
next_q_values = th.cat(self.critic_target(replay_data.next_observations, next_actions), dim=1)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 934, in forward
return tuple(q_net(qvalue_input) for q_net in self.q_networks)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 934, in <genexpr>
return tuple(q_net(qvalue_input) for q_net in self.q_networks)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/behradx/anaconda3/envs/SB3/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype
And I know that the tests failed; I will fix them (and even write new tests) after I get this working for RL purposes.
@behradkhadem,
Some questions first: Which version of stable baselines is used? Could you add it to the pyproject.toml file as an optional dependency? Why do we need to downgrade the gym version?
In the meantime, I have tried to figure it out a bit, but the installation of stable_baselines3 is a pain.
I'm using version 1.8.0 of Stable Baselines3. Version 2.0.0 and higher use `gymnasium` instead of `gym`.
I downgraded because newer versions of `gym` had breaking changes. For example, the `step` method returns 5 values instead of 4, and this was causing problems. I'll get to it after fixing the issues related to SB3.
And installing Stable Baselines3 was easy, as far as I remember. I recommend using a fresh environment for SB3 (I prefer conda) and installing it with extras: `pip install stable-baselines3[extra]`.
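For illustration, a minimal sketch of that API difference; the environment id used here is a placeholder, not necessarily one registered by this package:

```python
import gym

env = gym.make("pointRobotUrdf-v0")  # hypothetical id, for illustration only
obs = env.reset()
action = env.action_space.sample()

# gym <= 0.21: step returns 4 values
obs, reward, done, info = env.step(action)

# gym >= 0.26: step returns 5 values and reset returns (obs, info)
# obs, reward, terminated, truncated, info = env.step(action)
# done = terminated or truncated
```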
It seems like the environments are fine and the problem lies within Stable Baselines3. These are the things that I checked:

- `check_env` runs without a hassle.
- The `dtype` of the observation and action spaces.
- Using `gym.spaces.Dict` instead of a python (vanilla) `dict`.

Honestly, everything seems fine. I could run the RL example using the PPO algorithm, but we get the aforementioned error while using DDPG, TD3 and SAC. I'll try using RLlib instead of Stable Baselines3 tomorrow.
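For reference, the `check_env` run mentioned above looks roughly like this, assuming SB3 1.8.0; the environment id is again a placeholder:

```python
import gym
from stable_baselines3.common.env_checker import check_env

env = gym.make("pointRobotUrdf-v0")  # placeholder id
# Warns or raises if the observation/action spaces or the step/reset
# signatures don't match what SB3 expects.
check_env(env, warn=True)
```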
Update: After a tedious installation of Ray RLlib, I couldn't run it successfully either. I can't pinpoint the source of the issue. I'll continue working on it.
Hello @maxspahn, it's me again! Sorry for the delay, I couldn't work on the package due to some health issues.
I've been working on the package in the past week and here's a summary of what I've understood:

- The environments have no issues and the `env_checker` method runs without errors, but only for version v0.21 and not for v0.26. The reason is changes in the `step` and `reset` methods. Even the example environments don't run the way they should, and I don't know why the tests didn't fail.
- The error in Stable Baselines3 is not because of the package, it originates from Stable Baselines3 itself (source)! For some reason, not using `float32` for the action space causes errors inside the package (see the sketch at the end of this comment).
- `gym` is deprecated. If you want to make this package future-proof (not just for RL), you have to migrate from OpenAI gym to Gymnasium. There are some backward compatibility features inside gymnasium (for all versions of gym), but they are not perfect. On the other hand, migrating from gym to gymnasium is not as easy as it seems; it requires changing data types from `Dict` etc. to `gymnasium.spaces.Space` and so on. There are many breaking changes, but the output at the end is 🤌.

Whether you merge this PR or not is your call, but I'll continue working on the package and running it for RL purposes. Right now, my first priority is using this package (as it is) inside SB3 (using gymnasium's backward compatibility features). But if you want, I can assist you in migrating to gymnasium and so on.

PS: I think the tests failed because I reverted the version of gym. I couldn't update the `poetry.lock` file.
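For reference, a minimal sketch of the `float32` point from the second bullet above; the bounds and dimension are made up for illustration:

```python
import numpy as np
import gym

# An action space declared as float64 can lead to the
# "mat1 and mat2 must have the same dtype" error in DDPG/TD3/SAC,
# because the network weights are float32 while the sampled actions
# stay float64.
bad_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float64)

# Declaring the action (and observation) spaces as float32 avoids the mismatch.
good_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
```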
@behradkhadem,
Great to hear about your progress and all the insights you got!
> The environments have no issues and the `env_checker` method runs without errors, but only for version v0.21 and not for v0.26. The reason is changes in the `step` and `reset` methods. Even the example environments don't run the way they should, and I don't know why the tests didn't fail.
I am not a big fan of downgrading dependencies, so I prefer not to merge this PR, but rather to create a specific branch for that on the upstream repository.
> The error in Stable Baselines3 is not because of the package, it originates from Stable Baselines3 itself (source)! For some reason, not using `float32` for the action space causes errors inside the package.
I am surprised, but great that you found that out.
> `gym` is deprecated. If you want to make this package future-proof (not just for RL), you have to migrate from OpenAI gym to Gymnasium. There are some backward compatibility features inside gymnasium (for all versions of gym), but they are not perfect. On the other hand, migrating from gym to gymnasium is not as easy as it seems; it requires changing data types from `Dict` etc. to `gymnasium.spaces.Space` and so on. There are many breaking changes, but the output at the end is 🤌.
I knew this day would come rather soon when I saw that the guys from gym moved away from OpenAI, but here we are. I created a new issue for that, see #192. I hope I'll find some time during the summer season for it. Maybe it is less work than expected. Also, I am open to help on that one.
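To give a rough idea of the scope of that migration, a minimal sketch of gymnasium-style space definitions; the observation keys and shapes are illustrative only, not taken from the package:

```python
import numpy as np
from gymnasium import spaces

# gymnasium keeps the same space classes, but they now live under
# gymnasium.spaces, and reset(seed=..., options=...) must return (obs, info)
# while step() returns (obs, reward, terminated, truncated, info).
observation_space = spaces.Dict(
    {
        "joint_state": spaces.Box(low=-np.inf, high=np.inf, shape=(6,), dtype=np.float32),
        "goal": spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
    }
)
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
```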
> I am surprised, but great that you found that out.
It worked! It finally worked! I swear, I had tears in my eyes when that error was gone. The funny part was that I was able to make the envs work in gym `v0.21`! I'll work on upgrading gym to `v0.26`.
For now, I think we can maneuver on reward shaping and algorithms. I'd be happy if you tested the package and the wrapper yourself.
Update: Apparently we can't use gym `v0.26` in Stable Baselines3. We should update to gymnasium. (source)
@maxspahn Could you test it?
Closed in favor of https://github.com/maxspahn/gym_envs_urdf/pull/196.
Hello Max, it's me again!
In reinforcement learning libraries, both the observation and the info data must be returned from the `reset` method of the environment. Literally just changed one line. 😅 (And 9 tests failed 😂)
But there are further issues (with using the environments in RL) that I couldn't solve yet. I'll submit them as issues.
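The one-line change amounts to something like the following; this is a toy stand-in, not the actual environment class from the package:

```python
import numpy as np


class MinimalEnv:
    """Toy stand-in for the URDF environment, only to show the changed return value."""

    def reset(self, seed=None, options=None):
        observation = np.zeros(2, dtype=np.float32)
        info = {}  # e.g. debugging data, goal positions, ...
        # Before: `return observation`
        # After: return the observation AND the info dict, as RL libraries expect.
        return observation, info
```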