maxspahn / gym_envs_urdf

URDF environments for gym
https://maxspahn.github.io/gym_envs_urdf/
GNU General Public License v3.0

Using envs inside stable baselines 3 for RL tasks #170

Closed behradkhadem closed 1 year ago

behradkhadem commented 1 year ago

Hello everyone!

I'm trying to use the environments of this package for RL robotics tasks (using the Stable Baselines3 package in Python). I define my env as described in the docs:

robots = [
    GenericUrdfReacher(urdf="pointRobot.urdf", mode="vel"),
]
env = gym.make(
    "urdf-env-v0",
    dt=0.01, robots=robots, render=True
)

But the environment's observation and action spaces are defined as dictionaries, so in order to access them I have to do something like this:

env.reset()
env.action_space['robot_0']

So, when I define my RL model like this:

model = TD3("MlpPolicy", env, verbose=1, device='cuda')
model.learn(total_timesteps=100000)

I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_7541/1791512853.py in 
      1 # Define the TD3 agent and train it on the environment
----> 2 model = TD3("MlpPolicy", env, verbose=1, device='cuda')
      3 model.learn(total_timesteps=100000)

~/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/td3/td3.py in __init__(self, policy, env, learning_rate, buffer_size, learning_starts, batch_size, tau, gamma, train_freq, gradient_steps, action_noise, replay_buffer_class, replay_buffer_kwargs, optimize_memory_usage, policy_delay, target_policy_noise, target_noise_clip, tensorboard_log, policy_kwargs, verbose, seed, device, _init_setup_model)
     96     ):
     97 
---> 98         super().__init__(
     99             policy,
    100             env,

~/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/common/off_policy_algorithm.py in __init__(self, policy, env, learning_rate, buffer_size, learning_starts, batch_size, tau, gamma, train_freq, gradient_steps, action_noise, replay_buffer_class, replay_buffer_kwargs, optimize_memory_usage, policy_kwargs, tensorboard_log, verbose, device, support_multi_env, monitor_wrapper, seed, use_sde, sde_sample_freq, use_sde_at_warmup, sde_support, supported_action_spaces)
    104     ):
    105 
--> 106         super().__init__(
    107             policy=policy,
    108             env=env,

~/anaconda3/envs/SB3/lib/python3.9/site-packages/stable_baselines3/common/base_class.py in __init__(self, policy, env, learning_rate, policy_kwargs, tensorboard_log, verbose, device, support_multi_env, monitor_wrapper, seed, use_sde, sde_sample_freq, supported_action_spaces)
    158         if env is not None:
    159             env = maybe_make_env(env, self.verbose)
--> 160             env = self._wrap_env(env, self.verbose, monitor_wrapper)
...
---> 74         shapes[key] = box.shape
     75         dtypes[key] = box.dtype
     76     return keys, shapes, dtypes

AttributeError: 'dict' object has no attribute 'shape'

How can I circumvent the dictionary definition of observation and action spaces?

I'm running my code on Windows 11 with WSL2 (Ubuntu), using Anaconda.

maxspahn commented 1 year ago

Hi @behradkhadem ,

good to see that you are interested in this project!

First, a comment on RL usage with these environments: we currently have no default reward function implemented, so you would need to implement one yourself, see https://github.com/maxspahn/gym_envs_urdf/blob/be7532ae35675c5a2fd8c0d1782e8dbfd684e446/urdfenvs/urdf_common/urdf_env.py#L278. Maybe some of the users working with RL can give you a hint on that, @alxschwrz.

How can I circumvent the dictionary definition of observation and action spaces?

In the gym.make call, you can specify that the observation should be flattened using the flatten_observation argument. For your case it would look like:

robots = [
    GenericUrdfReacher(urdf="pointRobot.urdf", mode="vel"),
]
env = gym.make(
    "urdf-env-v0",
    dt=0.01, robots=robots, render=True, flatten_observation=True,
)

Then, the observation is flattened into an array.

Let me know if you need anything else and good luck.

behradkhadem commented 1 year ago

Thanks for your kind response, @maxspahn. I did a quick test and didn't dive deep into the problem, but it seems it didn't work the way it should, and I got the same error. I even tried to use the FlattenObservation wrapper, but that didn't work either (more info here).

And on the problem of the static reward, is there a way to pass our own reward function, or is there no option other than overriding the existing code?

maxspahn commented 1 year ago

I did a quick test and didn't dive deep into the problem, but it seems it didn't work the way it should, and I got the same error.

Could you provide me with the script you are trying to run? Then, I could have a look.

And on the problem of the static reward, is there a way to pass our own reward function, or is there no option other than overriding the existing code?

Currently this is not possible, so you would have to write your own environment. I recommend simply deriving from UrdfEnv and only overloading the step function.
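
For illustration, a minimal sketch of what such a subclass could look like. The reward here is a made-up distance-to-origin example; it assumes step returns the usual (observation, reward, done, info) tuple and that the un-flattened observation is the nested dictionary keyed by 'robot_0':

import numpy as np
from urdfenvs.urdf_common.urdf_env import UrdfEnv

class RewardUrdfEnv(UrdfEnv):
    def step(self, action):
        # Run the original simulation step, then replace the reward.
        observation, _, done, info = super().step(action)
        reward = self._compute_reward(observation)
        return observation, reward, done, info

    def _compute_reward(self, observation):
        # Hypothetical reward: negative distance of robot_0 to the origin.
        position = observation["robot_0"]["joint_state"]["position"]
        return -float(np.linalg.norm(position))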

behradkhadem commented 1 year ago

I did a quick test and didn't dive deep into the problem, but it seems it didn't work the way it should, and I got the same error.

Could you provide me with the script you are trying to run? Then, I could have a look.

And on the problem of the static reward, is there a way to pass our own reward function, or is there no option other than overriding the existing code?

Currently this is not possible, so you would have to write your own environment. I recommend simply deriving from UrdfEnv and only overloading the step function.

Sure, it was in a notebook environment, but it was basically this:

import warnings
import gym
import numpy as np
from urdfenvs.urdf_common.urdf_env import UrdfEnv
from urdfenvs.robots.generic_urdf import GenericUrdfReacher

from stable_baselines3 import TD3
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

robots = [
    GenericUrdfReacher(urdf="pointRobot.urdf", mode="vel"),
]
env = gym.make(
    "urdf-env-v0",
    dt=0.01, robots=robots, render=True, flatten_observation=True
)
env.reset()
# keys = ['observation', 'desired_goal']
# env = FlattenObservation(FilterObservation(env, keys))

# Define the TD3 agent and train it on the environment
model = TD3("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000)

I was just trying to run the code successfully (meaning without errors) and wasn't expecting results from the RL algorithm. Thanks for your time, I really appreciate it.

maxspahn commented 1 year ago

Ok, so I found the bug. Only the observation is flattened with this approach, not the observation_space (and not the action_space either).

I created a new issue for that, see #171.

Feel free to create a PR for that. I might have time at the end of next week myself.

alxschwrz commented 1 year ago

Hi everybody, sorry for my late reply. I currently don't have access to a computer, but I am happy to give you more information about how I use gym_envs_urdf for RL by the end of this week, @behradkhadem. Do you have any specific questions at the moment? For the reward function, I overrode the step function as Max mentioned.

behradkhadem commented 1 year ago

Hi @alxschwrz, the issue above is about me being unable to use the Gym envs of this package for training an RL agent (using Stable Baselines 3). I get an error regarding the data type of the observation and action spaces and can't even run a simple script (like the one above). How did you tackle this issue? Can you post some sample code showing how you've used it?

maxspahn commented 1 year ago

So, I have checked whether flatten_observation still works. And thanks to your (@behradkhadem) hint about the FlattenObservation wrapper, I realized that flatten_observation is redundant with the wrapper.

If you don't use the flatten_observation argument and instead use the FlattenObservation wrapper, it works simply with

env = FlattenObservation(env)

Note that you have to do that after the reset.

Let me know if that helps you. I'll work on integrating the FullSensor so that the observation also contains information on the goal and obstacles.
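
For reference, a minimal sketch of that order together with Stable Baselines 3 (assuming FlattenObservation is imported from gym.wrappers):

import gym
from gym.wrappers import FlattenObservation
from urdfenvs.robots.generic_urdf import GenericUrdfReacher
from stable_baselines3 import TD3

robots = [
    GenericUrdfReacher(urdf="pointRobot.urdf", mode="vel"),
]
env = gym.make("urdf-env-v0", dt=0.01, robots=robots, render=True)
env.reset()                    # reset first
env = FlattenObservation(env)  # then flatten the nested observation

model = TD3("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000)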

behradkhadem commented 1 year ago

So, I have checked whether flatten_observation still works. And thanks to your (@behradkhadem) hint about the FlattenObservation wrapper, I realized that flatten_observation is redundant with the wrapper.

If you don't use the flatten_observation argument and instead use the FlattenObservation wrapper, it works simply with

env = FlattenObservation(env)

Note that you have to do that after the reset.

Let me know if that helps you. I'll work on integrating the FullSensor so that the observation also contains information on the goal and obstacles.

Thanks, dear @maxspahn, but this didn't work for me. I used the FlattenObservation wrapper after the reset method, but I got this error. I thought this was due to the package version, so I tried pip install --upgrade urdfenvs, but nothing changed.

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/tmp/ipykernel_428/2987627242.py in 
      7 )
      8 env.reset()
----> 9 env = FlattenObservation(env=env)
     10 # keys = ['observation', 'desired_goal']
     11 # env = FlattenObservation(FilterObservation(env, keys))

~/anaconda3/envs/SB3/lib/python3.9/site-packages/gym/wrappers/flatten_observation.py in __init__(self, env)
      8     def __init__(self, env):
      9         super(FlattenObservation, self).__init__(env)
---> 10         self.observation_space = spaces.flatten_space(env.observation_space)
     11 
     12     def observation(self, observation):

~/anaconda3/envs/SB3/lib/python3.9/functools.py in wrapper(*args, **kw)
    886                             '1 positional argument')
    887 
--> 888         return dispatch(args[0].__class__)(*args, **kw)
    889 
    890     funcname = getattr(func, '__name__', 'singledispatch function')

~/anaconda3/envs/SB3/lib/python3.9/site-packages/gym/spaces/utils.py in flatten_space(space)
    190         True
...
--> 192     raise NotImplementedError(f"Unknown space: `{space}`")
    193 
    194 

NotImplementedError: Unknown space: `{'robot_0': Dict(joint_state:Dict(position:Box([-5. -5. -5.], [5. 5. 5.], (3,), float64), velocity:Box([-2.175 -2.175 -2.175], [2.175 2.175 2.175], (3,), float64)))}`

And here is the code I ran:

import warnings
import gym
import numpy as np
from gym.wrappers import FlattenObservation
from urdfenvs.urdf_common.urdf_env import UrdfEnv
from urdfenvs.robots.generic_urdf import GenericUrdfReacher

robots = [
    GenericUrdfReacher(urdf="pointRobot.urdf", mode="vel"),
]
env = gym.make(
    "urdf-env-v0",
    dt=0.01, robots=robots, render=True, flatten_observation=True  # I tried both True and False.
)
env.reset()
env = FlattenObservation(env)

maxspahn commented 1 year ago

@behradkhadem I have created a PR to improve the situation for you.

Let me know if that helps your case by checking out the corresponding branch of the PR. I'll wait a bit for your response on that.

Which version of urdfenvs are you using, by the way?

behradkhadem commented 1 year ago

@behradkhadem I have created a PR to improve the situation for you.

Let me know if that helps your case by checking out the corresponding branch of the PR. I'll wait a bit for your response on that.

Which version of urdfenvs are you using, by the way?

Thanks a lot! To be honest, I've never tested a Python package from a branch; I'm reading up on the subject and will respond once my tests are done. And I'm using urdfenvs==0.6.0.

maxspahn commented 1 year ago

Thanks a lot! To be honest, I've never tested a Python package from a branch; I'm reading up on the subject

You could install from a specific branch using:

pip install git+ssh://git@github.com/maxspahn/gym_envs_urdf.git@fix-flatten-observation

Or you can clone the repository and install it with `pip install .`.
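
For example, something along these lines (the branch name is the one from the pip command above):

git clone https://github.com/maxspahn/gym_envs_urdf.git
cd gym_envs_urdf
git checkout fix-flatten-observation
pip install .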

behradkhadem commented 1 year ago

Since I got no response from @alxschwrz, I'm closing this issue.