hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

TD3 & DDPG: RuntimeError: "normal_kernel_cuda" not implemented for 'Char' #1137

Closed olyanos closed 3 years ago

olyanos commented 3 years ago

**Describe the bug**
I am trying to use TD3 or DDPG on my custom environment with a `CnnPolicy`. However, I keep getting the following error: `RuntimeError: "normal_kernel_cuda" not implemented for 'Char'`. PPO and A2C run on the same environment without any problems. The error occurs on both CPU and GPU; on CPU it reads `RuntimeError: "normal_kernel_cpu" not implemented for 'Char'`.

**Code example**

```python
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

model = TD3('CnnPolicy', env, verbose=1, action_noise=action_noise)
model.learn(200000)
```
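For reference, `NormalActionNoise` is essentially a callable that draws an independent Gaussian sample per step, one entry per action dimension. A rough numpy sketch (the class name here is mine, not the SB3 API):

```python
import numpy as np

# Rough sketch of what SB3's NormalActionNoise does (this class name is
# hypothetical): each call draws an independent Gaussian sample with the
# configured mean and sigma, one entry per action dimension.
class GaussianNoiseSketch:
    def __init__(self, mean, sigma):
        self.mean = mean
        self.sigma = sigma

    def __call__(self):
        return np.random.normal(self.mean, self.sigma)

n_actions = 3200
noise = GaussianNoiseSketch(np.zeros(n_actions), 0.1 * np.ones(n_actions))
sample = noise()  # float array of shape (3200,)
```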

```
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "vscode/extensions/ms-python.python-2021.9.1191016588/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File ".vscode/extensions/ms-python.python-2021.9.1191016588/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File ".vscode/extensions/ms-python.python-2021.9.1191016588/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/usr/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "temp.py", line 557, in <module>
    model.learn(RL_total_time_steps)
  File "python3.8/site-packages/stable_baselines3/td3/td3.py", line 208, in learn
    return super(TD3, self).learn(
  File "/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 371, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "python3.8/site-packages/stable_baselines3/td3/td3.py", line 155, in train
    noise = replay_data.actions.clone().data.normal_(0, self.target_policy_noise)
RuntimeError: "normal_kernel_cuda" not implemented for 'Char'
```

**System Info**
Library versions:
- Tensorflow version = 2.5.0
- Keras version = 2.5.0
- Stable-Baselines3 version = 1.2.0
- OpenAI Gym version = 0.19.0
- Torch version = 1.9.0
- Python version = 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0]

Running on NVIDIA GeForce RTX 3090

My environment has a very big action space:

```python
self.action_space = spaces.Box(low=-1, high=1, shape=(3200,), dtype=np.int8)
self.observation_space = spaces.Box(low=0, high=255, shape=(40, 40, N_Channels), dtype=np.uint8)
```

I used the `check_env` utility to check my environment and everything checks out.

Best, Omar

Miffyli commented 3 years ago

> `self.action_space = spaces.Box(low=-1,high=1,shape=(3200),dtype=np.int8)`

This is probably what is causing the error. Try using a Discrete space instead for the action space.

Also, I recommend you switch to stable-baselines3 and test your environment with the env checker. See here: https://stable-baselines3.readthedocs.io/en/master/guide/custom_env.html

olyanos commented 3 years ago

Thank you for the quick reply. I did check it with the env checker and it checks out okay without any warnings or errors.

I switched my action space to MultiDiscrete:

```python
self.action_space = spaces.MultiDiscrete([3 for _ in range(dimension*X*Y)])
```

I used a MultiDiscrete instead of a Discrete space because I need an (M by N) grid of actions (each action in the action space acts on a specific pixel in my image).
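The per-pixel idea above can be illustrated with plain numpy (X and Y are placeholder dimensions, not the actual values from this environment): a `MultiDiscrete([3]*(X*Y))` sample is a flat vector of integers in {0, 1, 2}, and reshaping recovers the per-pixel action grid.

```python
import numpy as np

# Placeholder grid size for illustration only.
X, Y = 40, 40

# What a MultiDiscrete([3]*(X*Y)) sample looks like: a flat vector of
# ints in {0, 1, 2}, one per pixel.
flat_sample = np.random.randint(0, 3, size=X * Y)

# Reshape to recover the per-pixel action grid.
per_pixel = flat_sample.reshape(X, Y)
```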

I ran into the following error:

```
AssertionError: The algorithm only supports <class 'gym.spaces.box.Box'> as action spaces but MultiDiscrete([3 3 3 ... 3 3 3]) was provided
```

Miffyli commented 3 years ago

Ah right, sorry, I forgot to mention that DDPG and TD3 do not support discrete action spaces. You would have to switch to DQN (Discrete spaces only) or to A2C/PPO to use these.

olyanos commented 3 years ago

Thanks, you are right. It turns out the problem is caused by the Box action space, specifically the data type `np.int8`. When I use:

```python
self.action_space = spaces.Box(low=-1, high=1, shape=(3200,), dtype=np.int8)
```

I get the error, but when I use:

```python
self.action_space = spaces.Box(low=-1, high=1, shape=(3200,), dtype=np.float32)
```

it works okay. The continuous action space doesn't help me much, since I am asking my agent to act on each [R,G,B] pixel of the image, but this solves the runtime error. I will close the issue, keep working on it, and update the thread if I find anything.
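For anyone hitting the same error: the root cause can be reproduced directly in PyTorch. The traceback shows TD3 adding target-policy noise with an in-place Gaussian fill (`.normal_`), which has no kernel for integer tensors, and int8 is reported as 'Char' in PyTorch's error messages. A minimal sketch:

```python
import torch

# TD3 fills a clone of the replayed actions with Gaussian noise in place,
# roughly: replay_data.actions.clone().data.normal_(0, target_policy_noise).
# normal_ has no integer kernels, so actions stored from an int8 Box raise
# a RuntimeError mentioning 'Char' at this step.
int8_actions = torch.zeros(4, dtype=torch.int8)
try:
    int8_actions.normal_(0, 0.2)
except RuntimeError as e:
    print(e)  # error message mentions 'Char'

# With float32 actions (the working Box dtype), the same call succeeds.
float_actions = torch.zeros(4, dtype=torch.float32)
float_actions.normal_(0, 0.2)
```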