hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

shape error when learning DQN agent #88

Closed tdr1991 closed 5 years ago

tdr1991 commented 5 years ago

Hi everyone,

I am trying to run the DQN agent on a custom environment, and I get this error:

File "/Users/tdr/miniconda3/envs/py3/lib/python3.6/site-packages/stable_baselines/deepq/dqn.py", line 227, in learn

tensorflow.python.framework.errors_impl.InvalidArgumentError: Tensor must be 4-D with last dim 1, 3, or 4, not [32,5,14]

Training an A2C agent or a PPO2 agent on the same environment works, though.

araffin commented 5 years ago

Hello, please use the issue template. Otherwise, it will be hard for me to help you.

EDIT: It seems that the problem may come from your custom env and the way you defined action_space and observation_space.

tdr1991 commented 5 years ago

@araffin Hi, I am trying to run the DQN agent on a custom environment, and I get this error:

File "/Users/tdr/miniconda3/envs/py3/lib/python3.6/site-packages/stable_baselines/deepq/dqn.py", line 227, in learn

tensorflow.python.framework.errors_impl.InvalidArgumentError: Tensor must be 4-D with last dim 1, 3, or 4, not [32,5,14]

Some training code:

from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.deepq.policies import MlpPolicy
env = DummyVecEnv([lambda: ScavengerDayEnv(datasource="/reinforcement_learning/day_train/")])

alg = DQN
model = alg(MlpPolicy, env, verbose=1)

Part of the custom environment definition:

self.action_space = spaces.Discrete(21)
self.observation_space = spaces.Box(self.src.min_values, self.src.max_values, np.shape(self.src.time_serie[0]))

araffin commented 5 years ago

What is the shape of self.src.time_serie[0]?

Which stable-baselines version are you using? Which TF version?

Can you provide minimal code to reproduce the error?

tdr1991 commented 5 years ago

@araffin

The shape of self.src.time_serie[0] is [5, 14]. stable-baselines version: 2.2.1. TF version: 1.10.0.

araffin commented 5 years ago

Is there something that prevents you from flattening the ndarray to a 1D array? If you do that, it will work (the shape will be (70,) instead of (5, 14)).
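As a minimal sketch of the flattening suggested above, using only NumPy: the arrays below are hypothetical stand-ins for the poster's observation and bounds (the real `self.src` data is not shown in the thread), and the bound values are made up for illustration.

```python
import numpy as np

# Hypothetical stand-ins: one (5, 14) observation and per-element bounds.
obs = np.arange(70, dtype=np.float32).reshape(5, 14)
low = np.full((5, 14), -1.0, dtype=np.float32)
high = np.full((5, 14), 100.0, dtype=np.float32)

# Flatten everything to 1D so the policy sees a plain feature vector.
flat_obs = obs.flatten()
flat_low = low.flatten()
flat_high = high.flatten()

print(flat_obs.shape)  # (70,)

# The original layout can be recovered inside the env if it is still needed.
restored = flat_obs.reshape(5, 14)
```

Under this assumption, the environment would declare its observation_space from the flattened bounds (for example spaces.Box(low=flat_low, high=flat_high)) and return the flattened array from both reset() and step().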

tdr1991 commented 5 years ago

But when I use OpenAI Baselines, it works.

tdr1991 commented 5 years ago

I have located the error:

File "/Users/tdr/miniconda3/envs/py3/lib/python3.6/site-packages/stable_baselines/deepq/build_graph.py", line 436, in build_train
    tf.summary.image('observation', obs_phs[0])

The shape of obs_phs[0] is [32, 5, 14], but the image function needs a 4-D tensor with last dim 1, 3, or 4.

When I comment out this line, it works.
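The rule quoted in the error message (4-D, last dim 1, 3, or 4) can be expressed as a small shape check. This is only a sketch: `is_image_batch_shape` is a hypothetical helper name, not the check that was later added to the library, and the (32, 84, 84, 1) shape is an illustrative image batch rather than anything from this thread.

```python
def is_image_batch_shape(shape):
    """Return True if shape is 4-D and its last dimension is 1, 3, or 4."""
    return len(shape) == 4 and shape[-1] in (1, 3, 4)


# The batch of (5, 14) observations from this issue is only 3-D.
print(is_image_batch_shape((32, 5, 14)))      # False
# An illustrative batch of single-channel images passes the check.
print(is_image_batch_shape((32, 84, 84, 1)))  # True
```

A guard like this around the image summary would skip logging for non-image observations instead of raising.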

araffin commented 5 years ago

I see, but in your code, you have passed a tensorboard log dir, right? I will add a check to avoid that error.

tdr1991 commented 5 years ago

I don't know whether I have passed a tensorboard log dir. I can save and load the model.

araffin commented 5 years ago

@tdr1991 I just pushed the "patch-dqn" branch, can you confirm it solves your problem?

tdr1991 commented 5 years ago

@araffin It works, thank you.

tdr1991 commented 5 years ago

When I train an ACER agent, there is a similar shape error:

File "/Users/tdr/miniconda3/envs/py3/lib/python3.6/site-packages/stable_baselines/acer/acer_simple.py", line 569, in __init__
    obs_height, obs_width, obs_num_channels = env.observation_space.shape
ValueError: not enough values to unpack (expected 3, got 2)

env.observation_space.shape is (5, 14), so I set the observation space shape to (5, 14, 1):

obs_shape = np.shape(self.src.time_serie[0])
self.observation_space = spaces.Box(self.src.min_values, self.src.max_values, (obs_shape[0], obs_shape[1], 1))
print(self.observation_space.shape)  # (5, 14, 1)

But there is still an error.

araffin commented 5 years ago

I think this comes from the ACER buffer, which is a different issue... If you want to use ACER, you need to flatten your observation space to a 1D array. (Otherwise, that would mean refactoring the whole ACER buffer, which I don't have time for.)

tdr1991 commented 5 years ago

If I flatten my observation space to a 1D array, will it affect the other agents? PS: So far, these agents (A2C, ACKTR, PPO1, PPO2, TRPO, DQN) all work.

araffin commented 5 years ago

It should work because it was made with feature vectors in mind.

tdr1991 commented 5 years ago

Ok, thank you, I'll try tomorrow.

nachovoss commented 5 years ago

Hey guys, I'm getting this error running PPO2. Any help would be appreciated.

n_images, height, width, n_channels = img_nhwc.shape
ValueError: not enough values to unpack (expected 4, got 1)
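One reading of that traceback is that a 1D array reached code expecting a batch of images in NHWC layout. The sketch below only reproduces the unpacking failure on plain shape tuples; `unpack_nhwc` is a hypothetical helper and the shapes are illustrative, since the poster's environment is not shown.

```python
def unpack_nhwc(shape):
    """Unpack a shape the same way the failing line does."""
    n_images, height, width, n_channels = shape
    return n_images, height, width, n_channels


# A batch of images has four dimensions, so unpacking succeeds.
print(unpack_nhwc((8, 84, 84, 3)))

# A flat feature vector has a single dimension, so unpacking raises.
try:
    unpack_nhwc((70,))
except ValueError as err:
    message = str(err)
    print(message)
```

If that reading is right, the thing to check is where image-shaped input is assumed for an environment that returns vector observations.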