takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License

[BUG] assertion errors on highway-env with DQN. #70

Closed StarBaseOne closed 3 years ago

StarBaseOne commented 3 years ago

Hello Takuma

I am working with highway-env (a custom environment) and have tried to test your DQN implementation, as I am interested in using the Discrete CQL and CQL implementations alongside SB3. The problem I am having is with the observation shapes of the environment (I've tried flattening the observations), and I would like to know if you have any ideas to sort this out; perhaps you have seen this before? The observation shape is a 2D array. I tried tinkering with a custom policy, flattening the observations, and using the VectorEncoder, but to no avail.

return spaces.Box(shape=(self.vehicles_count, len(self.features)), low=-1, high=1, dtype=np.float32)
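
For reference, a minimal sketch of the mismatch in plain numpy (hypothetical values):

import numpy as np

# the env returns a (vehicles_count, len(features)) matrix
obs = np.zeros((5, 5), dtype=np.float32)
print(obs.shape)              # (5, 5) -- a 2D array
print(obs.reshape(-1).shape)  # (25,)  -- the flat 1D vector d3rlpy's VectorEncoder expects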

A snippet of the code:

import gym
from d3rlpy.algos import DQN
from d3rlpy.online.buffers import ReplayBuffer
from d3rlpy.online.explorers import LinearDecayEpsilonGreedy

env = gym.make('highway-v0')
eval_env = gym.make('highway-v0')

# setup algorithm
dqn = DQN(batch_size=32,
          n_frames=1,
          learning_rate=2.5e-4,
          target_update_interval=100,
          use_gpu=True)

# setup replay buffer
buffer = ReplayBuffer(maxlen=1000000, env=env)

# setup explorer
explorer = LinearDecayEpsilonGreedy(start_epsilon=1.0,
                                    end_epsilon=0.1,
                                    duration=10000)

# start training
dqn.fit_online(env,
               buffer,
               explorer=explorer,  # not needed with probabilistic policy algorithms
               eval_env=eval_env)

The error I receive with an observation array of shape (5, 5) is:

python test.py 
2021-05-04 20:11.41 [info     ] Directory is created at d3rlpy_logs/DQN_online_20210504201141
2021-05-04 20:11.41 [debug    ] Building model...
Traceback (most recent call last):
  File "test.py", line 60, in <module>
    dqn.fit_online(env,
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/algos/base.py", line 236, in fit_online
    train_single_env(
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/online/iterators.py", line 159, in train_single_env
    _setup_algo(algo, env)
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/online/iterators.py", line 89, in _setup_algo
    algo.build_with_env(env)
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/base.py", line 685, in build_with_env
    self.create_impl(
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/base.py", line 657, in create_impl
    self._create_impl(observation_shape, action_size)
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/algos/dqn.py", line 135, in _create_impl
    self._impl.build()
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/algos/torch/dqn_impl.py", line 73, in build
    self._build_network()
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/algos/torch/dqn_impl.py", line 87, in _build_network
    self._q_func = create_discrete_q_function(
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/models/builders.py", line 42, in create_discrete_q_function
    encoder = encoder_factory.create(observation_shape)
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/models/encoders.py", line 294, in create
    return factory.create(observation_shape)
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/models/encoders.py", line 210, in create
    assert len(observation_shape) == 1
AssertionError

If I flatten it to 1D it still fails. Why is it expecting observation_shape to have length 1? When I change the observation to image-based input (a Nature-CNN-style network on 4 stacked frames of 128x64), I also receive a different error (I chose n_frames=4 for stacking).
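
As for why it expects a 1D shape: judging from the traceback, the default encoder factory seems to dispatch on the number of observation dimensions, roughly like this (a simplified sketch, not the actual d3rlpy source):

def pick_encoder(observation_shape):
    # 3D shapes (C, H, W) go down the pixel-encoder path; everything else
    # falls through to the vector path, which asserts a flat 1D shape,
    # so a (5, 5) observation trips the AssertionError above
    if len(observation_shape) == 3:
        return "pixel encoder"
    assert len(observation_shape) == 1
    return "vector encoder"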

The images are uint8 and follow the channel-first layout (C x W x H). As you can see in the params.json output below, the observation_shape is (1, 128, 64), which becomes (4, 128, 64) with 4 stacked frames.


python test.py 
2021-05-04 20:17.53 [info     ] Directory is created at d3rlpy_logs/DQN_online_20210504201753
2021-05-04 20:17.53 [debug    ] Building model...
2021-05-04 20:17.55 [debug    ] Model has been built.
2021-05-04 20:17.55 [info     ] Parameters are saved to d3rlpy_logs/DQN_online_20210504201753/params.json
params={'action_scaler': None,
        'augmentation': {'params': {'n_mean': 1}, 'augmentations': []},
        'batch_size': 32,
        'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}},
        'gamma': 0.99,
        'generated_maxlen': 100000,
        'learning_rate': 0.00025,
        'n_critics': 1,
        'n_frames': 4,
        'n_steps': 1,
        'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False},
        'q_func_factory': {'type': 'mean', 'params': {'bootstrap': False, 'share_encoder': False}},
        'real_ratio': 1.0,
        'scaler': None,
        'target_reduction_type': 'min',
        'target_update_interval': 100,
        'use_gpu': 0,
        'algorithm': 'DQN',
        'observation_shape': (4, 128, 64),
        'action_size': 5}

  0%|                                                         | 0/1000000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "test.py", line 61, in <module>
    dqn.fit_online(env,
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/algos/base.py", line 236, in fit_online
    train_single_env(
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/online/iterators.py", line 188, in train_single_env
    stacked_frame.append(observation)
  File "/home/brian/anaconda3/lib/python3.8/site-packages/d3rlpy/preprocessing/stack.py", line 47, in append
    assert image.dtype == self._dtype
AssertionError

I am not sure why it's throwing the assertion for the numpy array; I confirmed that the input is indeed a <class 'numpy.ndarray'>.

def observe(self) -> np.ndarray:
    new_obs = self._render_to_grayscale()
    self.obs = np.roll(self.obs, -1, axis=0)
    self.obs[-1, :, :] = new_obs
    return self.obs
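
A quick sanity check (a sketch, assuming the stacking buffer compares incoming frames against np.uint8, which is what the assert in preprocessing/stack.py suggests):

import numpy as np

obs = env.reset()
print(type(obs), obs.dtype)  # the assert fires on dtype, not array type;
                             # float32/float64 frames would fail here
obs = obs.astype(np.uint8)   # hypothetical cast if the dtype is not uint8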

Confirmed using the latest release of d3rlpy (0.80).

takuseno commented 3 years ago

@hougiebear Thanks for reporting this issue! I believe this is because of the 2D shape observation. I guess this will solve your problem.

import gym
import numpy as np
from gym.spaces import Box

class FlattenWrapperEnv(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)
        # this is important: advertise the flattened 1D shape so that
        # d3rlpy builds a vector encoder for it
        shape = self.observation_space.shape
        self.observation_space = Box(shape=(shape[0] * shape[1],),
                                     low=-1, high=1, dtype=np.float32)

    def step(self, action):
        # self.env.step, not super.step
        obs, reward, done, info = self.env.step(action)
        flat_obs = np.reshape(obs, [-1])
        return flat_obs, reward, done, info

    def reset(self):
        # call the wrapped env's reset to avoid infinite recursion
        obs = self.env.reset()
        return np.reshape(obs, [-1])
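
Usage would look like this (hypothetical, reusing the env id from your snippet):

env = FlattenWrapperEnv(gym.make('highway-v0'))
eval_env = FlattenWrapperEnv(gym.make('highway-v0'))
print(env.observation_space.shape)  # e.g. (25,) for a (5, 5) observation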
StarBaseOne commented 3 years ago

> @hougiebear Thanks for reporting this issue! I believe this is because of the 2D shape observation. I guess this will solve your problem. [...]

Thank you very much for responding, Takuma. That worked (I had some issues with the max recursion limit, but I ironed that out).

I also looked at the OpenAI Gym wrappers repo, and this works as well:


from gym import ObservationWrapper, spaces

class FlattenObservation(ObservationWrapper):
    r"""Observation wrapper that flattens the observation."""
    def __init__(self, env):
        super(FlattenObservation, self).__init__(env)
        self.observation_space = spaces.flatten_space(env.observation_space)

    def observation(self, observation):
        return spaces.flatten(self.env.observation_space, observation)
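
Note that recent gym releases ship this wrapper, so it can also simply be imported:

from gym.wrappers import FlattenObservation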
ajam74001 commented 1 year ago

Hello all, many thanks for your great comments; I found them very clear and useful. I am using the same environment for an offline RL task and am facing almost the same issue. For my task, I need to collect some data from the environment via some policy. To prevent the observation-shape issue, I used the gym wrapper to flatten the observation space and then collected data from the environment with a random policy, using the code provided in the d3rlpy documentation. However, I am receiving an error when the data collection process starts. I would appreciate any help.

The wrapper used to flatten the observation space:

import gym
import gym.spaces as spaces

class FlattenObservation(gym.ObservationWrapper):
    def __init__(self, env: gym.Env):
        super().__init__(env)
        self.observation_space = spaces.flatten_space(env.observation_space)

    def observation(self, observation):
        return spaces.flatten(self.env.observation_space, observation)
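
Hypothetical instantiation (assuming the same highway-v0 environment as above):

env = FlattenObservation(gym.make('highway-v0'))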

The code used to collect data using random policy (from documentation):

import d3rlpy
# setup algorithm
random_policy = d3rlpy.algos.DiscreteRandomPolicy()

# prepare experience replay buffer
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=100000, env=env) # env is the flatten version from now on 

# start data collection
random_policy.collect(env, buffer, n_steps=100000)

# export as MDPDataset
dataset = buffer.to_mdp_dataset()

The received error:

(screenshot of the error traceback; not reproduced in this text export)