hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[Bug] Fail loading old DDPG policy with normalisation #363

Closed araffin closed 5 years ago

araffin commented 5 years ago

Describe the bug DDPG models saved with stable-baselines<=v2.5.1 (and normalisation activated) cannot be loaded with the master version. This bug must be fixed before the next version.

Code example

1. Save a model using v2.5.1 that normalizes the observations

```python
from stable_baselines import DDPG

model = DDPG('MlpPolicy', 'Pendulum-v0', normalize_observations=True)
model.save("DDPG_pendulum_v2.5.1")
```

2. Load it with the master version
```python
import sys
import pkg_resources

import stable_baselines

# Fix for breaking change for DDPG buffer in v2.6.0
# (use parse_version: plain string comparison breaks for versions like "2.10.0")
version = pkg_resources.get_distribution("stable_baselines").version
if pkg_resources.parse_version(version) >= pkg_resources.parse_version("2.6.0"):
    sys.modules['stable_baselines.ddpg.memory'] = stable_baselines.deepq.replay_buffer
    stable_baselines.deepq.replay_buffer.Memory = stable_baselines.deepq.replay_buffer.ReplayBuffer

from stable_baselines import DDPG

DDPG.load("DDPG_pendulum_v2.5.1")
```

Traceback:

Traceback (most recent call last):
  File "bug_ddpg.py", line 20, in <module>
    DDPG.load("DDPG_pendulum_v2.5.1")
  File "path/stable-baselines/stable_baselines/ddpg/ddpg.py", line 1106, in load
    model.load_parameters(params)
  File "path/stable-baselines/stable_baselines/common/base_class.py", line 422, in load_parameters
    self.sess.run(param_update_ops, feed_dict=feed_dict)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1111, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (3, 64) for Tensor 'Placeholder_48:0', which has shape '(3,)'

Looking at what is happening: there seem to be duplicated keys in self.params of DDPG; those duplicates are removed because we are now using a dict.

I'm currently investigating how to fix this bug. Any help is appreciated.
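The dict deduplication described above can be illustrated with plain Python (the variable names below are hypothetical; the real keys are TensorFlow variable names):

```python
# Hypothetical parameter names: the same variable could be listed twice
# in self.params, so an old save contains duplicated keys.
old_names = ["pi/w:0", "pi/b:0", "pi/w:0", "pi/b:0", "obs_rms/mean:0"]

# Master stores parameters in a dict, so duplicated keys collapse
# to a single entry (insertion order is preserved):
deduped = list(dict.fromkeys(old_names))
print(deduped)  # ['pi/w:0', 'pi/b:0', 'obs_rms/mean:0']

# The saved values (5 entries) no longer line up with the deduplicated
# placeholders (3 entries), which is consistent with the shape mismatch
# shown in the traceback above.
```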

araffin commented 5 years ago

I found the bug: the duplicated keys come from find_trainable_variables, which is not correctly implemented: find_trainable_variables('bla') returns the same as find_trainable_variables() even though the scope bla is not defined.

A fix for that function is easy:

```python
import tensorflow as tf

def find_trainable_variables(key):
    """
    Returns the trainable variables within a given scope

    :param key: (str) the variable scope
    :return: ([TensorFlow Tensor]) the trainable variables
    """
    return tf.trainable_variables(scope=key)
```

or simply use tf_util.get_trainable_vars(scope), which is already in the codebase...

However, it may break previously saved models.

The good news is that this only affects DDPG with normalization. For the other algorithms, it was just saving duplicated parameters.
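The scope filtering the fix relies on can be sketched without TensorFlow (filter_by_scope below is a hypothetical stand-in mimicking tf.trainable_variables(scope=...), which filters variable names with a regex prefix match):

```python
import re

def filter_by_scope(var_names, scope=None):
    # Mimics tf.trainable_variables(scope=...): keep only names whose
    # prefix matches the scope; None means return all variables.
    if scope is None:
        return list(var_names)
    return [name for name in var_names if re.match(scope, name)]

names = ["model/pi/w:0", "model/qf/w:0", "target/pi/w:0"]
print(filter_by_scope(names, "model"))  # ['model/pi/w:0', 'model/qf/w:0']
print(filter_by_scope(names, "bla"))    # [] -- undefined scope matches nothing
```

With the buggy implementation, the undefined scope returned every trainable variable instead of an empty list, which is how the duplicates ended up in self.params.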

araffin commented 5 years ago

Note: this was silently fixed in OpenAI repo in https://github.com/openai/baselines/commit/8c2aea2addc9f3ba36d4a0c937e6a2d09830afc7