hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[Bug] Fail loading old DDPG policy with normalisation #363

Closed araffin closed 5 years ago

araffin commented 5 years ago

Describe the bug DDPG models saved with stable-baselines<=v2.5.1 (and normalisation activated) cannot be loaded with the master version. This bug must be fixed before the next version.

Code example

1. Save a model using v2.5.1 that normalizes the observations

```python
from stable_baselines import DDPG

model = DDPG('MlpPolicy', 'Pendulum-v0', normalize_observations=True)
model.save("DDPG_pendulum_v2.5.1")
```

2. Load it with the master version
```python
import sys
import pkg_resources

import stable_baselines

# Fix for breaking change for DDPG buffer in v2.6.0
# (use parse_version: plain string comparison breaks for versions like "2.10.0")
version = pkg_resources.get_distribution("stable_baselines").version
if pkg_resources.parse_version(version) >= pkg_resources.parse_version("2.6.0"):
    sys.modules['stable_baselines.ddpg.memory'] = stable_baselines.deepq.replay_buffer
    stable_baselines.deepq.replay_buffer.Memory = stable_baselines.deepq.replay_buffer.ReplayBuffer

from stable_baselines import DDPG

DDPG.load("DDPG_pendulum_v2.5.1")
```

Traceback:

Traceback (most recent call last):
  File "bug_ddpg.py", line 20, in <module>
    DDPG.load("DDPG_pendulum_v2.5.1")
  File "path/stable-baselines/stable_baselines/ddpg/ddpg.py", line 1106, in load
    model.load_parameters(params)
  File "path/stable-baselines/stable_baselines/common/base_class.py", line 422, in load_parameters
    self.sess.run(param_update_ops, feed_dict=feed_dict)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/user/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1111, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (3, 64) for Tensor 'Placeholder_48:0', which has shape '(3,)'

Looking at what is happening: there seem to be duplicated keys in self.params of DDPG; those duplicates are removed because we are now using a dict.

I'm currently investigating how to fix this bug. Any help is appreciated.
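The dict deduplication described above can be illustrated with plain Python (the variable names below are hypothetical; the real keys are TensorFlow variable names):

```python
# Hypothetical parameter names: the same variable could be listed twice
# in self.params, so an old save contains duplicated keys.
old_names = ["pi/w:0", "pi/b:0", "pi/w:0", "pi/b:0", "obs_rms/mean:0"]

# Master stores parameters in a dict, so duplicated keys collapse
# to a single entry (insertion order is preserved):
deduped = list(dict.fromkeys(old_names))
print(deduped)  # ['pi/w:0', 'pi/b:0', 'obs_rms/mean:0']

# The saved values (5 entries) no longer line up with the deduplicated
# placeholders (3 entries), which is consistent with the shape mismatch
# shown in the traceback above.
```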

araffin commented 5 years ago

I found the bug: the duplicated keys come from find_trainable_variables, which is not correctly implemented: find_trainable_variables('bla') returns the same as find_trainable_variables() even though the scope bla is not defined.

A fix for that function is easy:

```python
import tensorflow as tf

def find_trainable_variables(key):
    """
    Returns the trainable variables within a given scope

    :param key: (str) the variable scope
    :return: ([TensorFlow Tensor]) the trainable variables
    """
    return tf.trainable_variables(scope=key)
```

or simply use tf_util.get_trainable_vars(scope), which is already in the codebase...

However, it may break previously saved models.

The good news is that this only affects DDPG with normalization. For the other algorithms, it was just saving duplicated parameters.
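The scope filtering the fix relies on can be sketched without TensorFlow (filter_by_scope below is a hypothetical stand-in mimicking tf.trainable_variables(scope=...), which filters variable names with a regex prefix match):

```python
import re

def filter_by_scope(var_names, scope=None):
    # Mimics tf.trainable_variables(scope=...): keep only names whose
    # prefix matches the scope; None means return all variables.
    if scope is None:
        return list(var_names)
    return [name for name in var_names if re.match(scope, name)]

names = ["model/pi/w:0", "model/qf/w:0", "target/pi/w:0"]
print(filter_by_scope(names, "model"))  # ['model/pi/w:0', 'model/qf/w:0']
print(filter_by_scope(names, "bla"))    # [] -- undefined scope matches nothing
```

With the buggy implementation, the undefined scope returned every trainable variable instead of an empty list, which is how the duplicates ended up in self.params.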

araffin commented 5 years ago

Note: this was silently fixed in OpenAI repo in https://github.com/openai/baselines/commit/8c2aea2addc9f3ba36d4a0c937e6a2d09830afc7