hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[Bug] Unexpected Argument when using A2C predict function #209

Closed Kuldr closed 5 years ago

Kuldr commented 5 years ago

Describe the bug

When using a trained model to predict, it throws an error with the following stack trace:
  File "/.../stable_baselines/common/base_class.py", line 367, in predict
    actions, _, states, _ = self.step(observation, state, mask, deterministic=deterministic)
TypeError: step() got an unexpected keyword argument 'deterministic'

Code example

This happens when calling model.predict (I have only tried it with an A2C model).

Temporary Fix

I have managed to work around this by changing line 367 in common/base_class.py from `actions, _, states, _ = self.step(observation, state, mask, deterministic=deterministic)` to `actions, _, states, _ = self.step(observation, state, mask)`

hill-a commented 5 years ago

Hey,

This seems odd, are you using a custom Policy class?

The step function in the base policy for A2C (ActorCriticPolicy(BasePolicy)) does have the deterministic keyword argument.

EDIT: I just noticed that the documentation on custom Policies was missing the deterministic keyword argument and the related code; the documentation is updated now.
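
For readers who land here with the same error, the mismatch can be sketched without the library at all. The classes and the `predict` helper below are hypothetical stand-ins (they are not the stable-baselines implementation); the only detail taken from this thread is the call shape in the stack trace, where `deterministic` is forwarded to `step` by keyword.

```python
# Stand-ins for illustration only; these are NOT the stable-baselines classes.
class OldCustomPolicy:
    """Hypothetical custom policy written against the older docs."""

    def step(self, obs, state=None, mask=None):
        # Note: no `deterministic` parameter in this signature.
        # Returns a 4-tuple so it can be unpacked like the call in the trace.
        return [0 for _ in obs], None, state, None


def predict(policy, observation, state=None, mask=None, deterministic=False):
    # Mirrors the call shown in the stack trace: the flag is passed by keyword.
    actions, _, states, _ = policy.step(
        observation, state, mask, deterministic=deterministic
    )
    return actions, states


try:
    predict(OldCustomPolicy(), [[0.0, 0.0]])
    error_message = None
except TypeError as exc:
    # The exact wording varies slightly between Python versions.
    error_message = str(exc)

print(error_message)
```

Because the caller always passes `deterministic`, any policy whose `step` does not accept it fails in this way, even when the default value is used.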

araffin commented 5 years ago

Hello, as @hill-a pointed out, it seems that some information is missing to help you. Please fill in the template form completely (especially minimal code to reproduce).

Trying to reproduce your problem, the following code runs:

from stable_baselines import A2C

model = A2C("MlpPolicy", "Pendulum-v0")
env = model.get_env()
obs = env.reset()
model.predict(obs)
model.predict(obs, deterministic=True)

In fact, this feature is tested several times by the CI (for instance here), so it seems you are doing something custom, right?

Kuldr commented 5 years ago

I am using a custom policy class, which is missing the deterministic keyword argument, as @hill-a mentioned.

Following the updated docs and adding the argument to the step function in my custom policy has solved the issue.
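
A rough sketch of what that change looks like, for anyone hitting the same problem. `ToyCustomPolicy` and its `action_probs` argument are made up for illustration and use plain Python instead of the library's network code; the point is only the `step` signature, which now accepts `deterministic` with a default, and the 4-tuple return that matches the unpacking in the stack trace. A real custom policy should follow the updated documentation.

```python
import random


class ToyCustomPolicy:
    """Hypothetical toy policy; not a stable-baselines class."""

    def __init__(self, action_probs):
        # Fixed action probabilities stand in for a learned network output.
        self.action_probs = action_probs

    def step(self, obs, state=None, mask=None, deterministic=False):
        n_actions = len(self.action_probs)
        actions = []
        for _ in obs:
            if deterministic:
                # Pick the most probable action.
                action = max(range(n_actions), key=self.action_probs.__getitem__)
            else:
                # Sample an action according to the probabilities.
                action = random.choices(range(n_actions), weights=self.action_probs)[0]
            actions.append(action)
        # Four values, so callers can unpack `actions, _, states, _`.
        return actions, None, state, None


policy = ToyCustomPolicy([0.1, 0.7, 0.2])
obs = [[0.0], [1.0]]  # a batch of two dummy observations

det_actions, _, _, _ = policy.step(obs, deterministic=True)
sampled_actions, _, _, _ = policy.step(obs)
```

Giving the parameter a default keeps older call sites that omit it working, while callers that pass `deterministic=...` by keyword no longer raise a TypeError.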

Thanks for your help on this.