hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Training and testing on the same dataset do not perform the same #1101

Closed yuvaleck closed 3 years ago

yuvaleck commented 3 years ago

Hi, I am using a PPO2 model with a custom env. When training, the model achieves a positive average reward, which can be seen both in the TensorBoard episode_reward chart and by monitoring the env (from stable_baselines.bench import Monitor). However, when passing the same dataset through the trained model, the results are much worse (average around zero). Any possible explanation?

R, yuvaleck

[Screenshot: TensorBoard episode_reward chart showing a positive average reward during training]
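One common explanation for such a gap (not confirmed for this particular setup) is that actions are sampled stochastically by default, both during training and when calling model.predict without deterministic=True. A minimal sketch below, assuming stable-baselines >= 2.10 for evaluate_policy and using CartPole-v1 as a stand-in for the custom env, evaluates the trained model deterministically on the same env so the two numbers can be compared:

```python
import gym

from stable_baselines import PPO2
from stable_baselines.bench import Monitor
from stable_baselines.common.evaluation import evaluate_policy

# CartPole-v1 is only a stand-in for the custom env from the issue.
env = Monitor(gym.make("CartPole-v1"), filename=None)

model = PPO2("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10000)

# Evaluate on the same env with deterministic actions; comparing against
# deterministic=False shows how much of the gap is due to action sampling.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20,
                                          deterministic=True)
print("deterministic eval: {:.2f} +/- {:.2f}".format(mean_reward, std_reward))
```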

Miffyli commented 3 years ago

Hey. Could you provide the full code to replicate the issue? This sounds like a custom gym environment issue, for which we do not offer tech support unless there is a bug or a proposal for an enhancement in the library itself.
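For reference, a reproduction report is usually a short, self-contained script of this shape (the toy env below is hypothetical; a real report would use the env that triggers the problem):

```python
import gym
import numpy as np
from gym import spaces

from stable_baselines import PPO2


class ToyEnv(gym.Env):
    """Hypothetical minimal custom env; replace the dummy dynamics
    with the ones that reproduce the issue."""

    def __init__(self):
        super(ToyEnv, self).__init__()
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,),
                                            dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,),
                                       dtype=np.float32)
        self._steps = 0

    def reset(self):
        self._steps = 0
        return self.observation_space.sample()

    def step(self, action):
        self._steps += 1
        obs = self.observation_space.sample()
        # Dummy reward: the closer the action is to the first
        # observation component, the better.
        reward = -float(np.abs(action[0] - obs[0]))
        done = self._steps >= 100
        return obs, reward, done, {}


model = PPO2("MlpPolicy", ToyEnv(), verbose=0)
model.learn(total_timesteps=5000)
```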

MariamDundua commented 3 years ago

Is it possible to somehow bound observations in stable-baselines? For example, between 0 and 1?

araffin commented 3 years ago

Is it possible to somehow bound observations in stable-baselines? For example, between 0 and 1?

https://github.com/hill-a/stable-baselines/issues/1104

But your remark seems unrelated to the original issue...

You may also take a look at VecNormalize (cf. the documentation).
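As a sketch of that suggestion (Pendulum-v0 is only a placeholder env): VecNormalize keeps a running mean and standard deviation of the observations and clips the normalized result to [-clip_obs, clip_obs], so it bounds observations but does not rescale them to exactly [0, 1]; for an exact [0, 1] range you would write a small gym.ObservationWrapper instead.

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

venv = DummyVecEnv([lambda: gym.make("Pendulum-v0")])  # placeholder env
# Normalized observations are clipped to [-10, 10], not rescaled to [0, 1].
venv = VecNormalize(venv, norm_obs=True, norm_reward=True, clip_obs=10.0)

model = PPO2("MlpPolicy", venv, verbose=0)
model.learn(total_timesteps=5000)

# At evaluation time, freeze the running statistics and disable reward
# normalization so results are comparable across runs.
venv.training = False
venv.norm_reward = False
```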