LSTM policies are broken for PPO1 and TRPO

hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

http://stable-baselines.readthedocs.io/

MIT License

4.16k stars 725 forks source link

LSTM policies are broken for PPO1 and TRPO #140

Open ernestum opened 5 years ago

ernestum commented 5 years ago

See the feature/fix_lstm branch for a test which fails for the above mentioned algorithms.

For PPO1 and TRPO the cause seems to be that the batch size is not provided to the policy (None is passed). Then the ortho-initializer has issues.

For PPO2 the assert in line 109 fails:

assert self.n_envs % self.nminibatches == 0, "For recurrent policies, "\
                        "the number of environments run in parallel should be a multiple of nminibatches."

Since the PPO2 instance is created using PPO2(policy, 'CartPole-v1'), the default parameters of PPO2 seem to be broken somehow?

For ACKTR the issue is somewhere in get_factors of kfac.py and I have no clue what that does and what goes wrong there but it complains about some shared nodes among different computation ops.

araffin commented 5 years ago

Related to #60 #70

ernestum commented 5 years ago

Any hints for what exactly the masks are used for? This would help a lot!

araffin commented 5 years ago

I did not look closely at the code, but I remember seeing masks and dones interchangeably in the code (see here fir ppo2). Then it used in the policies and especially by lstm to reset states for new episodes.

ernestum commented 5 years ago

PPO2 works if we fix the issue with the number of minibatches.

hill-a commented 5 years ago

ACKTR is fixed on the enhancements branch, however I lost quite some time this month, and didn't have time to finish the branch (wanted to fix PPO1 and TRPO first). might try again over the next few weeks

RGring commented 5 years ago

I am also interested in the combination of PPO1 + CnnLstmPolicy :)

HareshKarnan commented 5 years ago

Any updates on this ? TRPO still doesn't support MlpLstmPolicy :'(

araffin commented 5 years ago

@HareshMiriyala for now, we don't have time to fix that (even though it is on the roadmap). Currently, we are working on fixing GAIL + A2C, this will be merged with master soon.

However, we appreciate contributions, especially to fix that kind of thing ;)

hejujie commented 4 years ago

I also mentioned this problem last year in issue #60. It seem the problem is not fixed up to now, is there any plan ? I would like to fix the problem in this month, is there any comment or advice? @araffin @hill-a @erniejunior

araffin commented 4 years ago

It seem the problem is not fixed up to now, is there any plan ?

Hello, There is not plan for now, we are now focusing on the migration to tf2 (cf #366 ).

I would like to fix the problem in this month, is there any comment or advice?

We would appreciate a PR, however, I think only @erniejunior is a bit familiar with the LSTM policy (I don't know much of that part of the code, which comes from OpenAI code and which is one of the most complex and undocumented part of the lib).