Open ernestum opened 5 years ago
Related to #60 #70
Any hints for what exactly the masks are used for? This would help a lot!
PPO2 works if we fix the issue with the number of minibatches.
ACKTR is fixed on the enhancements branch, however I lost quite some time this month, and didn't have time to finish the branch (wanted to fix PPO1 and TRPO first). might try again over the next few weeks
I am also interested in the combination of PPO1 + CnnLstmPolicy :)
Any updates on this ? TRPO still doesn't support MlpLstmPolicy :'(
@HareshMiriyala for now, we don't have time to fix that (even though it is on the roadmap). Currently, we are working on fixing GAIL + A2C, this will be merged with master soon.
However, we appreciate contributions, especially to fix that kind of thing ;)
I also mentioned this problem last year in issue #60. It seem the problem is not fixed up to now, is there any plan ? I would like to fix the problem in this month, is there any comment or advice? @araffin @hill-a @erniejunior
It seem the problem is not fixed up to now, is there any plan ?
Hello, There is not plan for now, we are now focusing on the migration to tf2 (cf #366 ).
I would like to fix the problem in this month, is there any comment or advice?
We would appreciate a PR, however, I think only @erniejunior is a bit familiar with the LSTM policy (I don't know much of that part of the code, which comes from OpenAI code and which is one of the most complex and undocumented part of the lib).
See the
feature/fix_lstm
branch for a test which fails for the above mentioned algorithms.For PPO1 and TRPO the cause seems to be that the batch size is not provided to the policy (
None
is passed). Then the ortho-initializer has issues.For PPO2 the assert in line 109 fails:
Since the PPO2 instance is created using
PPO2(policy, 'CartPole-v1')
, the default parameters of PPO2 seem to be broken somehow?For ACKTR the issue is somewhere in
get_factors
ofkfac.py
and I have no clue what that does and what goes wrong there but it complains about some shared nodes among different computation ops.