keiohta / tf2rl

TensorFlow2 Reinforcement Learning
MIT License
461 stars, 104 forks

Fix: load trajectories with limited max_steps #146

Closed ymd-h closed 2 years ago

ymd-h commented 2 years ago

This PR addresses #144 and #145.

keiohta commented 2 years ago

@ymd-h Thanks for your PR! I think it would be better to use the `next_obs` key instead of deriving `next_obs` from the `obs` key with a one-step shift. This change should be safe (in the sense that we won't encounter a `KeyError`) because we can assume the loaded paths were generated by passing the command-line argument `--save-path`, and were produced in the `evaluate_policy` method.

I will merge the PR once you have applied the above change. Otherwise, let me know if you have other ideas or opinions.
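The difference between the two approaches can be illustrated with a minimal sketch. It assumes each loaded path is a dict with `obs`, `act`, and `next_obs` sequences of equal length; that layout, the helper names, and the toy data are hypothetical and are not taken from the tf2rl code.

```python
def transitions_by_shift(path, max_steps=None):
    """Derive next_obs by shifting obs one step (hypothetical helper).

    The last kept step has no successor inside the slice, so it is dropped.
    """
    obs = path["obs"][:max_steps]
    act = path["act"][:max_steps]
    return obs[:-1], act[:-1], obs[1:]


def transitions_by_key(path, max_steps=None):
    """Read next_obs directly from its own key (hypothetical helper)."""
    obs = path["obs"][:max_steps]
    act = path["act"][:max_steps]
    next_obs = path["next_obs"][:max_steps]
    return obs, act, next_obs


# Toy path with 5 steps: state s_t is simply the number t.
T = 5
states = [float(t) for t in range(T + 1)]
path = {
    "obs": states[:-1],       # s_0 .. s_4
    "next_obs": states[1:],   # s_1 .. s_5
    "act": [0.0] * T,         # placeholder actions
}

shift_obs, shift_act, shift_next = transitions_by_shift(path, max_steps=3)
key_obs, key_act, key_next = transitions_by_key(path, max_steps=3)

print(len(shift_obs), len(key_obs))
```

In this sketch the shifted version yields one fewer transition than `max_steps`, while reading the `next_obs` key keeps all of them, including the successor of the last kept step.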

ymd-h commented 2 years ago

@keiohta Thank you for your review.

I added commits that use the `next_obs` key, as you suggested.

keiohta commented 2 years ago

Thanks @ymd-h! I believe the code is improved and that this resolves #144 and #145.