kakaoenterprise / JORLDY

Repository for Open Source Reinforcement Learning Framework JORLDY
Apache License 2.0

Invalid probability value in tensor when running mpo #174

Open kan-s0 opened 2 years ago

kan-s0 commented 2 years ago

Describe the bug

A RuntimeError is raised when running mpo.

To Reproduce

python main.py --config.mpo.atari --env.name breakout --sync

When the config is modified to use the values given in the paper, the error occurs sooner and more frequently.
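The issue title suggests that the action probabilities passed to the sampler contained NaN, infinite, or negative entries. As a rough diagnostic sketch (not JORLDY code; `stable_softmax` and `find_invalid_probs` are hypothetical helper names, and plain Python lists stand in for tensors), a check like the following could be placed just before sampling to find which entries are invalid:

```python
import math


def stable_softmax(logits):
    """Softmax that subtracts the max logit first to avoid overflow in exp()."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def find_invalid_probs(probs):
    """Return the indices of entries that are NaN, infinite, or negative."""
    return [
        i
        for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


# Large logits stay finite because the max is subtracted before exp().
probs = stable_softmax([1000.0, 1001.0, 999.0])
bad = find_invalid_probs(probs)
if bad:
    raise ValueError(f"invalid probabilities at indices {bad}")
```

If the check fires, the NaN or inf values probably originate earlier (for example in the logits or in a loss term), so logging the inputs at that point may help narrow down the cause.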

Expected behavior

Screenshots

training graph

Screenshot 2022-04-18 2:36:23 PM

error text

Screenshot 2022-04-18 2:23:06 PM

mpo generated agent code

Screenshot 2022-04-18 2:28:12 PM

Development Env. (OS, version, libraries):

Additional context