kakaoenterprise / JORLDY

Repository for Open Source Reinforcement Learning Framework JORLDY
Apache License 2.0

Ray Out Of Memory Error #208

Closed kan-s0 closed 2 years ago

kan-s0 commented 2 years ago

Describe the bug: Running either of the commands below in async mode ends with a RayOutOfMemoryError.

To Reproduce
python main.py --async --config config.r2d2.atari --env.name breakout
python main.py --async --config config.muzero.atari --env.name qbert

Expected behavior: Training runs to completion. Instead, a RayOutOfMemoryError is raised.

Screenshots

[Screenshot 2022-05-30 6:46:40 PM] [Screenshot 2022-05-30 5:07:28 PM]

Development Env. (OS, version, libraries): Linux, Python 3.7.11, jorldy 0.3.0

Additional context
https://stackoverflow.com/questions/60175137/out-of-memory-with-ray-python-framework
https://github.com/ray-project/ray/issues/5572

It seems that garbage collection for Ray's shared memory doesn't work properly.

kan-s0 commented 2 years ago

It seems that the cause of this issue is not a problem in Ray itself. The RayOutOfMemoryError is only the symptom; the replay buffer is simply too large.

[Screenshot 2022-05-31 10:48:16 AM]

RayOutOfMemoryError occurs when training with seq=20, n_step=3, burn_in=10, as in the config above.

However, I confirmed that when only the buffer size is reduced from 2M to 0.5M in the same config, training runs without any problem.

[Screenshot 2022-05-31 10:48:36 AM]
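For anyone wondering why the sequence settings matter, here is a rough sketch of how they could inflate a single transition. It is only an illustration under assumptions that are not confirmed by the JORLDY source: that one stored transition holds burn_in + seq + n_step observations, and that each observation is a stack of four 84x84 frames. The function name and the observation shape are hypothetical.

```python
from math import prod


def sequence_transition_bytes(seq, n_step, burn_in, obs_shape, bytes_per_value):
    """Estimate the observation payload of one sequence transition.

    Assumption (not taken from JORLDY): a transition stores
    burn_in + seq + n_step observations back to back.
    """
    steps = burn_in + seq + n_step
    return steps * prod(obs_shape) * bytes_per_value


# Settings from the config discussed above.
SEQ, N_STEP, BURN_IN = 20, 3, 10
# Hypothetical Atari-style observation: 4 stacked 84x84 frames.
OBS_SHAPE = (4, 84, 84)

as_uint8 = sequence_transition_bytes(SEQ, N_STEP, BURN_IN, OBS_SHAPE, 1)
as_float32 = sequence_transition_bytes(SEQ, N_STEP, BURN_IN, OBS_SHAPE, 4)

print(f"uint8 observations:   {as_uint8 / 1024**2:.2f} MiB per transition")
print(f"float32 observations: {as_float32 / 1024**2:.2f} MiB per transition")
```

Under these assumptions the observations alone reach several MiB per transition when stored as float32, and storing them as uint8 would cut that by a factor of four.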

When I measured the size of one transition stored in the buffer, it was approximately 5 MB.

[Screenshot 2022-05-31 10:53:17 AM]
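As a back-of-the-envelope check, the measured transition size can be multiplied by the buffer capacity. The sketch below is plain arithmetic and not JORLDY code: the helper names are made up for illustration, the 5 MB figure is treated as 5 MiB, and the result is an upper bound that assumes the buffer fills completely and every transition is stored independently.

```python
def estimate_buffer_bytes(transition_bytes, capacity):
    """Upper bound on replay buffer memory, ignoring container overhead."""
    return transition_bytes * capacity


def human_readable(num_bytes):
    """Format a byte count using binary units up to TiB."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if num_bytes < 1024 or unit == "TiB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024


# Approximately 5 MB per transition, as measured above (taken as 5 MiB here).
TRANSITION_BYTES = 5 * 1024**2

# Compare the original buffer size (2M) with the reduced one (0.5M).
for capacity in (2_000_000, 500_000):
    total = estimate_buffer_bytes(TRANSITION_BYTES, capacity)
    print(f"capacity={capacity:>9,d} -> up to {human_readable(total)}")
```

Even as a rough bound, this shows that the buffer capacity multiplied by a multi-megabyte transition far exceeds typical machine memory, so how much of the buffer actually fills during a run decides whether the error appears.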

In the end, this issue seems to be caused by the size of each transition and of the buffer, so I am closing it.