kakaoenterprise / JORLDY

Repository for Open Source Reinforcement Learning Framework JORLDY
Apache License 2.0
359 stars, 50 forks

single_train process gets stuck #28

Closed: zenoengine closed this issue 2 years ago

zenoengine commented 2 years ago

Describe the bug

When running the single_train process from the command line, execution hangs inside the DQN agent's network.load_state_dict call.

To Reproduce

python single_train.py --config config.dqn.cartpole

Expected behavior

The single_train process should run without hanging.

Screenshots None

Development Env. (OS, version, libraries):

OS: Ubuntu 18.04.5 LTS
Is CUDA available: No
Python: 3.6.9
Libraries: only those listed in requirements

Additional context

Backtrace from attaching GDB to the process after the hang:

libpthread.so.0!futex_abstimed_wait_cancelable(int private, const struct timespec * abstime, unsigned int expected, unsigned int * futex_word) (/build/glibc-S9d2JN/glibc-2.27/sysdeps/unix/sysv/linux/futex-internal.h:205)
libpthread.so.0!do_futex_wait(struct new_sem * sem, const struct timespec * abstime) (/build/glibc-S9d2JN/glibc-2.27/nptl/sem_waitcommon.c:111)
libpthread.so.0!__new_sem_wait_slow(struct new_sem * sem, const struct timespec * abstime) (/build/glibc-S9d2JN/glibc-2.27/nptl/sem_waitcommon.c:181)
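
The GDB frames above only show a native semaphore wait, so a Python-level traceback may help confirm which call is blocked. Below is a minimal sketch using the standard library's faulthandler module. The helper name call_with_hang_dump is hypothetical and not part of JORLDY, and wrapping the network.load_state_dict call with it is an assumption about where the hang occurs.

```python
import faulthandler
import sys


def call_with_hang_dump(fn, timeout=30.0, file=None):
    """Call fn() and return its result.

    If fn() has not returned after `timeout` seconds, faulthandler writes
    the Python traceback of every thread to `file`. The call is not
    interrupted; the dump is purely diagnostic.
    """
    if file is None:
        file = sys.stderr  # must be a real file object with fileno()
    # Arm a watchdog that dumps all thread tracebacks after `timeout` seconds.
    faulthandler.dump_traceback_later(timeout, exit=False, file=file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() returns or raises.
        faulthandler.cancel_dump_traceback_later()


# Hypothetical usage inside the agent (names assumed, not taken from JORLDY):
# call_with_hang_dump(lambda: network.load_state_dict(state_dict), timeout=30.0)
```

If the process hangs, the dump should show the Python frame that is waiting, which can then be compared with the native frames from GDB.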

I think it's the same issue as pytorch/issue/35472.

ramanuzan commented 2 years ago

This is an issue I really wanted to solve, thank you very much. We will also update other scripts based on that code.