kevaday / alphazero-general

A fast, generalized, and modified implementation of DeepMind's distinguished AlphaZero in PyTorch.
MIT License

Getting a segfault after appending agents #3

Status: Closed (closed by starovp 3 years ago)

starovp commented 3 years ago

Hey, I just pulled your repo and am using it with a custom game interface. I get a segfault right after the agents are appended. There is plenty of RAM available, ulimit is 32k, and the stack and recursion sizes are maxed out. The same game worked fine on bhasconnect's original repo. Any ideas as to what could be causing it?

kevaday commented 3 years ago

Hello, I haven't encountered this error yet. Can you please show me the full stack trace and possible steps I can take to try and reproduce it? Also, do any of the provided example envs work for you, or do you get the same error?

Regards -kevaday

starovp commented 3 years ago

No worries. I'll try the others, but currently it looks like:

Because of batching, it can take a long time before any games finish.
------ITER 1------
Self play agents
Agents and queue ready
Cuda begin pin
Cuda done
Append agents
Fatal Python error: Segmentation fault

Thread 0x00007f4c8dfff700 (most recent call first):
  File "/home/_/anaconda3/envs/py395/lib/python3.9/selectors.py", line 416 in select
  File "/home/_/anaconda3/envs/py395/lib/python3.9/multiprocessing/connection.py", line 936 in wait
  File "/home/_/anaconda3/envs/py395/lib/python3.9/multiprocessing/connection.py", line 429 in _poll
  File "/home/_/anaconda3/envs/py395/lib/python3.9/multiprocessing/connection.py", line 262 in poll
  File "/home/_/anaconda3/envs/py395/lib/python3.9/multiprocessing/queues.py", line 113 in get
  File "/home/_/anaconda3/envs/py395/lib/python3.9/site-packages/tensorboardX/event_file_writer.py", line 202 in run
  File "/home/_/anaconda3/envs/py395/lib/python3.9/threading.py", line 954 in _bootstrap_inner
  File "/home/_/anaconda3/envs/py395/lib/python3.9/threading.py", line 912 in _bootstrap

Current thread 0x00007f4d5f3ca340 (most recent call first):
Segmentation fault (core dumped)

I added a couple of print statements to try to debug it.
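
For anyone reproducing this kind of crash: the "Fatal Python error: Segmentation fault" header and the per-thread "most recent call first" dumps above match the output format of Python's standard-library faulthandler module. A minimal sketch of enabling it explicitly in your own entry point is shown below. The temporary log file is a hypothetical choice for illustration, not something this repo does; passing no file argument writes to stderr instead.

```python
import faulthandler
import tempfile

# Hypothetical log destination for this sketch; the file must stay open
# for as long as the handler is enabled.
log_file = tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False)

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS, or SIGILL, dump the Python
# traceback of every thread to log_file before the process dies.
faulthandler.enable(file=log_file, all_threads=True)

print(faulthandler.is_enabled())  # True
```

The same handler can be switched on without code changes by running `python -X faulthandler script.py` or by setting the `PYTHONFAULTHANDLER` environment variable.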

starovp commented 3 years ago

Nope, Connect4 loads fine

starovp commented 3 years ago

Whoops! Turns out it was a major recursion bug in the initialization of my game state. My bad.

kevaday commented 3 years ago

Glad you found it. My next suggestion was going to be just that.
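
For readers who land here with the same symptom, here is a minimal sketch of how a recursive bug in a game state's initializer can behave. The `GameState` class below is hypothetical and is not the reporter's code. With Python's default recursion limit, the runaway constructor is stopped with a `RecursionError` and a readable traceback. If the limit has been raised very high with `sys.setrecursionlimit` (the report mentions recursion size being maxed out), the C stack can overflow first, and one plausible outcome is a hard segfault like the one above instead of an exception.

```python
class GameState:
    """Hypothetical game state whose constructor recurses with no base case."""

    def __init__(self, depth=0):
        self.depth = depth
        # Bug: each state eagerly builds its successor, so construction
        # never terminates on its own.
        self.child = GameState(depth + 1)


def try_build():
    """Return the name of the exception raised while building a state."""
    try:
        GameState()
    except RecursionError as exc:
        return type(exc).__name__
    return None


# Under the default recursion limit, the interpreter interrupts the
# runaway constructor instead of crashing the process.
result = try_build()
print(result)  # RecursionError
```

A quick check when debugging a custom environment is therefore to restore the default recursion limit and construct the initial state on its own, outside the training loop, so that any unbounded recursion surfaces as an ordinary exception.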