chunk = read(handle, remaining)

sunshantong commented 4 years ago

Hi, thank you for your work! The code runs fine during training but hangs when it reaches validation. This does not seem to be caused by the "try-except" code in the testdataloader. When I interrupt it with CTRL+C, I get this:

Traceback (most recent call last):
  File "train.py", line 96, in <module>
    for j, data in enumerate(testdataloader, 0):
  File "/home/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 330, in __next__
    idx, batch = self._get_batch()
  File "/home/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 309, in _get_batch
    return self.data_queue.get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt

What might be causing this problem? Thank you very much.
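
For readers hitting a similar hang: the traceback above ends in a blocking `get()` on a multiprocessing queue, which means the main process is waiting for a batch that never arrives. A minimal sketch of how a timeout turns that silent wait into something observable, using only the standard library. The helper name `get_or_report` is hypothetical and is not part of PyTorch or this repository. Recent PyTorch versions also accept a `timeout` argument on `DataLoader` for a similar purpose, but check the version you have installed.

```python
import multiprocessing
import queue


def get_or_report(q, timeout_s):
    """Return the next item from q, or None if nothing arrives in timeout_s seconds.

    Hypothetical helper for illustration: a plain q.get() would block forever
    if the producer side has stalled, which is what the traceback shows.
    """
    try:
        return q.get(timeout=timeout_s)
    except queue.Empty:
        # Nothing was produced in time; the caller can log this and abort
        # instead of hanging until CTRL+C.
        return None


if __name__ == "__main__":
    q = multiprocessing.Queue()
    item = get_or_report(q, timeout_s=0.5)
    print("queue stalled" if item is None else item)
```

This only makes the stall visible; it does not say why the worker stopped producing batches.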

LyuJ1998 commented 4 years ago

Hi, I have fixed the bug you mentioned in another issue. Check it and your problem may disappear. If not, please make sure again that you have removed all of the "try-except" code, especially lines 488 and 471 in dataset/dataset_nocs.py.

sunshantong commented 4 years ago

@LyuJ1998 Hi, thank you for your reply and update. The dataloader bug has been fixed, but I still have the same problem when I run the code: it still gets stuck at testdataloader. When I set num_workers = 0, the code runs without worker processes, but it is very slow. Could there be a deadlock in the dataloader? Thank you very much.

sunshantong commented 4 years ago

I fixed the problem by launching the main script with OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 python train.py. I hope this helps others in similar situations.
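
If changing the launch command is inconvenient, the same two environment variables (`OMP_NUM_THREADS` and `MKL_NUM_THREADS`) can in principle be set from inside the script. A minimal sketch, assuming the variables are set before any numerical library is imported, since these libraries typically read them when they initialize. The function name `limit_math_threads` is hypothetical.

```python
import os


def limit_math_threads(n=1):
    """Set the OpenMP and MKL thread-count variables for this process.

    Hypothetical helper: call it at the very top of the script, before
    importing numpy or torch, otherwise the setting may be ignored.
    """
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n)
    return {var: os.environ[var] for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS")}


limit_math_threads(1)
# Imports of numerical libraries would go here, after the variables are set.
```

Worker processes started by the dataloader inherit the environment of the main process, so the limit should apply to them as well.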

mystorm16 commented 2 years ago

Looks like this was a bug in Python. I fixed this issue by changing my Python version from 3.6.0 to 3.6.7: https://stackoverflow.com/questions/53300965/pytorch-exception-in-thread-valueerror-signal-number-32-out-of-range
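
A quick way to check whether an environment is older than the version reported to work above. This is a minimal sketch: the 3.6.7 threshold is taken from the comment above, not from any official changelog, and `older_than` is a hypothetical helper name.

```python
import sys


def older_than(version_info, minimum=(3, 6, 7)):
    """Return True if the (major, minor, micro) version is below `minimum`."""
    return tuple(version_info[:3]) < minimum


if older_than(sys.version_info):
    print("Python is older than 3.6.7; upgrading may be worth trying.")
```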

j96w / 6-PACK

chunk = read(handle, remaining) #23