wenwenyu / MASTER-pytorch

Code for the paper "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021)
https://arxiv.org/abs/1910.02562
MIT License

Stack overflow error when training on a single GPU in debug mode #9

Closed zezeze97 closed 3 years ago

zezeze97 commented 3 years ago

Hello, I get an error when I try to train on a single GPU in debug mode, and so far I have not been able to find the cause.

(torch) zhangzr@AI12:~/MASTER-pytorch-main$ python train.py -c configs/config.json -d 1 -dist false
[2021-07-11 18:34:01,939 - train - INFO] - One GPU or CPU training mode start...
train.py:137: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn('You have chosen to deterministic training. '
[2021-07-11 18:34:01,943 - train - WARNING] - You have chosen to deterministic training. This will fix random seed, turn on the CUDNN deterministic setting, turn off the CUDNN benchmark which can slow down your training considerably!
[2021-07-11 18:34:02,568 - train - INFO] - Dataloader instances have finished. Train datasets: 3067 Val datasets: 2 Train_batch_size/gpu: 8 Val_batch_size/gpu: 8.
[2021-07-11 18:34:04,382 - train - INFO] - Model created, trainable parameters: 54600257.
[2021-07-11 18:34:04,383 - train - INFO] - Optimizer and lr_scheduler created.
[2021-07-11 18:34:04,383 - train - INFO] - Max_epochs: 600 Log_step_interval: 1 Validation_step_interval: 2000.
[2021-07-11 18:34:04,383 - train - INFO] - Training start...
[2021-07-11 18:34:04,412 - trainer - WARNING] - Training is using GPU 0!
Fatal Python error: Cannot recover from stack overflow.
Python runtime state: initialized

Current thread 0x00007f13c82e3340 (most recent call first):
  File "/home/zhangzr/anaconda3/envs/torch/lib/python3.8/site-packages/PIL/_util.py", line 6 in isPath
  File "/home/zhangzr/anaconda3/envs/torch/lib/python3.8/site-packages/PIL/Image.py", line 2964 in open
  File "/home/zhangzr/MASTER-pytorch-main/data_utils/datasets.py", line 78 in __getitem__
  File "/home/zhangzr/MASTER-pytorch-main/data_utils/datasets.py", line 101 in __getitem__
  File "/home/zhangzr/MASTER-pytorch-main/data_utils/datasets.py", line 101 in __getitem__
  File "/home/zhangzr/MASTER-pytorch-main/data_utils/datasets.py", line 101 in __getitem__
  [the frame at datasets.py line 101 in __getitem__ repeats for the remainder of the dump]
  ...
Aborted (core dumped)

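The dump above shows `__getitem__` at line 101 of `data_utils/datasets.py` calling itself until the interpreter's stack is exhausted, with `PIL.Image.open` as the innermost call. One plausible reading, and it is only an assumption since the repository code is not quoted in this thread, is that the dataset retries with another index whenever an image fails to load, so the recursion never ends if every image fails (for example, when the image directory in the config is wrong). A minimal sketch of a bounded, non-recursive retry is shown below; `RetryingDataset`, `load_fn`, and `max_retries` are hypothetical names, not part of MASTER-pytorch.

```python
import random


class RetryingDataset:
    """Hypothetical stand-in for a dataset that retries when a sample fails to load."""

    def __init__(self, paths, load_fn, max_retries=10):
        self.paths = paths
        self.load_fn = load_fn          # e.g. PIL.Image.open in a real dataset
        self.max_retries = max_retries

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        last_error = None
        # A bounded loop replaces `return self.__getitem__(other_index)`,
        # so a dataset where every file is unreadable fails with a clear
        # error instead of overflowing the stack.
        for _ in range(self.max_retries):
            try:
                return self.load_fn(self.paths[index])
            except OSError as exc:      # missing or unreadable file
                last_error = exc
                index = random.randrange(len(self.paths))
        raise RuntimeError(
            f"no sample could be loaded after {self.max_retries} attempts; "
            f"last error: {last_error!r}"
        )
```

With this shape, a misconfigured path surfaces as a `RuntimeError` that names the last file that failed, which is usually enough to spot a wrong directory or a corrupt image.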
huilinghap commented 3 years ago

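I also get an error with single-GPU training in debug mode. With python train.py -c configs/config.json -d 1 -dist false, training itself runs, but it fails when evaluating on the validation set. With val_num_workers=0 the error is: Segmentation fault (core dumped). With val_num_workers=2 (>0) the error is: RuntimeError: DataLoader worker (pid(s) 22437) exited unexpectedly. How did you solve this problem on your side?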
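A segmentation fault gives no Python traceback by default, and with worker processes it only shows up as "DataLoader worker exited unexpectedly". One way to narrow it down, offered here as a debugging sketch rather than a fix, is to keep `val_num_workers=0` so the crash happens in the main process and to enable the standard-library `faulthandler` module, which dumps the Python stack when a fatal signal arrives. The helper name `enable_crash_traceback` and the log file name are hypothetical.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write Python tracebacks of all threads to log_path on fatal signals such as SIGSEGV."""
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The file must stay open for as long as faulthandler is enabled,
    # so the caller should keep the returned object alive.
    return log_file


if __name__ == "__main__":
    # Hypothetical usage near the top of a training script.
    crash_log = enable_crash_traceback("crash_traceback.log")
```

The same effect is available without code changes by running the script as `python -X faulthandler train.py -c configs/config.json -d 1 -dist false`, which writes the traceback to stderr.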

S-HuaBomb commented 2 years ago

I ran into the same problem during training. How can it be solved? @zezeze97