open-mmlab / mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark
https://mmpretrain.readthedocs.io/en/latest/
Apache License 2.0

[Bug] train: reduction.pickle.load(from_parent) EOFError: Ran out of input #1628

Open · guwuyue opened this issue 1 year ago

guwuyue commented 1 year ago

Branch

main branch (mmpretrain version)

Describe the bug

During training, setting num_workers=0, persistent_workers=False runs without error, but setting num_workers=1, persistent_workers=True raises the following error: [screenshot]
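For reference, the two settings correspond to keys of the `train_dataloader` dict in an MMPreTrain config. The fragment below is a sketch showing only those two keys; the rest of the dict (batch size, dataset, sampler) is omitted because the reporter's actual config is not included in the issue. PyTorch itself rejects `persistent_workers=True` when `num_workers=0`, so the working combination is the only valid one that uses no worker processes.

```python
# Works on the reporter's machine: batches are loaded in the main process,
# so no DataLoader worker process is spawned.
train_dataloader = dict(
    num_workers=0,
    persistent_workers=False,  # must be False when num_workers == 0
)

# Fails at w.start() with "EOFError: Ran out of input": one worker process
# is spawned when iteration starts and, because persistent_workers=True,
# kept alive across epochs.
# train_dataloader = dict(num_workers=1, persistent_workers=True)
```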

In dataloader.py, line 436, `self._iterator = self._get_iterator()` calls into line 388, `return _MultiProcessingDataLoaderIter(self)`, which in turn reaches line 1042, `w.start()`, where the following error is raised: [screenshot]
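One common cause of this traceback, offered here as an assumption because the screenshots are not reproduced: on Windows, DataLoader workers can only be started with the `spawn` start method, so `w.start()` pickles the worker's arguments (including the dataset and the collate function) in the parent process and writes them to a pipe. If pickling fails partway, the child finds the pipe empty and its `reduction.pickle.load(from_parent)` raises `EOFError: Ran out of input`; the real error (often a `PicklingError` or "Can't pickle local object") appears in the parent's traceback. With `num_workers=0` nothing is pickled, which would match the behaviour described above. The helper `find_unpicklable_attrs` and the `ToyDataset` class below are hypothetical, for illustration only; they list which instance attributes of an object fail to pickle.

```python
import pickle


def find_unpicklable_attrs(obj):
    """Return the names of obj's instance attributes that pickle cannot serialize.

    Hypothetical debugging helper: under the 'spawn' start method the dataset
    is pickled in the parent when a worker starts, so any name returned here
    is a candidate for the failure that surfaces as EOFError in the child.
    """
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:  # PicklingError, TypeError, AttributeError, ...
            bad.append(name)
    return bad


class ToyDataset:
    """Stand-in for a dataset object; not an mmpretrain class."""

    def __init__(self):
        self.samples = [0, 1, 2]
        self.transform = lambda x: x * 2  # lambdas cannot be pickled by reference


if __name__ == "__main__":
    print(find_unpicklable_attrs(ToyDataset()))  # ['transform']
```

To apply it, pass the dataset instance that the failing DataLoader wraps. An empty result only means the dataset's attributes pickle; other objects handed to the workers, such as a custom collate function, could still be the culprit.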

Environment

{'sys.platform': 'win32', 'Python': '3.9.16 (main, Jan 11 2023, 16:16:36) [MSC v.1916 64 bit (AMD64)]', 'CUDA available': True, 'numpy_random_seed': 2147483648, 'GPU 0': 'NVIDIA GeForce RTX 3080 Ti', 'CUDA_HOME': 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8', 'NVCC': 'Cuda compilation tools, release 11.8, V11.8.89', 'MSVC': 'Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30148 for x64', 'GCC': 'n/a', 'PyTorch': '2.0.1+cu118', 'TorchVision': '0.15.2+cu118', 'OpenCV': '4.7.0', 'MMEngine': '0.7.4', 'MMCV': '2.0.0', 'MMPreTrain': '1.0.0rc8+'}

Other information

No response

guwuyue commented 1 year ago

[screenshot]