microsoft / DeBERTa

The implementation of DeBERTa
MIT License
1.91k stars, 215 forks

EOF error while running the rtd.sh script #139

Open BartWesthoff opened 11 months ago

BartWesthoff commented 11 months ago

Traceback (most recent call last):
  File "C:\Users\bartw\PycharmProjects\DeBERTa\DeBERTa\apps\run.py", line 476, in <module>
    main(args)
  File "C:\Users\bartw\PycharmProjects\DeBERTa\DeBERTa\apps\run.py", line 317, in main
    train_model(args, model, device, train_data, eval_data, run_eval_fn, loss_fn=loss_fn, train_fn = train_fn)
  File "C:\Users\bartw\PycharmProjects\DeBERTa\DeBERTa\apps\run.py", line 109, in train_model
    train_fn(args, model, device, data_fn = data_fn, eval_fn = eval_fn, loss_fn = loss_fn)
  File "C:\Users/bartw/PycharmProjects/DeBERTa\DeBERTa\apps\tasks\rtd_task.py", line 269, in train_fn
    trainer.train()
  File "C:\Users/bartw/PycharmProjects/DeBERTa\DeBERTa\training\trainer.py", line 142, in train
    for step, batch in enumerate(AsyncDataLoader(train_dataloader, 100)):
  File "C:\Users/bartw/PycharmProjects/DeBERTa\DeBERTa\data\async_data.py", line 18, in __iter__
    dl=iter(self.dataloader)
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 1043, in __init__
    w.start()
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'RTDTask.get_feature_fn.<locals>._example_to_feature'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\bartw\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Does anyone know how to fix this issue?

stvhuang commented 8 months ago

I got this exception when running out of GPU memory.
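
For what it's worth, the first traceback in the original post points at something other than memory: on Windows, multiprocessing starts DataLoader workers with the spawn method, which has to pickle the worker's state, and a function defined inside another function (here `_example_to_feature` inside `RTDTask.get_feature_fn`) cannot be pickled. The `EOFError` in the second traceback looks like a follow-on effect, since the child process reads from a pipe the parent never finished writing to. Below is a minimal sketch of that behaviour in plain Python, without DeBERTa or PyTorch. The names `make_local_feature_fn`, `example_to_feature`, and `is_picklable` are hypothetical and only illustrate the pattern; they are not the repository's code.

```python
import pickle
from functools import partial


def make_local_feature_fn(max_seq_len):
    # Same shape as the object named in the error: a function defined
    # inside another function (a "local object").
    def _example_to_feature(example):
        return example[:max_seq_len]

    return _example_to_feature


def example_to_feature(example, max_seq_len):
    # Module-level equivalent. pickle stores it by qualified name,
    # so a spawned worker process can import it again.
    return example[:max_seq_len]


def is_picklable(obj):
    # Depending on the Python version, pickling a local function raises
    # AttributeError or pickle.PicklingError, so catch both.
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


local_fn = make_local_feature_fn(4)
top_level_fn = partial(example_to_feature, max_seq_len=4)

print("local function picklable:", is_picklable(local_fn))
print("module-level partial picklable:", is_picklable(top_level_fn))
```

If this is the cause, two directions may be worth trying: move the nested function to module level (binding extra arguments with `functools.partial`), or load data in the main process by creating the `torch.utils.data.DataLoader` with `num_workers=0`. Whether the DeBERTa scripts expose an option for the worker count is not something I have checked.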