Closed a2ewmk closed 2 years ago
Hi, I've never seen this error before, but we only ran our experiments on Linux.
However, it seems to be a known issue related to multiprocess data loading. As recommended here, please try setting the num_workers parameter to zero: https://github.com/yandex-research/sparqling-queries/blob/e04d0bfd507c4859be3f35d4e0d8eb57434bb4f6/text2qdmr/commands/train.py#L243
Best, Anton
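For readers who would rather not hard-code zero, here is a minimal sketch of the same workaround. It assumes the failure comes from an object that cannot be pickled when worker processes are spawned. `choose_num_workers` is a hypothetical helper, not part of sparqling-queries or PyTorch; it only uses the standard library and returns a value you could pass as `num_workers`.

```python
import pickle


def choose_num_workers(requested, *objects):
    """Return `requested` if every object can be pickled, otherwise 0.

    Hypothetical helper for illustration only. `objects` are whatever would
    be handed to the worker processes (for example the dataset and the
    collate function).
    """
    if requested <= 0:
        return 0
    for obj in objects:
        try:
            # Spawned workers (the default on Windows) receive their
            # arguments through pickle, so this mimics that step.
            pickle.dumps(obj)
        except (AttributeError, TypeError, pickle.PicklingError):
            # Something cannot be sent to a child process:
            # fall back to loading data in the main process.
            return 0
    return requested
```

Usage would look like `num_workers=choose_num_workers(4, dataset, collate_fn)` when building the data loader. Note that this pickles the objects once up front, which may be slow for a large in-memory dataset.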
That solution worked. Thank you for the help.
Hi, I'm using this on Windows and was able to run through the initial setup steps, including the preprocessing for one of the text2qdmr experiments. However, when I try to train with the config file, it fails with an AttributeError. I've looked for solutions but haven't been able to get around it. Any help is appreciated, thanks!
(env-torch1.9) D:\GitInstalls\sparqling-queries>python run_text2qdmr.py train ./text2qdmr/configs/experiments/bert_qdmr_train.jsonnet
Running with 1 GPU
[2022-02-11T14:19:49] Logging to logdir/bert_qdmr_train\bs=6,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1
Some weights of the model checkpoint at bert-large-uncased-whole-word-masking were not used when initializing BertModel: ['cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias']
no_deprecation_warning=True to disable this warning FutureWarning,
Loaded dataset size: 4321
[2022-02-11T14:20:11] Running on git commit 'e04d0bfd507c4859be3f35d4e0d8eb57434bb4f6'
[2022-02-11T14:20:12] Result of conda info:
[2022-02-11T14:20:12] active environment : env-torch1.9
active env location : C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9
shell level : 2
user config file : C:\Users\a2ewmk.condarc
populated config files : C:\Users\a2ewmk.condarc
conda version : 4.11.0
conda-build version : 3.21.6
python version : 3.9.7.final.0
virtual packages : cuda=10.2=0 win=0=0 __archspec=1=x86_64
base environment : C:\Users\a2ewmk\Anaconda3 (writable)
conda av data dir : C:\Users\a2ewmk\Anaconda3\etc\conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/win-64
    https://repo.anaconda.com/pkgs/main/noarch
    https://repo.anaconda.com/pkgs/r/win-64
    https://repo.anaconda.com/pkgs/r/noarch
    https://repo.anaconda.com/pkgs/msys2/win-64
    https://repo.anaconda.com/pkgs/msys2/noarch
package cache : C:\Users\a2ewmk\Anaconda3\pkgs
    C:\Users\a2ewmk.conda\pkgs
    C:\Users\a2ewmk\AppData\Local\conda\conda\pkgs
envs directories : C:\Users\a2ewmk\Anaconda3\envs
    C:\Users\a2ewmk.conda\envs
    C:\Users\a2ewmk\AppData\Local\conda\conda\envs
platform : win-64
user-agent : conda/4.11.0 requests/2.26.0 CPython/3.9.7 Windows/10 Windows/10.0.18363
administrator : False
netrc file : None
offline mode : False
[2022-02-11T14:20:12] pytorch version: 1.9.0
[2022-02-11T14:20:12] transformers version: 4.16.2
Traceback (most recent call last):
File "run_text2qdmr.py", line 181, in <module>
main()
File "run_text2qdmr.py", line 123, in main
train.main(train_config, distributed=args.distributed)
File "D:\GitInstalls\sparqling-queries\text2qdmr\commands\train.py", line 456, in main
trainer.train(config, modeldir=args.logdir, tb_name=os.path.join('runs_train', args.name))
File "D:\GitInstalls\sparqling-queries\text2qdmr\commands\train.py", line 280, in train
for batch in train_data_loader:
File "D:\GitInstalls\sparqling-queries\text2qdmr\commands\train.py", line 392, in _yield_batches_from_epochs
for batch in loader:
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\site-packages\torch\utils\data\dataloader.py", line 354, in __iter__
self._iterator = self._get_iterator()
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
w.start()
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'EncDecModel.Preproc.dataset.<locals>.<lambda>'
(env-torch1.9) D:\GitInstalls\sparqling-queries>Running with 1 GPU
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
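The two tracebacks above fit a common pattern: on Windows, worker processes are spawned and receive their arguments through pickle, and pickle cannot serialize a function defined inside another function. The parent process fails while writing, and the child process then hits `EOFError` because it reads an incomplete stream. The sketch below reproduces only the pickling part with the standard library. The names `local_key`, `picklable_key`, and `can_pickle` and the `"question"` field are made up for illustration and are not taken from the repository.

```python
import operator
import pickle


def local_key():
    # Defined inside a function, so its qualified name contains "<locals>"
    # and pickle cannot look it up by name in another process.
    return lambda item: item["question"]


def picklable_key():
    # Same behaviour, built from a standard-library callable that pickle
    # knows how to serialize.
    return operator.itemgetter("question")


def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # The exception type for local objects differs between
        # Python versions, so both are caught here.
        return False


if __name__ == "__main__":
    print("local lambda picklable:", can_pickle(local_key()))
    print("itemgetter picklable:", can_pickle(picklable_key()))
```

Setting `num_workers` to zero sidesteps the problem because nothing has to be sent to a child process. Replacing the local function with a picklable callable would be another option, though whether that is practical in this code base is an assumption.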