neulab / awesome-align

A neural word aligner based on multilingual BERT
https://arxiv.org/abs/2101.08231
BSD 3-Clause "New" or "Revised" License

Extracting dataset and AttributeError #44

b3ade closed 2 years ago

b3ade commented 2 years ago

I'm trying to run py run_align.py --output_file=D:\MT\dataFiles\NO-BA.txt --model_name_or_path=bert-base-multilingual-cased --data_file=D:\MT\dataFiles\NO-BA-output.txt --extraction 'softmax' --batch_size 32 and I get this error:

"Loading the dataset...
Extracting: 0it [00:00, ?it/s]Traceback (most recent call last):
  File "run_align.py", line 297, in <module>
    main()
  File "run_align.py", line 294, in main
    word_align(args, model, tokenizer)
  File "run_align.py", line 171, in word_align
    for batch in dataloader:
  File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 368, in __iter__
    return self._get_iterator()
  File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 314, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 927, in __init__
    w.start()
  File "C:\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Python38\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Python38\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Python38\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'word_align.<locals>.collate'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Python38\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Extracting: 0it [00:01, ?it/s]"

I'm not sure what I am doing wrong or how to get the script to run smoothly. I tried a couple of solutions I found on the internet, but without success.

zdou0830 commented 2 years ago

Hi, could you add '--num_workers 0' to the command and see if it works? I'm not sure if this is because you are using Windows.

b3ade commented 2 years ago

It's working. I always have to double the effort because I use Win. Maybe it's time to switch. :) Thanks.

alhuber1502 commented 2 years ago

I encountered the same error on macOS (and the provided solution worked) in case this helps anyone.
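
For context on why the workaround helps: the traceback above ends in ForkingPickler failing on 'word_align.<locals>.collate'. On Windows, and on macOS since Python 3.8, multiprocessing starts worker processes with the "spawn" method by default, so whatever is handed to a worker has to be pickled first, and a function defined inside another function cannot be pickled. With '--num_workers 0' no worker processes are started, so presumably nothing needs to be pickled. Below is a minimal standard-library sketch of that pickling behaviour. It is not the repository's code, and the names collate_top_level, make_local_collate and can_pickle are made up for illustration.

```python
import pickle


def collate_top_level(examples):
    # Hypothetical module-level collate function. pickle stores it by its
    # qualified name, so it can be handed to a spawned worker process.
    return list(examples)


def make_local_collate():
    # Mirrors the shape seen in the traceback: a function defined inside
    # another function, whose qualified name contains "<locals>".
    def collate(examples):
        return list(examples)

    return collate


def can_pickle(obj):
    # Return True if pickle can serialize obj, False otherwise.
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Depending on the Python version, a local function fails with one
        # of these two exception types.
        return False


if __name__ == "__main__":
    print("module-level function picklable:", can_pickle(collate_top_level))
    print("nested function picklable:", can_pickle(make_local_collate()))
```

Run as a script, this should report the module-level function as picklable and the nested one as not. Moving a nested collate function to module level would be another way to avoid the error with worker processes enabled, though whether that change fits run_align.py is an assumption.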