rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License

BrokenPipeError: [Errno 32] Broken pipe #117

Closed (whher closed 2 years ago)

whher commented 2 years ago

Hi,

I am running this sockeye tutorial: https://github.com/awslabs/sockeye/blob/main/docs/tutorials/wmt.md

I encountered a problem while translating a sentence. This is the command I entered:

echo "ich weiss nicht" | python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.de --vocabulary-threshold 50 | python -m sockeye.translate -m de-bar-base-model 2>/dev/null | sed -r "s/@@( |$)//g"

And this is the error I received:

/home/jovyan/sockeye/subword-nmt/apply_bpe.py:398: DeprecationWarning: this script's location has moved to /home/jovyan/sockeye/subword-nmt/subword_nmt. This symbolic link will be removed in a future version. Please point to the new location, or install the package and use the command 'subword-nmt'
  warnings.warn(
/home/jovyan/sockeye/subword-nmt/apply_bpe.py:420: ResourceWarning: unclosed file <_io.TextIOWrapper name='bpe.codes' mode='r' encoding='UTF-8'>
  args.codes = codecs.open(args.codes.name, encoding='utf-8')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/jovyan/sockeye/subword-nmt/apply_bpe.py:426: ResourceWarning: unclosed file <_io.TextIOWrapper name='bpe.vocab.de' mode='r' encoding='UTF-8'>
  args.vocabulary = codecs.open(args.vocabulary.name, encoding='utf-8')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jovyan/sockeye/subword-nmt/apply_bpe.py", line 450, in <module>
    args.output.write(bpe.process_line(line, args.dropout))
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe

I understand that this is originating from a BrokenPipeError in apply_bpe.py. Is there any way to counter this? Thank you!

rsennrich commented 2 years ago

There are quite a few warnings, which I've now reduced a bit in commit 810ee1487a753870ebf90d91ccdb789158268d9f.

However, the core problem of the broken pipe could also indicate that something is wrong with where the output is written to, i.e. sockeye. Can you try the following:

execute just this command in isolation:

echo "ich weiss nicht" | python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.de --vocabulary-threshold 50

execute the full command, but don't write sockeye errors to /dev/null:

echo "ich weiss nicht" | python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.de --vocabulary-threshold 50 | python -m sockeye.translate -m de-bar-base-model | sed -r "s/@@( |$)//g"

This should give you a better idea what's going on.
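The two checks above amount to running each pipeline stage on its own and reading its stderr instead of discarding it. A minimal Python sketch of that idea follows. `run_stages` and both stand-in stage commands are hypothetical illustrations, not part of subword-nmt or sockeye, and the second stand-in fails on a deliberately nonexistent import to mimic a stage that cannot start.

```python
import subprocess
import sys


def run_stages(stages, text):
    """Feed text through each stage in turn; stop at the first stage that fails.

    Returns (failed_argv, returncode, stderr) on failure,
    or (None, 0, final_stdout) if every stage succeeded.
    """
    data = text
    for argv in stages:
        result = subprocess.run(argv, input=data, capture_output=True, text=True)
        if result.returncode != 0:
            return argv, result.returncode, result.stderr
        data = result.stdout
    return None, 0, data


# Hypothetical stand-ins for the real pipeline stages.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
broken = [sys.executable, "-c", "import module_that_does_not_exist"]

failed, code, detail = run_stages([upper, broken], "ich weiss nicht\n")
print("failing stage:", failed)
print("exit code:", code)
print("last stderr line:", detail.strip().splitlines()[-1])
```

Because each stage's stderr is captured separately, the report points at the stage that actually failed rather than at the stage upstream of it.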

whher commented 2 years ago

Thanks for the quick reply, I pulled the newest commit and executed the commands as you mentioned:

Executing only the apply_bpe command returns the encoded text: ich wei@@ ss nicht

Executing the full command without writing sockeye errors to /dev/null returns errors again:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jovyan/sockeye/sockeye/translate.py", line 25, in <module>
    import torch as pt
ModuleNotFoundError: No module named 'torch'
/home/jovyan/sockeye/subword-nmt/apply_bpe.py:393: DeprecationWarning: this script's location has moved to /home/jovyan/sockeye/subword-nmt/subword_nmt. This symbolic link will be removed in a future version. Please point to the new location, or install the package and use the command 'subword-nmt'
  warnings.warn(
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jovyan/sockeye/subword-nmt/apply_bpe.py", line 446, in <module>
    args.output.write(bpe.process_line(line, args.dropout))
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe

I am deducing that the error occurs when passing the output to sockeye then...?

rsennrich commented 2 years ago

Yes, it seems that torch, which sockeye requires, is not installed correctly.
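For context on why the error surfaces in apply_bpe.py rather than in sockeye: when the downstream process in a pipe exits before reading its input, the upstream write fails with EPIPE, which Python raises as BrokenPipeError. A minimal sketch of that mechanism, assuming a POSIX system, with a deliberately nonexistent module standing in for the missing torch import:

```python
import errno
import subprocess
import sys

# Downstream stage that dies on import before reading any input.
# The module name is a deliberately nonexistent stand-in.
reader = subprocess.Popen(
    [sys.executable, "-c", "import module_that_does_not_exist"],
    stdin=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
    bufsize=0,  # unbuffered, so the write below reaches the pipe immediately
)
reader.wait()  # the reader has exited, so the read end of the pipe is closed

error = None
try:
    # Upstream write into a pipe that nobody is reading any more.
    reader.stdin.write(b"ich wei@@ ss nicht\n")
except BrokenPipeError as exc:
    error = exc
finally:
    reader.stdin.close()

print(type(error).__name__, error.errno == errno.EPIPE)
```

The writer is behaving correctly here. The traceback it prints is a symptom of the reader's failure, so the reader's own stderr is the place to look.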

I'll close this issue, since this does not pertain to subword-nmt. I trust you'll find other resources to help with installing/debugging sockeye.
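As a quick first check before debugging further, one can ask the same Python interpreter that runs the pipeline whether it can locate the required module at all. A small sketch using the standard library's `importlib.util.find_spec`. The helper name `missing_modules` is hypothetical, and the check only covers top-level module names:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "torch" is the module named in the ModuleNotFoundError above;
# "json" is a standard-library module included as a sanity check.
print(missing_modules(["torch", "json"]))
```

If "torch" appears in the printed list, the package is not installed in the environment this interpreter uses, which matches the ModuleNotFoundError in the traceback.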