rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License

DeprecationWarning and ResourceWarning: Enable tracemalloc to get the object allocation traceback #106

Closed: RamoramaInteractive closed this issue 2 years ago

RamoramaInteractive commented 2 years ago

I'm working on this Sockeye tutorial: https://awslabs.github.io/sockeye/tutorials/wmt.html

After running the preprocessing command, I have been getting this output for over 30 minutes.

/home/subword-nmt/learn_joint_bpe_and_vocab.py:139: DeprecationWarning: this script's location has moved to /home/subword-nmt/subword_nmt. This symbolic link will be removed in a future version. Please point to the new location, or install the package and use the command 'subword-nmt'
  DeprecationWarning
/home/subword-nmt/learn_joint_bpe_and_vocab.py:90: ResourceWarning: unclosed file <_io.TextIOWrapper name='corpus.tc.en' mode='r' encoding='UTF-8'>
  args.input = [codecs.open(f.name, encoding='UTF-8') for f in args.input]
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/subword-nmt/learn_joint_bpe_and_vocab.py:90: ResourceWarning: unclosed file <_io.TextIOWrapper name='corpus.tc.de' mode='r' encoding='UTF-8'>
  args.input = [codecs.open(f.name, encoding='UTF-8') for f in args.input]
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/subword-nmt/learn_joint_bpe_and_vocab.py:91: ResourceWarning: unclosed file <_io.TextIOWrapper name='bpe.vocab.en' mode='w' encoding='UTF-8'>
  args.vocab = [codecs.open(f.name, 'w', encoding='UTF-8') for f in args.vocab]
ResourceWarning: Enable tracemalloc to get the object allocation traceback

Nothing has changed since then. According to https://stackoverflow.com/questions/60945317/python-selenium-resourcewarning-enable-tracemalloc-to-get-the-object-allocati the tracemalloc message just refers to a debug tool. Is it normal for preprocessing the WMT17 data to take this long?

I want to make sure that subword-nmt is working properly.

rsennrich commented 2 years ago

These warnings shouldn't affect the execution of the training script. Yes, depending on the amount of training data, the execution time you report is quite normal. There are C++ implementations of BPE if speed is of the essence, e.g. fastBPE or YouTokenToMe.
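For context on the ResourceWarning lines in the log: CPython emits this warning when a file object is garbage-collected without having been closed. That is untidy, but it does not change what the script computes. Below is a minimal sketch that reproduces the warning and shows that a `with` block avoids it. The helper name `count_resource_warnings` is hypothetical and not part of subword-nmt, and the behaviour shown assumes CPython.

```python
import gc
import os
import tempfile
import warnings


def count_resource_warnings(close_properly: bool) -> int:
    """Open a temp file, optionally leave it unclosed, and count ResourceWarnings.

    Hypothetical helper for illustration only; not part of subword-nmt.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with warnings.catch_warnings(record=True) as caught:
            # ResourceWarning is ignored by default, so enable it explicitly.
            warnings.simplefilter("always")
            if close_properly:
                # The context manager closes the file when the block exits.
                with open(path, encoding="UTF-8") as handle:
                    handle.read()
            else:
                handle = open(path, encoding="UTF-8")
                handle.read()
                del handle    # drop the last reference without closing
                gc.collect()  # make sure the finalizer has run
        return sum(issubclass(w.category, ResourceWarning) for w in caught)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("with block:   ", count_resource_warnings(close_properly=True))
    print("unclosed file:", count_resource_warnings(close_properly=False))
```

In both cases the file contents are read identically; only the cleanup differs. To see where an unclosed file was opened, Python can be started with `-X tracemalloc`, which is what the "Enable tracemalloc" hint in the log refers to.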

rsennrich commented 2 years ago

I have now added a progress bar (requiring tqdm) so that there's some feedback on whether learn_bpe is still running or not.
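As a rough illustration of the idea, here is a toy BPE learner whose merge loop is wrapped in a tqdm progress bar. It is a sketch only and not the code in subword-nmt: the function name `learn_toy_bpe`, the input format, and the placement of the progress bar are all assumptions. This naive version recounts every symbol pair on each merge, which also hints at why a pure-Python run over a large corpus can take a long time.

```python
import collections

try:
    from tqdm import tqdm  # third-party; optional in this sketch
except ImportError:
    def tqdm(iterable, **kwargs):
        """Fallback: iterate without a progress bar if tqdm is not installed."""
        return iterable


def learn_toy_bpe(word_freqs, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair.

    Hypothetical helper for illustration; not subword-nmt's learn_bpe.
    word_freqs maps each word to its corpus frequency.
    """
    # Each word starts out as a tuple of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in tqdm(range(num_merges), desc="BPE merges"):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = collections.Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # every word is a single symbol; nothing left to merge
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    print(learn_toy_bpe({"abab": 3, "ab": 2}, num_merges=5))
```

With tqdm installed, the bar advances once per merge operation, so a long-running job shows steady progress instead of appearing to hang.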