raymondhs / fairseq-laser

My implementation of LASER architecture in Fairseq
MIT License

RuntimeWarning: overflow encountered in reduce ret = umr_sum(arr, axis, dtype, out, keepdims) #4

Closed ever4244 closed 4 years ago

ever4244 commented 4 years ago

Hi,

When I run bucc.sh, there is no test file in the dataset and the F-score is 0. Should I copy de-en.sample.en and rename it to de-en.test.en?

The run also seems to hit an overflow problem:

RuntimeWarning: overflow encountered in reduce ret = umr_sum(arr, axis, dtype, out, keepdims)

I trained the model with FP32, while the original script uses FP16 for training. Could that be a problem?

log:

=========================================================================

=========================================================================

`- extract from tar bucc2018-de-en.sample-gold.tar.bz2

  • extract from tar bucc2018-de-en.training-gold.tar.bz2
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.dev in en
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.dev in de
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train in en
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train in de
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.test in en
cat: /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.test.en: No such file or directory
cat: /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.test.en: No such file or directory
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.test in de
cat: /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.test.de: No such file or directory
cat: /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.test.de: No such file or directory
Loading vocabulary from europarl_en_de_es_fr/bpe.40k/vocab ...
Read 677693430 words (40248 unique) from vocabulary file.
Loading codes from europarl_en_de_es_fr/bpe.40k/codes ...
Read 40000 codes from the codes file.
Namespace(all_gather_list_size=16384, beam=5, bpe=None, buffer_size=2000, cpu=False, criterion='cross_entropy', data='data-bin/europarl.de_en_es_fr.bpe40k/', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='-', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,de-es,en-es,es-en,fr-en,fr-es', left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='/4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.de', path='checkpoints/laser_lstm/checkpoint_last.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang='de', target_lang='en', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='laser/', warmup_updates=0, weight_decay=0.0)
| loading model(s) from checkpoints/laser_lstm/checkpoint_last.pt
| Sentence buffer size: 2000
Loading vocabulary from europarl_en_de_es_fr/bpe.40k/vocab ...
Read 677693430 words (40248 unique) from vocabulary file.
Loading codes from europarl_en_de_es_fr/bpe.40k/codes ...
Read 40000 codes from the codes file.
Namespace(all_gather_list_size=16384, beam=5, bpe=None, buffer_size=2000, cpu=False, criterion='cross_entropy', data='data-bin/europarl.de_en_es_fr.bpe40k/', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='-', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,de-es,en-es,es-en,fr-en,fr-es', left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='/4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.en', path='checkpoints/laser_lstm/checkpoint_last.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang='en', target_lang='es', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='laser/', warmup_updates=0, weight_decay=0.0)
| loading model(s) from checkpoints/laser_lstm/checkpoint_last.pt
| Sentence buffer size: 2000
LASER: tool to search, score or mine bitexts
  • knn will run on all available GPUs (recommended)
  • loading texts /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.txt.de: 413869 lines, 412909 unique
  • loading texts /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.txt.en: 399337 lines, 397151 unique
  • Embeddings: /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.de, 413869x1024
  • unify embeddings: 413869 -> 412909
  • Embeddings: /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.en, 399337x1024
  • unify embeddings: 399337 -> 397151
  • perform 4-nn source against target
/4tssd/wliax/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py:151: RuntimeWarning: overflow encountered in reduce
ret = umr_sum(arr, axis, dtype, out, keepdims)
  • perform 4-nn target against source
  • mining for parallel data
  • scoring 412909 candidates
  • scoring 397151 candidates
  • writing alignments to /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.candidates.tsv
LASER: tools for BUCC bitext mining
  • reading sentences and IDs
  • reading candidates /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.candidates.tsv
  • optimizing threshold on gold alignments /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.training.gold
  • best threshold=0.000000: precision=0.00, recall=0.00, F1=0.00`
ever4244 commented 4 years ago

When I run the official LASER bucc.sh:

LASER: tools for BUCC bitext mining

raymondhs commented 4 years ago

This evaluation is done on the BUCC training set, so the test file is unused and the "file not found" error can be ignored.

I am not sure why the overflow occurred, though. Can you load the embedding files and check whether they contain NaN values?

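import numpy as np

# Read each raw float32 embedding file as a flat array
# and print True if it contains at least one NaN.
for lang in ["en", "de"]:
  path = "./embed/bucc2018.de-en.train.enc.{}".format(lang)
  X = np.fromfile(path, dtype=np.float32, count=-1)
  print(np.isnan(X).any())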
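
If a plain True/False is not enough to narrow things down, a slightly more detailed variant of the same check may help. This is only a sketch: it assumes the files are raw float32 with 1024 values per sentence (the log reports 413869x1024), and the helper name `summarize_embeddings` is made up for illustration. It counts the rows that contain NaN or Inf and lists the first few affected row indices.

```python
import os
import numpy as np

def summarize_embeddings(path, dim=1024):
    """Count rows with NaN/Inf in a raw float32 embedding file (assumed layout)."""
    X = np.fromfile(path, dtype=np.float32, count=-1)
    X = X.reshape(-1, dim)  # one row per sentence
    nan_rows = np.isnan(X).any(axis=1)
    inf_rows = np.isinf(X).any(axis=1)
    bad = nan_rows | inf_rows
    finite = X[~bad]  # rows where every value is finite
    return {
        "rows": int(X.shape[0]),
        "nan_rows": int(nan_rows.sum()),
        "inf_rows": int(inf_rows.sum()),
        "first_bad": np.flatnonzero(bad)[:10].tolist(),
        "max_abs_finite": float(np.abs(finite).max()) if finite.size else 0.0,
    }

# Paths follow the snippet above; files that do not exist are skipped.
for lang in ["en", "de"]:
    path = "./embed/bucc2018.de-en.train.enc.{}".format(lang)
    if os.path.exists(path):
        print(lang, summarize_embeddings(path))
```

The row indices in `first_bad` correspond to line numbers (starting at 0) in the matching text file, assuming the embeddings were written in input order, so the affected sentences can be inspected directly.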
ever4244 commented 4 years ago


Both files contain NaN values. The script prints: True True

Would that cause the entire F-score to drop to 0? I trained the embeddings with FP32, while your script trains with FP16. Could this discrepancy cause problems?
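
For context on the warning itself: NumPy reports "overflow encountered in reduce" when a reduction such as a sum exceeds the largest finite value of its dtype. The snippet below is a generic illustration with made-up values, not a reproduction of this run. It shows the FP16 and FP32 limits, a sum that overflows to Inf, and how Inf can then turn into NaN. It says nothing about where the NaNs in these embedding files came from.

```python
import warnings
import numpy as np

# Largest finite values of the two dtypes.
f16_max = float(np.finfo(np.float16).max)  # 65504.0
f32_max = float(np.finfo(np.float32).max)  # about 3.4e38
print(f16_max, f32_max)

# Made-up values close to the float32 limit.
big = np.full(4, 3e38, dtype=np.float32)

# Record warnings instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    total = big.sum()  # the sum exceeds the float32 range
print(total, [str(w.message) for w in caught])

# Once Inf is present, expressions such as inf - inf are undefined.
with np.errstate(invalid="ignore"):
    diff = total - total
print(diff)
```

FP32 has a far larger range than FP16, so evaluating in FP32 is less likely to overflow than FP16, not more.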

raymondhs commented 4 years ago

That means the embedding matrices for both en and de contain NaNs, and this will affect the kNN search computation. I am not sure whether it is related to FP16 training; I have never seen this behaviour with Fairseq in either FP32 or FP16, as both usually work fine.
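
To illustrate why NaNs in the embeddings matter for the search, here is a small NumPy sketch. It is not LASER's actual kNN code; the toy vectors, the brute-force cosine similarity, and the 0.5 threshold are assumptions for illustration only. A single NaN component makes every similarity involving that row NaN, any mean over those scores is NaN as well, and comparisons against a threshold are then always False.

```python
import numpy as np

def cosine_sim(x, y):
    """Brute-force cosine similarity between the rows of x and the rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return x @ y.T

# Toy 2-dimensional "embeddings"; the second target row has one NaN component.
src = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)
tgt = np.array([[1.0, 0.0], [np.nan, 1.0]], dtype=np.float32)

sim = cosine_sim(src, tgt)
print(sim)                         # the whole second column is NaN
print(np.isnan(sim).any(axis=0))   # which target rows are affected

# Averaging neighbour scores spreads the NaN to every source row.
mean_sim = sim.mean(axis=1)
print(mean_sim)

# Comparisons with NaN are False, so no candidate passes a threshold.
with np.errstate(invalid="ignore"):
    print(mean_sim > 0.5)
```

In this toy case only one target row is bad, yet both source rows end up with an unusable score.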

ever4244 commented 4 years ago

This is the log from running the same model on another machine.

It has an extra warning, which may give you some insight:

opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [17] is 17 which does not match the computed number of elements 18. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (18,).

Processing BUCC data in /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data
- extract from tar bucc2018-fr-en.sample-gold.tar.bz2
- extract from tar bucc2018-fr-en.training-gold.tar.bz2
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.dev in en
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.dev in fr
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train in en
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train in fr
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.test in en
cat: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/fr-en/fr-en.test.en: No such file or directory
cat: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/fr-en/fr-en.test.en: No such file or directory
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.test in fr
cat: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/fr-en/fr-en.test.fr: No such file or directory
cat: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/fr-en/fr-en.test.fr: No such file or directory
Loading vocabulary from europarl_en_de_es_fr/bpe.40k/vocab ...
Read 677693430 words (40248 unique) from vocabulary file.
Loading codes from europarl_en_de_es_fr/bpe.40k/codes ...
Read 40000 codes from the codes file.
Namespace(beam=5, bpe=None, buffer_size=2000, cpu=False, criterion='cross_entropy', data='data-bin/europarl.de_en_es_fr.bpe40k/', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='-', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,de-es,en-es,es-en,fr-en,fr-es', lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train.enc.fr', path='checkpoints/laser_lstm/checkpoint_last.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, raw_text=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang='fr', target_lang='en', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='laser/', warmup_updates=0, weight_decay=0.0)
| [de] dictionary: 40252 types
| [en] dictionary: 40252 types
| [es] dictionary: 40252 types
| [fr] dictionary: 40252 types
| loading model(s) from checkpoints/laser_lstm/checkpoint_last.pt
| Sentence buffer size: 2000
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [17] is 17 which does not match the computed number of elements 18. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (18,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [18] is 18 which does not match the computed number of elements 19. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (19,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [19] is 19 which does not match the computed number of elements 20. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (20,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [20] is 20 which does not match the computed number of elements 21. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (21,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [21] is 21 which does not match the computed number of elements 22. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (22,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [22] is 22 which does not match the computed number of elements 23. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (23,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [23] is 23 which does not match the computed number of elements 24. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (24,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [24] is 24 which does not match the computed number of elements 25. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (25,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [25] is 25 which does not match the computed number of elements 26. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (26,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [26] is 26 which does not match the computed number of elements 27. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (27,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [27] is 27 which does not match the computed number of elements 29. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (29,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [29] is 29 which does not match the computed number of elements 30. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (30,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [30] is 30 which does not match the computed number of elements 33. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (33,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [33] is 33 which does not match the computed number of elements 36. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (36,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [36] is 36 which does not match the computed number of elements 61. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (61,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [61] is 61 which does not match the computed number of elements 65. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (65,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [65] is 65 which does not match the computed number of elements 72. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (72,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [72] is 72 which does not match the computed number of elements 78. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (78,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [78] is 78 which does not match the computed number of elements 94. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (94,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [94] is 94 which does not match the computed number of elements 136. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (136,).
Loading vocabulary from europarl_en_de_es_fr/bpe.40k/vocab ...
Read 677693430 words (40248 unique) from vocabulary file.
Loading codes from europarl_en_de_es_fr/bpe.40k/codes ...
Read 40000 codes from the codes file.
Namespace(beam=5, bpe=None, buffer_size=2000, cpu=False, criterion='cross_entropy', data='data-bin/europarl.de_en_es_fr.bpe40k/', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='-', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,de-es,en-es,es-en,fr-en,fr-es', lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train.enc.en', path='checkpoints/laser_lstm/checkpoint_last.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, raw_text=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang='en', target_lang='es', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='laser/', warmup_updates=0, weight_decay=0.0)
| [de] dictionary: 40252 types
| [en] dictionary: 40252 types
| [es] dictionary: 40252 types
| [fr] dictionary: 40252 types
| loading model(s) from checkpoints/laser_lstm/checkpoint_last.pt
| Sentence buffer size: 2000
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [16] is 16 which does not match the computed number of elements 18. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (18,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [18] is 18 which does not match the computed number of elements 19. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (19,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [19] is 19 which does not match the computed number of elements 20. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (20,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [20] is 20 which does not match the computed number of elements 21. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (21,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [21] is 21 which does not match the computed number of elements 22. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (22,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [22] is 22 which does not match the computed number of elements 23. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (23,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [23] is 23 which does not match the computed number of elements 24. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (24,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [24] is 24 which does not match the computed number of elements 25. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (25,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [25] is 25 which does not match the computed number of elements 26. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (26,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [26] is 26 which does not match the computed number of elements 27. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (27,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [27] is 27 which does not match the computed number of elements 28. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (28,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [28] is 28 which does not match the computed number of elements 30. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (30,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [30] is 30 which does not match the computed number of elements 33. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (33,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [33] is 33 which does not match the computed number of elements 37. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (37,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [37] is 37 which does not match the computed number of elements 59. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (59,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [59] is 59 which does not match the computed number of elements 71. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (71,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [71] is 71 which does not match the computed number of elements 85. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (85,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [85] is 85 which does not match the computed number of elements 95. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (95,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [95] is 95 which does not match the computed number of elements 107. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (107,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [107] is 107 which does not match the computed number of elements 202. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (202,).
LASER: tool to search, score or mine bitexts
- knn will run on all available GPUs (recommended)
- loading texts /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train.txt.fr: 271874 lines, 270775 unique
- loading texts /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train.txt.en: 369810 lines, 368033 unique
- Embeddings: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train.enc.fr, 271874x1024
- unify embeddings: 271874 -> 270775
- Embeddings: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train.enc.en, 369810x1024
- unify embeddings: 369810 -> 368033
- perform 4-nn source against target
/home/wei/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py:151: RuntimeWarning: overflow encountered in reduce
ret = umr_sum(arr, axis, dtype, out, keepdims)
- perform 4-nn target against source
- mining for parallel data
- scoring 270775 candidates
- scoring 368033 candidates
- writing alignments to /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train.candidates.tsv
LASER: tools for BUCC bitext mining
- reading sentences and IDs
- reading candidates /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.fr-en.train.candidates.tsv
- optimizing threshold on gold alignments /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/fr-en/fr-en.training.gold
- best threshold=0.000000: precision=0.00, recall=0.00, F1=0.00
- extract from tar bucc2018-de-en.sample-gold.tar.bz2
- extract from tar bucc2018-de-en.training-gold.tar.bz2
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.dev in en
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.dev in de
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train in en
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train in de
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.test in en
cat: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/de-en/de-en.test.en: No such file or directory
cat: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/de-en/de-en.test.en: No such file or directory
- extract files /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.test in de
cat: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/de-en/de-en.test.de: No such file or directory
cat: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/de-en/de-en.test.de: No such file or directory
Loading vocabulary from europarl_en_de_es_fr/bpe.40k/vocab ...
Read 677693430 words (40248 unique) from vocabulary file.
Loading codes from europarl_en_de_es_fr/bpe.40k/codes ...
Read 40000 codes from the codes file.
Namespace(beam=5, bpe=None, buffer_size=2000, cpu=False, criterion='cross_entropy', data='data-bin/europarl.de_en_es_fr.bpe40k/', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='-', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,de-es,en-es,es-en,fr-en,fr-es', lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.de', path='checkpoints/laser_lstm/checkpoint_last.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, raw_text=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang='de', target_lang='en', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='laser/', warmup_updates=0, weight_decay=0.0)
| [de] dictionary: 40252 types
| [en] dictionary: 40252 types
| [es] dictionary: 40252 types
| [fr] dictionary: 40252 types
| loading model(s) from checkpoints/laser_lstm/checkpoint_last.pt
| Sentence buffer size: 2000
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [17] is 17 which does not match the computed number of elements 19. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (19,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [19] is 19 which does not match the computed number of elements 21. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (21,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [21] is 21 which does not match the computed number of elements 22. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (22,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [22] is 22 which does not match the computed number of elements 23. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (23,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [23] is 23 which does not match the computed number of elements 24. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (24,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [24] is 24 which does not match the computed number of elements 25. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (25,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [25] is 25 which does not match the computed number of elements 26. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (26,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [26] is 26 which does not match the computed number of elements 27. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (27,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [27] is 27 which does not match the computed number of elements 28. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (28,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [28] is 28 which does not match the computed number of elements 30. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (30,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [30] is 30 which does not match the computed number of elements 31. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (31,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [31] is 31 which does not match the computed number of elements 33. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (33,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [33] is 33 which does not match the computed number of elements 36. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (36,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [36] is 36 which does not match the computed number of elements 40. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (40,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [40] is 40 which does not match the computed number of elements 65. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (65,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [65] is 65 which does not match the computed number of elements 76. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (76,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [76] is 76 which does not match the computed number of elements 107. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (107,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [107] is 107 which does not match the computed number of elements 110. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (110,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [110] is 110 which does not match the computed number of elements 114. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (114,).
Loading vocabulary from europarl_en_de_es_fr/bpe.40k/vocab ...
Read 677693430 words (40248 unique) from vocabulary file.
Loading codes from europarl_en_de_es_fr/bpe.40k/codes ...
Read 40000 codes from the codes file.
Namespace(beam=5, bpe=None, buffer_size=2000, cpu=False, criterion='cross_entropy', data='data-bin/europarl.de_en_es_fr.bpe40k/', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='-', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,de-es,en-es,es-en,fr-en,fr-es', lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.en', path='checkpoints/laser_lstm/checkpoint_last.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, raw_text=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang='en', target_lang='es', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='laser/', warmup_updates=0, weight_decay=0.0)
| [de] dictionary: 40252 types
| [en] dictionary: 40252 types
| [es] dictionary: 40252 types
| [fr] dictionary: 40252 types
| loading model(s) from checkpoints/laser_lstm/checkpoint_last.pt
| Sentence buffer size: 2000
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [16] is 16 which does not match the computed number of elements 18. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (18,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [18] is 18 which does not match the computed number of elements 19. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (19,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [19] is 19 which does not match the computed number of elements 20. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (20,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [20] is 20 which does not match the computed number of elements 21. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (21,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [21] is 21 which does not match the computed number of elements 22. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (22,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [22] is 22 which does not match the computed number of elements 23. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (23,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [23] is 23 which does not match the computed number of elements 24. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (24,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [24] is 24 which does not match the computed number of elements 25. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (25,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [25] is 25 which does not match the computed number of elements 26. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (26,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [26] is 26 which does not match the computed number of elements 27. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (27,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [27] is 27 which does not match the computed number of elements 28. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (28,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [28] is 28 which does not match the computed number of elements 29. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (29,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [29] is 29 which does not match the computed number of elements 32. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (32,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [32] is 32 which does not match the computed number of elements 36. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (36,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [36] is 36 which does not match the computed number of elements 63. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (63,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [63] is 63 which does not match the computed number of elements 64. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (64,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [64] is 64 which does not match the computed number of elements 105. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (105,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [105] is 105 which does not match the computed number of elements 133. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (133,).
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/RangeFactories.cpp:170: UserWarning: The number of elements in the out tensor of shape [133] is 133 which does not match the computed number of elements 164. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (164,).
LASER: tool to search, score or mine bitexts
- knn will run on all available GPUs (recommended)
- loading texts /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train.txt.de: 413869 lines, 412909 unique
- loading texts /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train.txt.en: 399337 lines, 397151 unique
- Embeddings: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.de, 413869x1024
- unify embeddings: 413869 -> 412909
- Embeddings: /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.en, 399337x1024
- unify embeddings: 399337 -> 397151
- perform 4-nn source against target
/home/wei/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py:151: RuntimeWarning: overflow encountered in reduce
ret = umr_sum(arr, axis, dtype, out, keepdims)
- perform 4-nn target against source
- mining for parallel data
- scoring 412909 candidates
- scoring 397151 candidates
- writing alignments to /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train.candidates.tsv
LASER: tools for BUCC bitext mining
- reading sentences and IDs
- reading candidates /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/embed/bucc2018.de-en.train.candidates.tsv
- optimizing threshold on gold alignments /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/bucc_data/bucc2018/de-en/de-en.training.gold
- best threshold=0.000000: precision=0.00, recall=0.00, F1=0.00
raymondhs commented 4 years ago

The warnings about the number of elements look like a size mismatch between the current and the next input line. Maybe you can try with a smaller portion of the training set (1 sentence, etc.) and see whether the embeddings are generated without any NaNs. Also, make sure that the data has been preprocessed correctly (you can check by saving it into a temp file).
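
A minimal sketch of the NaN check suggested above. It assumes the `.enc.*` files are raw float32 arrays with 1024 values per sentence (the logs report shapes such as 413869x1024); the function name `check_embeddings` is hypothetical and not part of this repo or of LASER.

```python
import numpy as np


def check_embeddings(path, dim=1024):
    """Load a raw float32 embedding file and return (row count, bad row indices).

    Assumption: the file is a flat float32 dump with `dim` values per sentence.
    """
    data = np.fromfile(path, dtype=np.float32)
    if data.size % dim != 0:
        raise ValueError(
            f"{path}: {data.size} floats is not a multiple of dim={dim}"
        )
    emb = data.reshape(-1, dim)
    # A row is "bad" if it contains any NaN or +/-Inf value.
    bad_rows = np.flatnonzero(~np.isfinite(emb).all(axis=1))
    return emb.shape[0], bad_rows
```

Calling it on a file produced from a handful of sentences shows whether the encoder already emits non-finite values before the k-nn step, which is where the `overflow encountered in reduce` warning appears in the log.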

ever4244 commented 4 years ago

Hi, I solved the problem by reinstalling everything, including Anaconda and the driver.

How many epochs would you train this model for? My GPU memory can only support a max-tokens of 5000. If I train with FP16, the gradient overflows after 3 epochs, so I have to set --update-freq 4 and --fp16-scale-tolerance=0.01.

What is the max F score and precision you can get?

Mine is:

  • optimizing threshold on gold alignments ./bucc2018/fr-en/fr-en.training.gold
    • best threshold=1.024506: precision=89.04, recall=86.32, F1=87.66
  • optimizing threshold on gold alignments ./bucc2018/de-en/de-en.training.gold
    • best threshold=1.024628: precision=92.84, recall=86.84, F1=89.74

fairseq-train $data_bin \
    --max-epoch 10 \
    --ddp-backend=no_c10d \
    --task translation_laser --arch laser \
    --lang-pairs de-en,de-es,en-es,es-en,fr-en,fr-es \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 0.001 --criterion cross_entropy \
    --dropout 0.1 --save-dir $checkpoint \
    --max-tokens 5000 \
    --valid-subset train --disable-validation \
    --no-progress-bar --log-interval 1000 \
    --user-dir laser/ --update-freq 4

2020-02-29 05:13:02 | INFO | train | epoch 010: 11999 / 16140 loss=2.93, ppl=7.62, wps=13518.8, ups=0.63, wpb=21304.9, bsz=703.9, num_updates=419365, lr=1e-09, gnorm=0.371, clip=0, oom=0.0, loss_scale=0, train_wall=17552, wall=0
2020-02-29 05:15:05 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 05:19:17 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 05:31:15 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 05:38:34 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 05:39:06 | INFO | train | epoch 010: 12999 / 16140 loss=2.93, ppl=7.62, wps=13520.6, ups=0.63, wpb=21301.5, bsz=703.5, num_updates=420361, lr=1e-09, gnorm=0.365, clip=0, oom=0.0, loss_scale=0, train_wall=19002, wall=0
2020-02-29 05:40:45 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 05:46:36 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 05:53:03 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 05:55:38 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 05:59:08 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:02:09 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:05:26 | INFO | train | epoch 010: 13999 / 16140 loss=2.93, ppl=7.62, wps=13514.8, ups=0.63, wpb=21305.5, bsz=703.6, num_updates=421355, lr=1e-09, gnorm=0.361, clip=0, oom=0.0, loss_scale=0, train_wall=20472, wall=0
2020-02-29 06:12:28 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:16:09 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:16:37 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:23:26 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:27:59 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:31:33 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:31:39 | INFO | train | epoch 010: 14999 / 16140 loss=2.931, ppl=7.62, wps=13510, ups=0.63, wpb=21303.9, bsz=703.3, num_updates=422349, lr=1e-09, gnorm=0.359, clip=0, oom=0.0, loss_scale=0, train_wall=21938, wall=0
2020-02-29 06:33:49 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:33:51 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:44:52 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:45:07 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:49:11 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:52:46 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 06:57:56 | INFO | train | epoch 010: 15999 / 16140 loss=2.931, ppl=7.62, wps=13502.4, ups=0.63, wpb=21300.3, bsz=702.9, num_updates=423343, lr=1e-09, gnorm=0.355, clip=0, oom=0.0, loss_scale=0, train_wall=23411, wall=0
2020-02-29 07:01:18 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2020-02-29 07:01:41 | INFO | train | epoch 010 | loss 2.93 | ppl 7.62 | wps 13498.9 | ups 0.63 | wpb 21299.2 | bsz 702.9 | num_updates 423482 | lr 1e-09 | gnorm 0.355 | clip 0 | oom 0.0 | loss_scale 0 | train_wall 23621 | wall 0
2020-02-29 07:01:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints/laser_lstm/checkpoint10.pt (epoch 10 @ 423482 updates, score None) (writing took 8.07022743113339 seconds)
2020-02-29 07:01:49 | INFO | fairseq_cli.train | done training in 152230.1 seconds

ever4244 commented 4 years ago

This is the BUCC test at epoch 3. It seems that the performance gain over the next 7 epochs is very minor, due to the gradient overflow.

  • optimizing threshold on gold alignments ./bucc2018/fr-en/fr-en.training.gold
    • best threshold=1.026297: precision=87.22, recall=86.95, F1=87.08
  • optimizing threshold on gold alignments ./bucc2018/de-en/de-en.training.gold
    • best threshold=1.024628: precision=92.84, recall=86.84, F1=89.74

At epoch 5:

  • optimizing threshold on gold alignments ./bucc2018/fr-en/fr-en.training.gold
    • best threshold=1.024505: precision=89.04, recall=86.32, F1=87.66
  • optimizing threshold on gold alignments ./bucc2018/de-en/de-en.training.gold
    • best threshold=1.019340: precision=92.43, recall=88.54, F1=90.44
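
For reference, F1 is the harmonic mean of precision and recall, so the figures above can be cross-checked with a few lines of Python. This is only a sanity-check sketch; the helper name `f1_score` and the `reported` table are illustrative and not taken from the BUCC tooling.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        # Matches the degenerate precision=0.00, recall=0.00, F1=0.00 case.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results in this comment.
reported = {
    "epoch 3 fr-en": (87.22, 86.95, 87.08),
    "epoch 3 de-en": (92.84, 86.84, 89.74),
    "epoch 5 fr-en": (89.04, 86.32, 87.66),
    "epoch 5 de-en": (92.43, 88.54, 90.44),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1={f1_score(p, r):.2f}, reported F1={f1:.2f}")
```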

raymondhs commented 4 years ago

I used the training command (10 epochs) and got the results listed in the README. My effective batch size is (4 GPUs) * (max-tokens=12000) * (update-freq=1) = 48,000. You can also try to bring yours to a similar number, according to your GPU memory and how many GPUs you have. I also had some warnings related to fp16 overflow. The warning means that your loss cannot fit in fp16, hence it will be scaled (this is described in fairseq's paper; it's called dynamic loss scaling). So I just left --fp16-scale-tolerance at its default value.
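
As a rough illustration of dynamic loss scaling, here is a simplified sketch. It is not fairseq's actual implementation: the class `DynamicLossScaler` below and its `scale_window` default are assumptions made for this example, while the initial scale of 128 and the minimum of 0.0001 mirror the `fp16_init_scale` and `min_loss_scale` values printed in the logs above.

```python
import math


class DynamicLossScaler:
    """Toy loss scaler: halve on overflow, double after a run of good steps."""

    def __init__(self, init_scale=128.0, scale_factor=2.0,
                 scale_window=2000, min_scale=1e-4):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window  # hypothetical default
        self.min_scale = min_scale
        self._good_steps = 0

    def update(self, grad_norm):
        """Return True if the optimizer step should be applied."""
        if math.isinf(grad_norm) or math.isnan(grad_norm):
            # Overflow: skip this update and retry with a smaller scale.
            self.scale /= self.scale_factor
            self._good_steps = 0
            if self.scale < self.min_scale:
                raise FloatingPointError("minimum loss scale reached")
            return False
        self._good_steps += 1
        if self._good_steps % self.scale_window == 0:
            # A long run without overflow: try a larger scale again.
            self.scale *= self.scale_factor
        return True


scaler = DynamicLossScaler(scale_window=2)
for norm in [0.4, float("inf"), 0.4, 0.4]:
    applied = scaler.update(norm)
    print(f"grad_norm={norm}, applied={applied}, scale={scaler.scale}")
```

In this sketch an occasional overflow only costs one skipped update, whereas a long run of consecutive overflows pushes the scale below the minimum and aborts.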

ever4244 commented 4 years ago

Yes. However, because my GPU memory is 11 GB per card (I have a 1080 Ti and a 2080 Ti), I have to cut max-tokens to 5000. As my batch size is smaller than yours, it is very likely that I hit the fp16 overflow much earlier than you do. After 3 epochs, my gradient overflow is so serious that even scaling down to the minimum loss scale cannot cure it, and the program aborts automatically.

So I am currently trying to train it with fp32 and hoping it will cure the overflow problem a little bit.

As for the (4 GPUs) * (max-tokens=12000) effective batch: my previous understanding was that gradient overflow is not affected by the number of GPUs, since each batch is processed independently on each GPU. Are you saying that with more GPUs I can somewhat cure the overflow problem, because the gradients from the different GPUs are collected into a larger effective batch and applied in one update? If that is the case, I may be able to try with more cards.

Anyway, I think the problem in this thread is solved. Thank you very much!

raymondhs commented 4 years ago

Yes, the batch size is related to the number of GPUs, max-tokens, and update-freq. Just multiply these 3 to get your final batch size. For example, you can try 1 GPU with --max-tokens 5000 --update-freq 10 (effectively 50,000 tokens) or 2 GPUs with --max-tokens 5000 --update-freq 5 (effectively also 50,000). Batch size is one of the hyperparameters, so we need some trial and error to find one that works well. I am not sure if it will fix the overflow though, since there may be other factors, like hardware differences.
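
The multiplication above can be sketched as a small helper. The function names are hypothetical and not part of fairseq, and since --max-tokens is an upper bound per GPU batch, the result is a ceiling rather than the exact token count of every update.

```python
import math


def effective_batch_tokens(num_gpus, max_tokens, update_freq=1):
    """Upper bound on tokens per parameter update."""
    return num_gpus * max_tokens * update_freq


def update_freq_for_target(target_tokens, num_gpus, max_tokens):
    """Smallest update-freq whose effective batch reaches target_tokens."""
    return math.ceil(target_tokens / (num_gpus * max_tokens))


# Settings mentioned in this thread.
print(effective_batch_tokens(4, 12000, 1))    # 4 GPUs, max-tokens 12000
print(effective_batch_tokens(1, 5000, 10))    # 1 GPU, update-freq 10
print(effective_batch_tokens(2, 5000, 5))     # 2 GPUs, update-freq 5
print(update_freq_for_target(48000, 2, 5000)) # update-freq to reach 48,000
```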

ever4244 commented 4 years ago

Thank you. I did set update-freq to 4, so my current batch is 2 GPUs * 5000 * 4 = 40K tokens, and yours is 4 GPUs * 12000 = 48K tokens. They seem similar, so why do I get the overflow problem as early as the 4th epoch?

BTW: does a larger update-freq have any negative impact, other than possibly needing a larger lr? (Training time is not my main concern right now; I would trade more epochs for better performance.) Next time I will increase it, since it is the easiest setting to increase.

BTW2: I am writing a paper comparing my method with LASER. Is there a paper related to this training code that you would like me to cite?

raymondhs commented 4 years ago

No worries, and also thanks for trying this code. If we have a larger update-freq, it means we have a larger batch size for each update. This is useful in cases like yours, where you may want a larger batch size than what can fit in the GPU RAM. From my previous experience, the fp16 overflow warning often appears during training, but it has never really caused the training to abort for me, so I am not really sure about this issue.

Haha, I don't really have a paper related to this code. It's only a personal project, because the LASER authors have not released their training code yet. Also, just a disclaimer: the code in this repo is an approximation based on the description in the LASER papers, and the authors' implementation may be different. :-)

ever4244 commented 4 years ago

OK. Thank you for your help and timely responses. I also tried other LASER training implementations but never got responses as helpful as yours. I will continue to run some experiments and give feedback here. I think the 11 GB 2080 Ti is a very common GPU, so if I can find a better setting, it may also help others.

raymondhs commented 4 years ago

Thanks! I'll close this and the previous issue for now as the original problems seem to have been solved. Please feel free to open a new one if you meet other problems.