Closed agemagician closed 7 years ago
I have this issue too.
The same issue occurs again if I use "wsj_parsing_tokens_32k".
Result:
INFO:tensorflow:Generating training data for wsj_parsing_tokens_32k.
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-nc-v11.tgz
INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.en
INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.de
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-commoncrawl.tgz
INFO:tensorflow:Reading file: commoncrawl.de-en.en
INFO:tensorflow:Reading file: commoncrawl.de-en.de
INFO:tensorflow:Reading file: commoncrawl.fr-en.en
INFO:tensorflow:Reading file: commoncrawl.fr-en.fr
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-europarl-v7.tgz
INFO:tensorflow:Reading file: training/europarl-v7.de-en.en
INFO:tensorflow:Reading file: training/europarl-v7.de-en.de
INFO:tensorflow:Reading file: training/europarl-v7.fr-en.en
INFO:tensorflow:Reading file: training/europarl-v7.fr-en.fr
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-giga-fren.tar
INFO:tensorflow:Reading file: giga-fren.release2.fixed.en.gz
INFO:tensorflow:Subdirectory /home/agemagician/tmp/t2t_datagen/giga-fren.release2.fixed.en.gz already exists, skipping unpacking
INFO:tensorflow:Reading file: giga-fren.release2.fixed.fr.gz
INFO:tensorflow:Subdirectory /home/agemagician/tmp/t2t_datagen/giga-fren.release2.fixed.fr.gz already exists, skipping unpacking
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-un.tgz
INFO:tensorflow:Reading file: un/undoc.2000.fr-en.en
INFO:tensorflow:Reading file: un/undoc.2000.fr-en.fr
INFO:tensorflow:Alphabet contains 244 characters
INFO:tensorflow:Trying min_count 500
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 2805
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 1411
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 1518
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 1499
[591, 290, 339, 48, 233, 739, 896, 10, 113, 5, 754, 1304, 318, 1465, 730, 1325, 1428, 573, 151, 31, 4]
['This', 'sen', 'ten', 'ce', 'was', 'enc', 'ode', 'd', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'x', 't', 'En', 'co', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 250
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 5016
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 2303
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 2444
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 2404
[331, 1220, 416, 42, 112, 2137, 1901, 2174, 67, 5, 432, 2209, 676, 2370, 421, 2230, 2333, 867, 169, 30, 4]
['This', 'sen', 'ten', 'ce', 'was', 'enco', 'ded', '', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'x', 't', 'En', 'co', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 125
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 9156
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 3767
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 3984
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 3942
[171, 3290, 589, 88, 3414, 1473, 59, 5, 222, 3747, 1423, 350, 2045, 862, 400, 28, 3]
['This', 'sent', 'ence', 'was', 'enco', 'ded', 'by', 'the', 'Su', 'b', 'word', 'Te', 'xt', 'En', 'co', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 62
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 16110
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 6194
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 6495
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 6429
[145, 1751, 1086, 61, 84, 976, 6062, 14, 54, 4, 487, 6234, 1204, 6395, 509, 3778, 705, 519, 28, 3]
['This', 'sen', 'ten', 'ce', 'was', 'enc', 'ode', 'd', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'xt', 'En', 'co', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 31
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 26981
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 9956
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 10305
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 10256
[130, 2607, 1851, 74, 4634, 1945, 52, 4, 3617, 2494, 10222, 5345, 10185, 1034, 7129, 39, 3]
['This', 'sent', 'ence', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'wor', 'd', 'Tex', 't', 'En', 'cod', 'er', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 15
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 44225
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 15912
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 16370
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 16302
[118, 9925, 552, 70, 6612, 2242, 48, 4, 3955, 3409, 16268, 10338, 1832, 6639, 40, 3]
['This', 'sente', 'nce', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'wor', 'd', 'Text', 'En', 'code', 'r', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 7
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 71748
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 25276
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 25830
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 25747
[112, 12304, 25517, 65, 14571, 3782, 45, 4, 5370, 18085, 15003, 25676, 3039, 17012, 53, 3]
['This', 'sentence', '', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'En', 'code', 'r', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 3
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 118656
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 40416
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 41107
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 41021
[107, 15099, 61, 24687, 23, 41, 4, 14183, 17470, 26262, 11280, 19526, 94, 3]
['This', 'sentence', 'was', 'encode', 'd', 'by', 'the', 'Sub', 'word', 'Text', 'En', 'code', 'r', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 5
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 87819
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 30533
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 31215
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 31106
[110, 20492, 63, 15371, 4407, 44, 4, 6307, 14237, 12073, 31035, 30266, 22, 3]
['This', 'sentence', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'Enco', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 4
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 101029
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 34630
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 35402
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 35268
[109, 17333, 62, 30348, 22, 44, 4, 19392, 12658, 10909, 35197, 25188, 20, 3]
['This', 'sentence', 'was', 'encode', 'd', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'Enco', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
Traceback (most recent call last):
File "/usr/local/bin/t2t-datagen", line 378, in <module>
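The min_count values probed above (500, 250, 125, 62, 31, 15, 7, 3, 5, 4) come from a binary search: a higher minimum token count keeps fewer subwords, so the generator bisects min_count until the vocabulary size is as close as possible to the 2**15 target. A minimal sketch of that search follows; `vocab_size_for()` is a hypothetical stand-in for the real subtokenizer build, calibrated only so the probe sequence mirrors the log, and the `None` guard is the piece the crashing version lacked:

```python
TARGET_SIZE = 2 ** 15  # 32768, the requested vocab size

def vocab_size_for(min_count):
    # Toy model (assumption, not t2t code): lower min_count keeps rarer
    # subwords, so the vocabulary grows as min_count shrinks.
    return 130000 // min_count

def bisect(min_val, max_val):
    if min_val > max_val:
        return None  # empty search range -- the case that crashed t2t-datagen
    present = (min_val + max_val) // 2
    print("Trying min_count %d" % present)
    size = vocab_size_for(present)
    if size == TARGET_SIZE:
        return size
    if size > TARGET_SIZE:
        # vocab too big -> need a stricter (larger) min_count
        other = bisect(present + 1, max_val)
    else:
        # vocab too small -> need a looser (smaller) min_count
        other = bisect(min_val, present - 1)
    if other is None:  # guard missing in the broken version
        return size
    # keep whichever candidate lands closer to the target
    if abs(other - TARGET_SIZE) < abs(size - TARGET_SIZE):
        return other
    return size

# Probes print 500, 250, 125, 62, 31, 15, 7, 3, 5, 4 -- the same
# sequence as the log above -- and the closest size is returned.
best = bisect(1, 1000)
```

Without the `if other is None` guard, the last probe at min_count 4 recurses into an empty range, gets None back, and the closeness comparison blows up, which is exactly the AttributeError reported here.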
@agemagician @ZhenYangIACAS As a temporary workaround, follow these steps. I would suggest waiting for a contributor to answer, but this works for me:
Clone tensor2tensor
git clone https://github.com/tensorflow/tensor2tensor.git
cd tensor2tensor
Change lines 333-341 of tensor2tensor/data_generators/text_encoder.py to:
if subtokenizer.vocab_size > target_size:
  other_subtokenizer = bisect(present_count + 1, max_val)
else:
  other_subtokenizer = bisect(min_val, present_count - 1)
if other_subtokenizer is None:
  return subtokenizer
if (abs(other_subtokenizer.vocab_size - target_size) <
    abs(subtokenizer.vocab_size - target_size)):
  return other_subtokenizer
return subtokenizer
sudo pip uninstall tensor2tensor
sudo pip install .
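The crash happens because the recursive bisect call can return None when its search range is empty, and the original code then reads .vocab_size off that None. The guarded comparison from the workaround can be sketched in isolation like this; `Subtokenizer` and `closer_to_target` are hypothetical stand-ins, not the real t2t classes:

```python
class Subtokenizer:
    """Hypothetical stand-in for the object t2t's bisect produces."""
    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

def closer_to_target(subtokenizer, other_subtokenizer, target_size):
    # Guard first: a bisect over an empty range yields None, and None has
    # no .vocab_size -- the AttributeError reported in this issue.
    if other_subtokenizer is None:
        return subtokenizer
    # Otherwise keep whichever vocabulary lands closer to the target.
    if (abs(other_subtokenizer.vocab_size - target_size) <
            abs(subtokenizer.vocab_size - target_size)):
        return other_subtokenizer
    return subtokenizer

# min_count 4 gave 35268 tokens in the log; the other branch came back None,
# so the guard simply keeps the 35268-token subtokenizer instead of crashing.
best = closer_to_target(Subtokenizer(35268), None, 2 ** 15)
```

The design point is just ordering: the None check must run before any attribute access on `other_subtokenizer`, which is all the patched lines add.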
This is hopefully corrected in 1.0.11 (as above); please give it a try. I'm closing for now; please reopen if you still see the issue.
Hardware:
CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
RAM: 8 GB
GPU: GeForce GT 740M
Software:
Ubuntu 16
TensorFlow GPU version: 1.2.1
I am trying to follow the walk-through tutorial; however, during the data-generation phase I receive the following error: AttributeError: 'NoneType' object has no attribute 'vocab_size'
Command:
Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --num_shards=100 \
  --problem=$PROBLEM
Result:
[sudo] password for agemagician:
INFO:tensorflow:Generating training data for wmt_ende_tokens_32k.
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-nc-v11.tgz
INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.en
INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.de
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-commoncrawl.tgz
INFO:tensorflow:Reading file: commoncrawl.de-en.en
INFO:tensorflow:Reading file: commoncrawl.de-en.de
INFO:tensorflow:Reading file: commoncrawl.fr-en.en
INFO:tensorflow:Reading file: commoncrawl.fr-en.fr
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-europarl-v7.tgz
INFO:tensorflow:Reading file: training/europarl-v7.de-en.en
INFO:tensorflow:Reading file: training/europarl-v7.de-en.de
INFO:tensorflow:Reading file: training/europarl-v7.fr-en.en
INFO:tensorflow:Reading file: training/europarl-v7.fr-en.fr
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-giga-fren.tar
INFO:tensorflow:Reading file: giga-fren.release2.fixed.en.gz
INFO:tensorflow:Subdirectory /home/agemagician/tmp/t2t_datagen/giga-fren.release2.fixed.en.gz already exists, skipping unpacking
INFO:tensorflow:Reading file: giga-fren.release2.fixed.fr.gz
INFO:tensorflow:Subdirectory /home/agemagician/tmp/t2t_datagen/giga-fren.release2.fixed.fr.gz already exists, skipping unpacking
INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-un.tgz
INFO:tensorflow:Reading file: un/undoc.2000.fr-en.en
INFO:tensorflow:Reading file: un/undoc.2000.fr-en.fr
INFO:tensorflow:Alphabet contains 244 characters
INFO:tensorflow:Trying min_count 500
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 2805
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 1411
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 1518
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 1499
[591, 290, 339, 48, 233, 739, 896, 10, 113, 5, 754, 1312, 318, 1264, 730, 1258, 1317, 573, 151, 31, 4]
['This', 'sen', 'ten', 'ce', 'was', 'enc', 'ode', 'd', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'x', 't', 'En', 'co', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 250
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 5016
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 2303
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 2444
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 2404
[331, 1220, 416, 42, 112, 2137, 1901, 2187, 67, 5, 432, 2217, 676, 2169, 421, 2163, 2222, 867, 169, 30, 4]
['This', 'sen', 'ten', 'ce', 'was', 'enco', 'ded', '', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'x', 't', 'En', 'co', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 125
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 9156
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 3767
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 3984
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 3942
[171, 3290, 589, 88, 3414, 1473, 59, 5, 222, 3755, 1423, 350, 2045, 862, 400, 28, 3]
['This', 'sent', 'ence', 'was', 'enco', 'ded', 'by', 'the', 'Su', 'b', 'word', 'Te', 'xt', 'En', 'co', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 62
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 16110
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 6194
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 6495
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 6429
[145, 1751, 1086, 61, 84, 976, 6062, 14, 54, 4, 487, 6242, 1204, 6194, 509, 3778, 705, 519, 28, 3]
['This', 'sen', 'ten', 'ce', 'was', 'enc', 'ode', 'd', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'xt', 'En', 'co', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 31
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 26981
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 9956
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 10305
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 10256
[130, 2607, 1851, 74, 4634, 1945, 52, 4, 3617, 2494, 10021, 5345, 10074, 1034, 7129, 39, 3]
['This', 'sent', 'ence', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'wor', 'd', 'Tex', 't', 'En', 'cod', 'er', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 15
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 44225
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 15912
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 16370
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 16302
[118, 9925, 552, 70, 6612, 2242, 48, 4, 3955, 3409, 16067, 10338, 1832, 6639, 40, 3]
['This', 'sente', 'nce', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'wor', 'd', 'Text', 'En', 'code', 'r', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 7
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 71748
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 25276
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 25830
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 25747
[112, 12304, 25530, 65, 14571, 3782, 45, 4, 5370, 18085, 15003, 25565, 3039, 17012, 53, 3]
['This', 'sentence', '', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'En', 'code', 'r', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 3
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 118656
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 40416
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 41107
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 41021
[107, 15099, 61, 24687, 23, 41, 4, 14183, 17470, 26262, 11280, 19526, 94, 3]
['This', 'sentence', 'was', 'encode', 'd', 'by', 'the', 'Sub', 'word', 'Text', 'En', 'code', 'r', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 5
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 87819
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 30533
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 31215
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 31106
[110, 20492, 63, 15371, 4407, 44, 4, 6307, 14237, 12073, 30924, 30266, 22, 3]
['This', 'sentence', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'Enco', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
INFO:tensorflow:Trying min_count 4
INFO:tensorflow:Iteration 0
INFO:tensorflow:vocab_size = 101029
INFO:tensorflow:Iteration 1
INFO:tensorflow:vocab_size = 34630
INFO:tensorflow:Iteration 2
INFO:tensorflow:vocab_size = 35402
INFO:tensorflow:Iteration 3
INFO:tensorflow:vocab_size = 35268
[109, 17333, 62, 30348, 22, 44, 4, 19392, 12658, 10909, 35086, 25188, 20, 3]
['This', 'sentence', 'was', 'encode', 'd', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'Enco', 'der', '._']
This sentence was encoded by the SubwordTextEncoder.
Traceback (most recent call last):
File "/usr/local/bin/t2t-datagen", line 378, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-datagen", line 361, in main
training_gen(), FLAGS.problem + UNSHUFFLED_SUFFIX + "-train",
File "/usr/local/bin/t2t-datagen", line 151, in <lambda>
lambda: wmt.ende_wordpiece_token_generator(FLAGS.tmp_dir, True, 2**15),
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/wmt.py", line 230, in ende_wordpiece_token_generator
tmp_dir, "tokens.vocab.%d" % vocab_size, vocab_size)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/generator_utils.py", line 265, in get_or_generate_vocab
vocab_size, tokenizer.token_counts, 1, 1e3)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 329, in build_to_target_size
return bisect(min_val, max_val)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
other_subtokenizer = bisect(min_val, present_count - 1)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
other_subtokenizer = bisect(min_val, present_count - 1)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
other_subtokenizer = bisect(min_val, present_count - 1)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
other_subtokenizer = bisect(min_val, present_count - 1)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
other_subtokenizer = bisect(min_val, present_count - 1)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
other_subtokenizer = bisect(min_val, present_count - 1)
File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 324, in bisect
if (abs(other_subtokenizer.vocab_size - target_size) <
AttributeError: 'NoneType' object has no attribute 'vocab_size'
Any idea how I can fix it?