tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

AttributeError: 'NoneType' object has no attribute 'vocab_size' #97

Closed. agemagician closed this issue 7 years ago.

agemagician commented 7 years ago

Hardware:

- CPU: Intel(R) Core(TM) i7-4700MQ @ 2.40GHz
- RAM: 8 GB
- GPU: GeForce GT 740M

Software:

- OS: Ubuntu 16
- TensorFlow GPU version: 1.2.1

I am trying to follow the walkthrough tutorial; however, during the data-generation phase I get the following error: AttributeError: 'NoneType' object has no attribute 'vocab_size'

Command:

    # Generate data
    t2t-datagen \
      --data_dir=$DATA_DIR \
      --tmp_dir=$TMP_DIR \
      --num_shards=100 \
      --problem=$PROBLEM

Result:

    [sudo] password for agemagician:
    INFO:tensorflow:Generating training data for wmt_ende_tokens_32k.
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-nc-v11.tgz
    INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.en
    INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.de
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-commoncrawl.tgz
    INFO:tensorflow:Reading file: commoncrawl.de-en.en
    INFO:tensorflow:Reading file: commoncrawl.de-en.de
    INFO:tensorflow:Reading file: commoncrawl.fr-en.en
    INFO:tensorflow:Reading file: commoncrawl.fr-en.fr
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-europarl-v7.tgz
    INFO:tensorflow:Reading file: training/europarl-v7.de-en.en
    INFO:tensorflow:Reading file: training/europarl-v7.de-en.de
    INFO:tensorflow:Reading file: training/europarl-v7.fr-en.en
    INFO:tensorflow:Reading file: training/europarl-v7.fr-en.fr
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-giga-fren.tar
    INFO:tensorflow:Reading file: giga-fren.release2.fixed.en.gz
    INFO:tensorflow:Subdirectory /home/agemagician/tmp/t2t_datagen/giga-fren.release2.fixed.en.gz already exists, skipping unpacking
    INFO:tensorflow:Reading file: giga-fren.release2.fixed.fr.gz
    INFO:tensorflow:Subdirectory /home/agemagician/tmp/t2t_datagen/giga-fren.release2.fixed.fr.gz already exists, skipping unpacking
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-un.tgz
    INFO:tensorflow:Reading file: un/undoc.2000.fr-en.en
    INFO:tensorflow:Reading file: un/undoc.2000.fr-en.fr
    INFO:tensorflow:Alphabet contains 244 characters
    INFO:tensorflow:Trying min_count 500
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 2805
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 1411
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 1518
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 1499
    [591, 290, 339, 48, 233, 739, 896, 10, 113, 5, 754, 1312, 318, 1264, 730, 1258, 1317, 573, 151, 31, 4]
    ['This', 'sen', 'ten', 'ce', 'was', 'enc', 'ode', 'd', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'x', 't', 'En', 'co', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 250
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 5016
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 2303
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 2444
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 2404
    [331, 1220, 416, 42, 112, 2137, 1901, 2187, 67, 5, 432, 2217, 676, 2169, 421, 2163, 2222, 867, 169, 30, 4]
    ['This', 'sen', 'ten', 'ce', 'was', 'enco', 'ded', '', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'x', 't', 'En', 'co', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 125
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 9156
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 3767
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 3984
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 3942
    [171, 3290, 589, 88, 3414, 1473, 59, 5, 222, 3755, 1423, 350, 2045, 862, 400, 28, 3]
    ['This', 'sent', 'ence', 'was', 'enco', 'ded', 'by', 'the', 'Su', 'b', 'word', 'Te', 'xt', 'En', 'co', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 62
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 16110
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 6194
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 6495
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 6429
    [145, 1751, 1086, 61, 84, 976, 6062, 14, 54, 4, 487, 6242, 1204, 6194, 509, 3778, 705, 519, 28, 3]
    ['This', 'sen', 'ten', 'ce', 'was', 'enc', 'ode', 'd', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'xt', 'En', 'co', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 31
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 26981
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 9956
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 10305
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 10256
    [130, 2607, 1851, 74, 4634, 1945, 52, 4, 3617, 2494, 10021, 5345, 10074, 1034, 7129, 39, 3]
    ['This', 'sent', 'ence', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'wor', 'd', 'Tex', 't', 'En', 'cod', 'er', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 15
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 44225
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 15912
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 16370
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 16302
    [118, 9925, 552, 70, 6612, 2242, 48, 4, 3955, 3409, 16067, 10338, 1832, 6639, 40, 3]
    ['This', 'sente', 'nce', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'wor', 'd', 'Text', 'En', 'code', 'r', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 7
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 71748
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 25276
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 25830
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 25747
    [112, 12304, 25530, 65, 14571, 3782, 45, 4, 5370, 18085, 15003, 25565, 3039, 17012, 53, 3]
    ['This', 'sentence', '', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'En', 'code', 'r', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 3
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 118656
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 40416
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 41107
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 41021
    [107, 15099, 61, 24687, 23, 41, 4, 14183, 17470, 26262, 11280, 19526, 94, 3]
    ['This', 'sentence', 'was', 'encode', 'd', 'by', 'the', 'Sub', 'word', 'Text', 'En', 'code', 'r', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 5
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 87819
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 30533
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 31215
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 31106
    [110, 20492, 63, 15371, 4407, 44, 4, 6307, 14237, 12073, 30924, 30266, 22, 3]
    ['This', 'sentence', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'Enco', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 4
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 101029
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 34630
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 35402
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 35268
    [109, 17333, 62, 30348, 22, 44, 4, 19392, 12658, 10909, 35086, 25188, 20, 3]
    ['This', 'sentence', 'was', 'encode', 'd', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'Enco', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    Traceback (most recent call last):
      File "/usr/local/bin/t2t-datagen", line 378, in <module>
        tf.app.run()
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
        _sys.exit(main(_sys.argv[:1] + flags_passthrough))
      File "/usr/local/bin/t2t-datagen", line 361, in main
        training_gen(), FLAGS.problem + UNSHUFFLED_SUFFIX + "-train",
      File "/usr/local/bin/t2t-datagen", line 151, in <lambda>
        lambda: wmt.ende_wordpiece_token_generator(FLAGS.tmp_dir, True, 2**15),
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/wmt.py", line 230, in ende_wordpiece_token_generator
        tmp_dir, "tokens.vocab.%d" % vocab_size, vocab_size)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/generator_utils.py", line 265, in get_or_generate_vocab
        vocab_size, tokenizer.token_counts, 1, 1e3)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 329, in build_to_target_size
        return bisect(min_val, max_val)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 324, in bisect
        if (abs(other_subtokenizer.vocab_size - target_size) <
    AttributeError: 'NoneType' object has no attribute 'vocab_size'
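The traceback points at the recursive bisection over min_count in text_encoder.py: when the recursion exhausts its search range it returns None, and line 324 then reads .vocab_size off that None. The following is a minimal, self-contained sketch of that pattern, with a guard added where the failure occurs. This is hypothetical toy code, not the actual tensor2tensor implementation; FakeSubtokenizer and its size formula are invented for illustration.

```python
class FakeSubtokenizer:
    """Toy stand-in for SubwordTextEncoder: the vocabulary shrinks as
    the min_count threshold grows (rarer subwords get pruned)."""
    def __init__(self, min_count):
        self.vocab_size = 100_000 // max(min_count, 1)


def bisect(min_val, max_val, target_size):
    """Binary-search min_count for the vocab size closest to target_size."""
    if min_val > max_val:
        return None  # search range exhausted: no candidate in this half
    present_count = (max_val + min_val) // 2
    subtokenizer = FakeSubtokenizer(present_count)
    if subtokenizer.vocab_size == target_size:
        return subtokenizer
    # Larger min_count => smaller vocab, so recurse into the matching half.
    if subtokenizer.vocab_size > target_size:
        other_subtokenizer = bisect(present_count + 1, max_val, target_size)
    else:
        other_subtokenizer = bisect(min_val, present_count - 1, target_size)
    if other_subtokenizer is None:
        # The guard the failing code path is missing: without this check,
        # the comparison below dereferences None and raises exactly
        # "AttributeError: 'NoneType' object has no attribute 'vocab_size'".
        return subtokenizer
    if abs(other_subtokenizer.vocab_size - target_size) < abs(
            subtokenizer.vocab_size - target_size):
        return other_subtokenizer
    return subtokenizer


best = bisect(1, 1000, 2**15)
print(best.vocab_size)  # closest achievable toy vocab size; never None
```

With the guard in place the search always hands back the closest candidate found so far instead of propagating None up the call stack.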

Any idea how I can fix it?

ZhenYangIACAS commented 7 years ago

I have this issue too.

agemagician commented 7 years ago

The same issue occurs if I use the "wsj_parsing_tokens_32k" problem.

Result:

    INFO:tensorflow:Generating training data for wsj_parsing_tokens_32k.
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-nc-v11.tgz
    INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.en
    INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.de
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-commoncrawl.tgz
    INFO:tensorflow:Reading file: commoncrawl.de-en.en
    INFO:tensorflow:Reading file: commoncrawl.de-en.de
    INFO:tensorflow:Reading file: commoncrawl.fr-en.en
    INFO:tensorflow:Reading file: commoncrawl.fr-en.fr
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-europarl-v7.tgz
    INFO:tensorflow:Reading file: training/europarl-v7.de-en.en
    INFO:tensorflow:Reading file: training/europarl-v7.de-en.de
    INFO:tensorflow:Reading file: training/europarl-v7.fr-en.en
    INFO:tensorflow:Reading file: training/europarl-v7.fr-en.fr
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-giga-fren.tar
    INFO:tensorflow:Reading file: giga-fren.release2.fixed.en.gz
    INFO:tensorflow:Subdirectory /home/agemagician/tmp/t2t_datagen/giga-fren.release2.fixed.en.gz already exists, skipping unpacking
    INFO:tensorflow:Reading file: giga-fren.release2.fixed.fr.gz
    INFO:tensorflow:Subdirectory /home/agemagician/tmp/t2t_datagen/giga-fren.release2.fixed.fr.gz already exists, skipping unpacking
    INFO:tensorflow:Not downloading, file already found: /home/agemagician/tmp/t2t_datagen/training-parallel-un.tgz
    INFO:tensorflow:Reading file: un/undoc.2000.fr-en.en
    INFO:tensorflow:Reading file: un/undoc.2000.fr-en.fr
    INFO:tensorflow:Alphabet contains 244 characters
    INFO:tensorflow:Trying min_count 500
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 2805
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 1411
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 1518
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 1499
    [591, 290, 339, 48, 233, 739, 896, 10, 113, 5, 754, 1304, 318, 1465, 730, 1325, 1428, 573, 151, 31, 4]
    ['This', 'sen', 'ten', 'ce', 'was', 'enc', 'ode', 'd', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'x', 't', 'En', 'co', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 250
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 5016
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 2303
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 2444
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 2404
    [331, 1220, 416, 42, 112, 2137, 1901, 2174, 67, 5, 432, 2209, 676, 2370, 421, 2230, 2333, 867, 169, 30, 4]
    ['This', 'sen', 'ten', 'ce', 'was', 'enco', 'ded', '', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'x', 't', 'En', 'co', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 125
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 9156
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 3767
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 3984
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 3942
    [171, 3290, 589, 88, 3414, 1473, 59, 5, 222, 3747, 1423, 350, 2045, 862, 400, 28, 3]
    ['This', 'sent', 'ence', 'was', 'enco', 'ded', 'by', 'the', 'Su', 'b', 'word', 'Te', 'xt', 'En', 'co', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 62
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 16110
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 6194
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 6495
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 6429
    [145, 1751, 1086, 61, 84, 976, 6062, 14, 54, 4, 487, 6234, 1204, 6395, 509, 3778, 705, 519, 28, 3]
    ['This', 'sen', 'ten', 'ce', 'was', 'enc', 'ode', 'd', 'by', 'the', 'Su', 'b', 'wor', 'd', 'Te', 'xt', 'En', 'co', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 31
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 26981
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 9956
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 10305
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 10256
    [130, 2607, 1851, 74, 4634, 1945, 52, 4, 3617, 2494, 10222, 5345, 10185, 1034, 7129, 39, 3]
    ['This', 'sent', 'ence', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'wor', 'd', 'Tex', 't', 'En', 'cod', 'er', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 15
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 44225
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 15912
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 16370
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 16302
    [118, 9925, 552, 70, 6612, 2242, 48, 4, 3955, 3409, 16268, 10338, 1832, 6639, 40, 3]
    ['This', 'sente', 'nce', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'wor', 'd', 'Text', 'En', 'code', 'r', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 7
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 71748
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 25276
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 25830
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 25747
    [112, 12304, 25517, 65, 14571, 3782, 45, 4, 5370, 18085, 15003, 25676, 3039, 17012, 53, 3]
    ['This', 'sentence', '', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'En', 'code', 'r', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 3
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 118656
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 40416
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 41107
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 41021
    [107, 15099, 61, 24687, 23, 41, 4, 14183, 17470, 26262, 11280, 19526, 94, 3]
    ['This', 'sentence', 'was', 'encode', 'd', 'by', 'the', 'Sub', 'word', 'Text', 'En', 'code', 'r', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 5
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 87819
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 30533
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 31215
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 31106
    [110, 20492, 63, 15371, 4407, 44, 4, 6307, 14237, 12073, 31035, 30266, 22, 3]
    ['This', 'sentence', 'was', 'enco', 'ded', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'Enco', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    INFO:tensorflow:Trying min_count 4
    INFO:tensorflow:Iteration 0
    INFO:tensorflow:vocab_size = 101029
    INFO:tensorflow:Iteration 1
    INFO:tensorflow:vocab_size = 34630
    INFO:tensorflow:Iteration 2
    INFO:tensorflow:vocab_size = 35402
    INFO:tensorflow:Iteration 3
    INFO:tensorflow:vocab_size = 35268
    [109, 17333, 62, 30348, 22, 44, 4, 19392, 12658, 10909, 35197, 25188, 20, 3]
    ['This', 'sentence', 'was', 'encode', 'd', 'by', 'the', 'Sub', 'word', 'Tex', 't', 'Enco', 'der', '._']
    This sentence was encoded by the SubwordTextEncoder.
    Traceback (most recent call last):
      File "/usr/local/bin/t2t-datagen", line 378, in <module>
        tf.app.run()
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
        _sys.exit(main(_sys.argv[:1] + flags_passthrough))
      File "/usr/local/bin/t2t-datagen", line 361, in main
        training_gen(), FLAGS.problem + UNSHUFFLED_SUFFIX + "-train",
      File "/usr/local/bin/t2t-datagen", line 122, in <lambda>
        2**15, 2**9),
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/wsj_parsing.py", line 102, in parsing_token_generator
        source_vocab_size)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/generator_utils.py", line 265, in get_or_generate_vocab
        vocab_size, tokenizer.token_counts, 1, 1e3)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 329, in build_to_target_size
        return bisect(min_val, max_val)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 323, in bisect
        other_subtokenizer = bisect(min_val, present_count - 1)
      File "/usr/local/lib/python3.5/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 324, in bisect
        if (abs(other_subtokenizer.vocab_size - target_size) <
    AttributeError: 'NoneType' object has no attribute 'vocab_size'

wenkesj commented 7 years ago

@agemagician @ZhenYangIACAS As a temporary workaround, follow these steps. I would suggest waiting for a contributor to answer, but this works for me:

  1. Clone tensor2tensor

    git clone https://github.com/tensorflow/tensor2tensor.git
    cd tensor2tensor
  2. Change lines 333-341 of tensor2tensor/data_generators/text_encoder.py so that the bisection recurses into the correct half:

    if subtokenizer.vocab_size > target_size:
        other_subtokenizer = bisect(present_count + 1, max_val)
    else:
        other_subtokenizer = bisect(min_val, present_count - 1)
  3. Reinstall the pip module:

    sudo pip uninstall tensor2tensor
    sudo pip install .
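As a sanity check on the branch directions in the patch above: subword vocabulary size is (roughly) monotonically non-increasing in min_count, since raising the threshold prunes rarer subwords. So when subtokenizer.vocab_size > target_size, the search must move toward higher min_count values, which is what bisect(present_count + 1, max_val) does. A toy illustration of that monotonicity (toy_vocab_size and the counts are invented for this example, not the real SubwordTextEncoder):

```python
def toy_vocab_size(token_counts, min_count):
    """Toy model: keep only tokens seen at least min_count times."""
    return sum(1 for count in token_counts.values() if count >= min_count)

counts = {"the": 500, "encoder": 40, "subword": 7, "tensor2tensor": 2}

# Raising min_count can only shrink the vocabulary.
sizes = [toy_vocab_size(counts, m) for m in (1, 5, 50, 1000)]
print(sizes)  # [4, 3, 1, 0] -- monotonically non-increasing
```

Because of this monotonic relationship, recursing into the wrong half can exhaust the search range without finding a candidate, which is how the bisection ends up handing back None.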
lukaszkaiser commented 7 years ago

This is hopefully corrected in 1.0.11 (as above); please give it a try. I'm closing for now; please reopen if you still see the issue.