Textsum: Assertion error while using vocab file generated from dataset

clw5180 commented 6 years ago

For the textsum model, I downloaded the toy dataset provided by the author. I did not use his vocab file; instead, I generated a vocab from the toy dataset myself. The vocab looks like this:

UNK 1
the 314
. 229
, 223
</s> 208
<s> 208
to 146
of 129
in 119
a 86
and 83
said 50
for 49
... 49
`` 48
'' 46
</p> 42
<p> 42
's 41
on 41
## 38
that 37
as 31
by 31
was 30
from 29
with 28
</d>' 24
an 23
-rrb- 22

However, when I ran textsum/seq2seq_attention.py, I got this error:

assert vocab.CheckVocab(data.PAD_TOKEN) > 0
TypeError: unorderable types: NoneType() > int()

When I use the vocab provided by the author, this problem does not occur. However, that vocab was generated from a large dataset, not from this toy dataset. Can anyone tell me what this error means, or how to solve it? Thanks a lot!

tensorflowbutler commented 6 years ago

Thank you for your post. We noticed you have not filled out the following field in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

clw5180 commented 6 years ago

The problem has been solved. There was something wrong with some of the tags in my vocab, such as </d> and so on.
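
For anyone hitting the same traceback: the error suggests that vocab.CheckVocab(data.PAD_TOKEN) returned None, which would happen if the token is not present in the vocab file, and Python 3 refuses to compare None with an int. A rough sketch of a pre-flight check is below. The token spellings in REQUIRED_TOKENS are an assumption (check them against the constants in textsum/data.py), and load_vocab and missing_tokens are hypothetical helpers, not part of textsum.

```python
# Assumed spellings of the special tokens; verify against textsum/data.py.
REQUIRED_TOKENS = ("<PAD>", "<UNK>", "<s>", "</s>")


def load_vocab(lines):
    """Parse 'token count' lines into a token -> id dict, in file order."""
    word_to_id = {}
    for line in lines:
        pieces = line.split()
        if len(pieces) != 2:
            continue  # skip blank or malformed lines
        # Keep the first id assigned to a token if it appears twice.
        word_to_id.setdefault(pieces[0], len(word_to_id))
    return word_to_id


def missing_tokens(word_to_id, required=REQUIRED_TOKENS):
    """Return the required tokens that are absent from the vocab."""
    return [tok for tok in required if tok not in word_to_id]


# A few lines taken from the vocab posted above.
sample = """UNK 1
the 314
</s> 208
<s> 208
</d>' 24
"""

vocab = load_vocab(sample.splitlines())
print(missing_tokens(vocab))  # ['<PAD>', '<UNK>']
```

Under these assumptions, the vocab above has UNK instead of <UNK>, no <PAD> entry at all, and a stray quote in </d>', so the first assertion fails before training starts. To check a real file, pass open(path, encoding="utf-8") to load_vocab in place of the sample lines.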

tensorflow / models

Textsum: Assertion error while using vocab file generated from dataset #5811