Closed: dutkaD closed this issue 6 years ago
@dutkaD you probably have an unknown label in your data! (i.e. your vocab file is missing one of the labels)
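One way to check for an unknown label is to compare the tags used in the data against the vocab file. The following is a minimal sketch, not part of the repository: the function name `missing_tags` is hypothetical, and it assumes the tag data has one sentence per line with space-separated tags and that the vocab file lists one tag per line.

```python
def missing_tags(tag_lines, vocab_lines):
    """Return the tags used in the data that are absent from the vocab.

    tag_lines:   iterable of strings, one sentence per line, tags separated
                 by whitespace (assumed data layout).
    vocab_lines: iterable of strings, one tag per line (assumed vocab layout).
    """
    vocab = {line.strip() for line in vocab_lines if line.strip()}
    used = set()
    for line in tag_lines:
        used.update(line.split())
    return used - vocab


# Small in-memory demo with made-up data: "I-LOC" is used but not listed.
demo_tags = ["B-PER O O", "O I-LOC"]
demo_vocab = ["O", "B-PER"]
print(sorted(missing_tags(demo_tags, demo_vocab)))
```

To run it on real files, pass open file handles, e.g. `missing_tags(open("train.tags.txt"), open("vocab.tags.txt"))`, where `train.tags.txt` is an assumed file name. An empty result means every label in that file is covered by the vocab.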
Hi @guillaumegenthial, my evaluation runs only on the training data. With your code, any word that is in the vocab file but not in GloVe gets a zero vector, so I don't think the case you describe (the vocab file missing one of the labels) can happen. Do you have any other ideas?
I had the same error. Turns out I was using '0' instead of 'O' in the tags.
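A digit zero typed in place of the letter "O" is hard to spot by eye, so it can help to list the distinct tags and flag the look-alikes. This is only a sketch under assumptions: the `LOOKALIKES` table and both function names are hypothetical, and the data is assumed to have one sentence per line with space-separated tags.

```python
from collections import Counter

# Hypothetical table of tags that are easy to mistype, mapped to the tag
# that was probably intended.
LOOKALIKES = {"0": "O", "o": "O"}


def tag_counts(tag_lines):
    """Count how often each tag occurs across all lines."""
    counts = Counter()
    for line in tag_lines:
        counts.update(line.split())
    return counts


def suspicious_tags(tag_lines):
    """Map each look-alike tag found in the data to its likely intended tag."""
    counts = tag_counts(tag_lines)
    return {tag: LOOKALIKES[tag] for tag in counts if tag in LOOKALIKES}


# Made-up example: the first sentence uses the digit zero by mistake.
print(suspicious_tags(["B-PER 0 0", "O O B-LOC"]))
```

Printing `tag_counts(...)` on its own is also a quick way to see every distinct tag in a file, which makes typos in other labels easier to catch.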
Thanks @apohl1111. I found that too.
@lizzy2689, I have the same error. I have checked the vocab tags file and I have not mistyped anything. I don't understand what the problem is. Could you help me fix this bug?
@rashibudati Earlier I thought we needed to split the data into train, testa and testb at 80%, 10% and 10% respectively. While doing that I got the error mentioned above. When I copied and pasted everything into all the files, it worked. This code doesn't perform better than the sequence tagging repository, though: the metric always shows the same number regardless of the size of the dataset we use.
Hi @rashibudati, to share an update on this issue: I am using this solution for something other than NER, and I found that the error comes from line 45 of main.py. To pad uneven-length samples to the same length, @guillaumegenthial uses the label "O", which means padded tokens are tagged as "Others". If your data does not contain an "O" tag, the build_vocab.py script will not write "O" to vocab.tags.txt. During execution, the "O" tags introduced by padding are then treated as unknown tags, and this mismatch generates the error. So the author's answer about missing tags is correct. To fix it, manually add a capital "O" on a new line of your vocab.tags.txt after running build_vocab.py.
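The manual fix described above can also be scripted. This is a minimal sketch, not code from the repository: `with_pad_tag` and `patch_vocab_file` are hypothetical helper names, and the sketch assumes vocab.tags.txt holds one tag per line and that "O" is the padding label, as reported in this thread.

```python
from pathlib import Path

PAD_TAG = "O"  # padding label as reported in this thread (capital letter O)


def with_pad_tag(tags, pad_tag=PAD_TAG):
    """Return the tags with pad_tag appended if it is not already present."""
    cleaned = [t.strip() for t in tags if t.strip()]
    if pad_tag not in cleaned:
        cleaned.append(pad_tag)
    return cleaned


def patch_vocab_file(path, pad_tag=PAD_TAG):
    """Rewrite a one-tag-per-line vocab file so that it contains pad_tag."""
    p = Path(path)
    tags = with_pad_tag(p.read_text(encoding="utf-8").splitlines(), pad_tag)
    p.write_text("\n".join(tags) + "\n", encoding="utf-8")
    return tags


# In-memory demo with made-up tags: "O" is missing, so it is appended.
print(with_pad_tag(["B-ITEM", "I-ITEM"]))
```

Calling `patch_vocab_file("vocab.tags.txt")` after build_vocab.py and before training would apply the same change to the file. Running it twice is harmless, since the tag is only appended when it is missing.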
Solved my problem, thanks.
What could the problem be? Has anyone seen something like this?
Saving checkpoints for 0 into results/model/model.ckpt.
Traceback (most recent call last): . . . .