kyzhouhzau / BERT-NER

Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).
MIT License
1.23k stars 335 forks

test.txt and label_test.txt don't have the same number of lines #57

Open sEhsanTaher opened 5 years ago

sEhsanTaher commented 5 years ago

Hi, I'm using the code (thanks for that!), but there is a problem with how the test predictions are written to "output/result_dir/label_test.txt". I thought this file should line up with "data/test.txt", but it doesn't!

I know that this library (BERT-NER) removes empty lines in "output/result_dir/label_test.txt", but even after removing the empty lines from "data/test.txt" the problem still exists (the number of lines in "output/result_dir/label_test.txt" is less than in "data/test.txt").

Here are the links to those files: "data/test.txt" : https://github.com/kyzhouhzau/BERT-NER/blob/master/data/test.txt

"output/result_dir/label_test.txt" : https://github.com/kyzhouhzau/BERT-NER/blob/master/output/result_dir/label_test.txt

Thanks
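The comparison described above (line counts with empty lines ignored) can be sketched as follows. This is a minimal illustration, not part of the repository: the helper names `count_nonblank_lines`, `count_sentences`, and `compare_files` are hypothetical, and it assumes only that "data/test.txt" uses the CoNLL-2003 layout of one token per line with blank lines between sentences. The layout of "label_test.txt" is not assumed beyond being plain text.

```python
from pathlib import Path


def count_nonblank_lines(text: str) -> int:
    """Count lines that contain something other than whitespace."""
    return sum(1 for line in text.splitlines() if line.strip())


def count_sentences(text: str) -> int:
    """Count blocks of consecutive non-blank lines (one block per sentence)."""
    count = 0
    in_sentence = False
    for line in text.splitlines():
        if line.strip():
            if not in_sentence:
                count += 1
                in_sentence = True
        else:
            in_sentence = False
    return count


def compare_files(path_a: str, path_b: str) -> tuple:
    """Return the non-blank line counts of two text files as (a, b)."""
    text_a = Path(path_a).read_text(encoding="utf-8")
    text_b = Path(path_b).read_text(encoding="utf-8")
    return count_nonblank_lines(text_a), count_nonblank_lines(text_b)
```

Calling `compare_files("data/test.txt", "output/result_dir/label_test.txt")` would give the two counts side by side; a mismatch there confirms that blank lines alone do not explain the difference.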

kyzhouhzau commented 5 years ago

After WordPiece tokenization, the sentences in the test set are not always shorter than 128 tokens, I think.
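One way to read this reply: if a sentence grows past the maximum sequence length (128 here) once its words are split into sub-word pieces, the excess tokens are cut off before prediction, so the words at the end of that sentence never receive an output label. The sketch below illustrates the effect only. `toy_tokenize` is a stand-in that splits words into 3-character chunks, not the real WordPiece algorithm; `count_surviving_words` and the two positions reserved for [CLS] and [SEP] are likewise assumptions for illustration, not the repository's code.

```python
from typing import Callable, List


def toy_tokenize(word: str) -> List[str]:
    """Stand-in for WordPiece: 3-character chunks, '##' on continuations."""
    return [
        word[i:i + 3] if i == 0 else "##" + word[i:i + 3]
        for i in range(0, len(word), 3)
    ]


def count_surviving_words(
    words: List[str],
    tokenize: Callable[[str], List[str]],
    max_seq_length: int = 128,
) -> int:
    """Count words whose first piece still fits after truncation."""
    budget = max_seq_length - 2  # assumption: room kept for [CLS] and [SEP]
    used = 0  # number of pieces emitted so far = start index of next word
    kept = 0
    for word in words:
        if used >= budget:
            break  # this word's first piece would be truncated away
        kept += 1
        used += len(tokenize(word))
    return kept


sentence = ["internationalization"] * 5  # each word becomes 7 toy pieces
print(len(sentence), count_surviving_words(sentence, toy_tokenize, 16))
```

With a real tokenizer the piece counts differ, but the mechanism is the same: a sentence that looks shorter than 128 words can still exceed 128 pieces, and every word dropped this way is a line missing from the predictions.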

ILiangk commented 5 years ago

I have run into the same problem: output/result_dir/label_test.txt sometimes has more lines than data/test.txt and sometimes fewer. Did you manage to solve your problem?

pinpom commented 5 years ago

@kyzhouhzau Thanks for the code. I'm also confused about your "output/result_dir/label_test.txt" and "data/test.txt": those two files are completely different. From my understanding, "output/result_dir/label_test.txt" is generated only after the model has finished training, meaning that before I run "bash run_ner.sh" there is no "label_test.txt" in the "output/result_dir" folder? If my understanding is not correct, I would very much appreciate your clarification.