Open sEhsanTaher opened 5 years ago
I think that after WordPiece tokenization, sentences in the test set are not always shorter than 128 tokens.
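To check this guess, one could count how many test sentences exceed the length budget once each word is split into sub-tokens. The snippet below is only a rough sketch: `toy_subword_tokenize` is a hypothetical stand-in for the real WordPiece tokenizer, `count_overlong` is an illustrative helper that is not part of the repository, and reserving 2 positions for `[CLS]` and `[SEP]` is an assumption about how the script spends its 128 positions.

```python
from typing import Callable, Iterable, List


def toy_subword_tokenize(word: str, piece_len: int = 4) -> List[str]:
    """Hypothetical stand-in for WordPiece: cut a word into fixed-size pieces.

    This is NOT the real BERT tokenizer. It only mimics the fact that one
    word can turn into several sub-tokens.
    """
    pieces = [word[i:i + piece_len] for i in range(0, len(word), piece_len)]
    # Mark continuation pieces with "##", as WordPiece does.
    return [p if i == 0 else "##" + p for i, p in enumerate(pieces)]


def count_overlong(
    sentences: Iterable[List[str]],
    tokenize: Callable[[str], List[str]],
    max_seq_length: int = 128,
    reserved: int = 2,  # assumed: one slot each for [CLS] and [SEP]
) -> int:
    """Count sentences whose sub-token count exceeds the usable budget."""
    budget = max_seq_length - reserved
    overlong = 0
    for words in sentences:
        n_tokens = sum(len(tokenize(w)) for w in words)
        if n_tokens > budget:
            overlong += 1
    return overlong


if __name__ == "__main__":
    demo = [["short", "one"], ["abcdefghij"] * 50]
    print(count_overlong(demo, toy_subword_tokenize))
```

If this count is non-zero with the real tokenizer passed in as `tokenize`, truncation of long sentences would be one plausible reason for the prediction file having fewer lines than the test file.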
I ran into the same problem: output/result_dir/label_test.txt sometimes has more lines than data/test.txt and sometimes fewer. Did you solve your problem?
@kyzhouhzau Thanks for the code. I'm also confused about your "output/result_dir/label_test.txt" and "data/test.txt": those two files are completely different. From my understanding, "output/result_dir/label_test.txt" is generated only after the model has finished training, meaning that before I run "bash run_ner.sh" there is no "label_test.txt" in the "output/result_dir" folder? If my understanding is not correct, I would very much appreciate your clarification.
Hi, I am using the code (thanks for that!), but there is a problem when the test predictions are written to "output/result_dir/label_test.txt". I thought this file should have the same lines as "data/test.txt", but it doesn't!
I know that this library (BERT-NER) removes empty lines in "output/result_dir/label_test.txt", but even after removing the empty lines from "data/test.txt" the problem still exists: the number of lines in "output/result_dir/label_test.txt" is smaller than in "data/test.txt".
Here are the links to those files:
"data/test.txt": https://github.com/kyzhouhzau/BERT-NER/blob/master/data/test.txt
"output/result_dir/label_test.txt": https://github.com/kyzhouhzau/BERT-NER/blob/master/output/result_dir/label_test.txt
Thanks
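For anyone who wants to reproduce the line-count comparison described above, here is a minimal sketch that counts the non-empty lines of both files and reports the difference. The helper names `count_nonempty_lines` and `compare_line_counts` are made up for illustration and are not part of the repository. The paths are the ones mentioned in this thread.

```python
from typing import Tuple


def count_nonempty_lines(path: str) -> int:
    """Count the lines that contain something other than whitespace."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def compare_line_counts(gold_path: str, pred_path: str) -> Tuple[int, int, int]:
    """Return (gold count, prediction count, gold minus prediction)."""
    gold = count_nonempty_lines(gold_path)
    pred = count_nonempty_lines(pred_path)
    return gold, pred, gold - pred


if __name__ == "__main__":
    gold, pred, diff = compare_line_counts(
        "data/test.txt", "output/result_dir/label_test.txt"
    )
    print(f"test lines: {gold}, predicted lines: {pred}, difference: {diff}")
```

A positive difference means the prediction file is shorter than the test file. If the test file contains document-marker lines such as CoNLL's `-DOCSTART-`, those would probably need to be excluded from the count as well before the two numbers can be compared fairly.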