Open anindyasdas opened 4 years ago
Also having the same Issue..Anyone can help please..?
The issue is mainly due to the use of different tokenizers. Two different tokenizers are used, and problems arise specifically when handling "-" or other special characters. Use the spaCy tokenizer instead of NLTK or whitespace splitting.
Thanks very much, that helped
Data file added for reproducing the error: input_data (1).txt
Primary analysis suggests: the file has tokens like "North-East" and "third-largest". The Stanford tokenizer used for coreference splits across the hyphen, while NLTK does not. So, as per NLTK, the corresponding sentence has 37 tokens, which does not match the coreference indices (computed over 41 tokens, e.g. ['North', '-', 'East', 'third', '-', 'largest']).
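To see why the counts drift apart, here is a minimal sketch (the sentence and helper are illustrative, not from the data file): a hyphen-splitting pass in the style of the Stanford tokenizer turns each hyphenated compound into three tokens, so a sentence with two such compounds gains four tokens relative to a whitespace/NLTK-style count, exactly the 37-vs-41 mismatch above.

```python
import re

def hyphen_split(tokens):
    # Mimic Stanford-style behavior: split hyphenated compounds,
    # keeping the hyphen itself as a separate token.
    out = []
    for tok in tokens:
        out.extend(t for t in re.split(r"(-)", tok) if t)
    return out

# Illustrative sentence with two hyphenated compounds
sent = "the third-largest city in the North-East"

ws_tokens = sent.split()               # whitespace/NLTK-style: hyphens kept
st_tokens = hyphen_split(ws_tokens)    # Stanford-style: hyphens split out

print(len(ws_tokens), ws_tokens)       # 6 tokens
print(len(st_tokens), st_tokens)       # 10 tokens: 'North-East' -> 'North', '-', 'East'
```

Each compound contributes +2 tokens under the splitting scheme, so any coreference index computed over the split tokens will point at the wrong positions in the unsplit sequence.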