nlp-uoregon / trankit

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Apache License 2.0

MWT Training error #87

Open nagaraju291990 opened 2 months ago

nagaraju291990 commented 2 months ago

While training MWT, I get the following error:

UDError: The concatenation of tokens in gold file and in system file differ! First 20 differing characters in gold file: 'PROPNவெளியேறினார்.19' and system file: 'ஓவரில்ஏ_ஏPROPNவெளியே'

How can I fix this?

This occurs with English UD data as well. Is there a format issue in my train/dev files that I am missing? I am providing the input in CoNLL-U format.
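
One way to narrow this down is to run the same kind of comparison by hand on the gold and system files. The sketch below is a simplified approximation of that check, not the evaluation script's own code. It assumes standard CoNLL-U (10 tab-separated columns, FORM in column 2, multi-word tokens on range lines such as 1-2). The helper names form_concat and first_difference are hypothetical.

```python
def form_concat(conllu_text):
    """Concatenate surface forms from CoNLL-U text, ignoring whitespace.

    A multi-word token contributes the FORM of its range line (e.g. "1-2");
    the syntactic words it covers are skipped so they are not counted twice.
    """
    chars = []
    skip_until = 0  # highest word ID covered by the current multi-word token
    for lineno, line in enumerate(conllu_text.splitlines(), start=1):
        if not line.strip():
            skip_until = 0  # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10:
            # Space-separated or shifted columns are a common format problem.
            raise ValueError(
                f"line {lineno}: expected 10 tab-separated columns, got {len(cols)}"
            )
        tok_id, form = cols[0], cols[1]
        if "." in tok_id:
            continue  # empty node, has no surface form in the text
        if "-" in tok_id:
            skip_until = int(tok_id.split("-")[1])
            chars.append("".join(form.split()))
        elif int(tok_id) > skip_until:
            chars.append("".join(form.split()))
    return "".join(chars)


def first_difference(gold, system):
    """Return the index of the first differing character, or None if equal."""
    for i, (g, s) in enumerate(zip(gold, system)):
        if g != s:
            return i
    if len(gold) == len(system):
        return None
    return min(len(gold), len(system))
```

Calling first_difference(form_concat(gold_text), form_concat(system_text)) on the contents of the two files gives the character offset where they diverge. A ValueError from form_concat points to a line that does not have 10 tab-separated columns.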