pavaris-pm closed this issue 1 year ago
This is a subtle problem: UD has a test set for Thai but no training set. That is why we evaluate on UD_Thai-PUD in the paper but do not train on it, so there is no mixture for that style. You can verify this by checking whether the tables in the Appendix of the paper (in this case Table 13) report a number for WtP_PUNCT.
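One way to cope with a style that has no mixture is to try the preferred style first and fall back to another one. The following is only a minimal sketch, not part of wtpsplit: `split_with_style_fallback` and `fake_split` are hypothetical names, the lowercase style strings are assumed, and the assumption that a missing mixture surfaces as a `ValueError` or `KeyError` is a guess, since the exact error is not shown in this thread.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def split_with_style_fallback(
    split_fn: Callable[..., List[str]],
    text: str,
    lang_code: str,
    styles: Sequence[str],
) -> Tuple[str, List[str]]:
    """Try each style in order; return (style_used, sentences) for the first that works."""
    last_error: Optional[Exception] = None
    for style in styles:
        try:
            return style, split_fn(text, lang_code=lang_code, style=style)
        except (KeyError, ValueError) as err:
            # Assumption: a style without a trained mixture raises one of these.
            last_error = err
    raise ValueError(
        f"none of the styles {list(styles)} worked for {lang_code!r}"
    ) from last_error


def fake_split(text: str, lang_code: str, style: str) -> List[str]:
    """Hypothetical stand-in for a real splitter, used only to demonstrate the fallback."""
    trained = {("th", "opus100")}  # pretend only this (language, style) pair has a mixture
    if (lang_code, style) not in trained:
        raise ValueError(f"no mixture for style {style!r}")
    return [part.strip() + "." for part in text.split(".") if part.strip()]


style_used, sentences = split_with_style_fallback(
    fake_split, "a. b.", "th", ["ud", "opus100"]
)
print(style_used, sentences)
```

With the real library, `split_fn` would presumably be the `split` method of a loaded `WtP` model, which accepts `lang_code` and `style` keyword arguments; adjust the caught exception types to whatever error your installed version actually raises.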
I have been trying to use wtpsplit for Thai with the 'ud' style, as follows:
However, it returned this error:
I also checked the language_info.csv file and found that the UD style is listed as supported for Thai, as UD_Thai-PUD.
I also tried another supported style, OPUS100, and it works; only the UD style returns an error. Is this a bug, or did I misunderstand something? Thank you.