Could not find a mixture for the Universal Dependencies (UD) style in Thai language

segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.

MIT License

I have been trying to use wtpsplit on Thai text with the 'ud' style, as follows:

# specify language code to be 'th' and style='ud' according to the paper
wtp.split(text, lang_code="th", style='ud')

However, this raises the following error:

ValueError: Could not find a mixture for the style 'ud'.

I also checked the language_info.csv file and found that the UD style is listed as supported for Thai, as UD_Thai-PUD.

I also tried another supported style, OPUS100, and it works; only the UD style returns an error. Is this a bug, or have I misunderstood something?
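As a stopgap while this is unresolved, here is a minimal sketch of how I am falling back from one style to another when the mixture is missing. The helper name `split_with_style_fallback` is my own and not part of wtpsplit, and the lowercase style string "opus100" is an assumption about how the OPUS100 style is spelled in the API.

```python
from typing import Any, Callable, List, Sequence, Tuple


def split_with_style_fallback(
    split_fn: Callable[..., Any],
    text: str,
    lang_code: str,
    styles: Sequence[str] = ("ud", "opus100"),  # "opus100" spelling is an assumption
) -> Tuple[str, Any]:
    """Try each style in order and return (style_used, result).

    Hypothetical helper, not part of wtpsplit. `split_fn` is expected to
    accept `lang_code` and `style` keyword arguments and to raise
    ValueError when no mixture exists for the requested style.
    """
    errors: List[str] = []
    for style in styles:
        try:
            return style, split_fn(text, lang_code=lang_code, style=style)
        except ValueError as err:
            # Remember why this style failed, then try the next one.
            errors.append(f"{style}: {err}")
    raise ValueError("No usable style. " + "; ".join(errors))
```

With the `wtp` object from my snippet above, this would be called as `style, sentences = split_with_style_fallback(wtp.split, text, "th")`. For Thai I would expect it to skip 'ud' and use the second style, but that is only a workaround and does not explain why the UD mixture is missing.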

Thank you

Could not find a mixture for the Universal Dependencies (UD) style in Thai language #107