nlp-uoregon / trankit

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Apache License 2.0
724 stars 99 forks

Does Trankit support code-mixed text (two languages in one sentence)? #64

Closed (hithesh-sankararaman closed this issue 10 months ago)

hithesh-sankararaman commented 1 year ago

For example, I have the sentence "ah madam எனக்கு G T six fifty test drive பண்ணமுடியுமா madam." This text mixes Tamil and English. Will Trankit perform correct POS tagging on it when used in "auto" mode?

minhhdvn commented 10 months ago

Hi @hithesh-sankararaman, thanks for your question. Trankit models are trained on individual languages, not in code-mixed settings. You can still use Trankit to process code-mixed text; however, the results might not be correct.
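
Since each model covers a single language, one possible workaround is to split the sentence into same-script runs and tag each run with the matching monolingual pipeline. The sketch below is only an illustration of that idea, not something Trankit does itself. `script_of`, `split_by_script`, and `tag_runs` are hypothetical helper names, the fallback to English for tokens with no recognizable script is an assumption, and `tag_runs` assumes Trankit's documented `Pipeline`, `add`, `set_active`, and `posdep(..., is_sent=True)` calls. Tagging short fragments out of context will likely still lose accuracy, and romanized Tamil would be routed to the English model.

```python
import unicodedata


def script_of(ch):
    """Return 'tamil' or 'english' based on the character's Unicode name, else None."""
    name = unicodedata.name(ch, "")
    if name.startswith("TAMIL"):
        return "tamil"
    if name.startswith("LATIN"):
        return "english"
    return None  # digits, punctuation, other scripts


def split_by_script(text):
    """Group whitespace-separated tokens into consecutive same-script runs."""
    runs = []
    for token in text.split():
        # Language of the first character that belongs to a known script.
        lang = next((s for s in map(script_of, token) if s), None)
        if runs and (lang is None or lang == runs[-1][0]):
            runs[-1][1].append(token)
        else:
            # Assumption: a leading token with no known script is treated as English.
            runs.append((lang or "english", [token]))
    return [(lang, " ".join(tokens)) for lang, tokens in runs]


def tag_runs(runs):
    """Tag each run with its monolingual Trankit pipeline (downloads models on first use)."""
    from trankit import Pipeline  # third-party dependency, imported lazily

    langs = sorted({lang for lang, _ in runs})
    pipeline = Pipeline(langs[0])
    for lang in langs[1:]:
        pipeline.add(lang)

    tagged = []
    for lang, chunk in runs:
        pipeline.set_active(lang)
        sent = pipeline.posdep(chunk, is_sent=True)
        tagged.extend((tok["text"], tok.get("upos"), lang) for tok in sent["tokens"])
    return tagged


if __name__ == "__main__":
    sentence = "ah madam எனக்கு G T six fifty test drive பண்ணமுடியுமா madam."
    for lang, chunk in split_by_script(sentence):
        print(lang, "|", chunk)
```

Running the script prints only the script-based runs; calling `tag_runs(split_by_script(sentence))` additionally requires Trankit and its English and Tamil models to be installed.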