huggingface / hmtl

🌊HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP
MIT License

Chinese is not supported? #5

Closed MrRace closed 5 years ago

MrRace commented 5 years ago

I tried a Chinese sentence, but it seems Chinese is not supported yet? My test sentence: “中国的首都是北京” ("The capital of China is Beijing")

VictorSanh commented 5 years ago

Hello,

Thanks for your interest!

Yes, the demo (along with the released weights) is for English, so Chinese is not supported right now. There is no short-term plan to release a Chinese version, but I would be glad to feature one if you are willing to train it.

Also, I realize that the README might not be clear enough on this, so I just added some clarification.

Victor

MrRace commented 5 years ago

Yeah, I would like to train a Chinese version. If you have any additional advice on the details, please let me know in advance.

VictorSanh commented 5 years ago

You should replace the datasets in the data folder for all the tasks you want to train. The default config file considers 4 tasks (NER, EMD, Coreference and Relation Extraction). For more details on how to set up the data, I invite you to read this thread: https://github.com/huggingface/hmtl/issues/2

Make sure that you change the paths in the configuration file too.
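
As a rough illustration of the path change, here is a minimal Python sketch that walks a nested config dictionary and re-roots every string beginning with one data directory onto another. The key names (`task_ner`, `data_params`, `train_data_path`, `validation_data_path`), the file names, and the `./data_zh/` directory are hypothetical and only stand in for whatever the real configuration file contains.

```python
def repoint_paths(node, old_prefix, new_prefix):
    """Return a copy of a nested config with matching path prefixes swapped."""
    if isinstance(node, dict):
        return {key: repoint_paths(value, old_prefix, new_prefix)
                for key, value in node.items()}
    if isinstance(node, list):
        return [repoint_paths(item, old_prefix, new_prefix) for item in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        # Keep the file name, replace only the leading directory.
        return new_prefix + node[len(old_prefix):]
    return node


# Hypothetical fragment; the real config's keys and file names may differ.
config = {
    "task_ner": {
        "data_params": {
            "train_data_path": "./data/ner_train.txt",
            "validation_data_path": "./data/ner_dev.txt",
        }
    }
}

updated = repoint_paths(config, "./data/", "./data_zh/")
print(updated["task_ner"]["data_params"]["train_data_path"])
```

If the config is stored as plain JSON, the dictionary could be read with `json.load` and written back with `json.dump`; the function returns a new structure and leaves the original untouched.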

Last, make sure that you replace the pre-trained word embeddings with Chinese ones. I have not worked with languages other than Latin-based ones, so it is not clear to me how to train character-level embeddings for Chinese. Thus I would simply remove the character-level embedding from the config file (token_characters).
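
One way to picture removing `token_characters` is the minimal Python sketch below, which drops every entry with that key from a nested config dictionary. It assumes an AllenNLP-style layout in which `token_characters` appears both under the token indexers and under the token embedders; the exact structure of the hmtl config is not shown in this thread, so the fragment and the `drop_key` helper are illustrative only.

```python
def drop_key(node, key):
    """Return a copy of a nested config with every `key` entry removed."""
    if isinstance(node, dict):
        return {k: drop_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [drop_key(item, key) for item in node]
    return node


# Hypothetical fragment in the AllenNLP style; real keys and values may differ.
config = {
    "token_indexers": {
        "tokens": {"type": "single_id"},
        "token_characters": {"type": "characters"},
    },
    "text_field_embedder": {
        "token_embedders": {
            "tokens": {"type": "embedding"},
            "token_characters": {"type": "character_encoding"},
        }
    },
}

cleaned = drop_key(config, "token_characters")
print(sorted(cleaned["token_indexers"]))
print(sorted(cleaned["text_field_embedder"]["token_embedders"]))
```

Since the text field embedder typically concatenates the outputs of its token embedders, the input size of whichever layer consumes the embeddings would probably need to shrink by the character embedding's output dimension once it is removed.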