princeton-nlp / PURE

[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812
MIT License

About Chinese datasets #1

Closed BearRedy closed 3 years ago

BearRedy commented 3 years ago

Can this code deal with Chinese text, please?

a3616001 commented 3 years ago

We trained and evaluated our models on English datasets, and our released models can only handle English.

However, you can definitely train your own models on other datasets or in other languages using our code. To do so, besides converting the dataset into the format described in the repo, you need to add the dataset's labels in shared/const.py and add the dataset to the --task argument in run_entity.py and run_relation.py.
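
As a rough illustration of the two code changes mentioned above, here is a minimal, self-contained sketch. The dataset name `my_zh_dataset`, the label lists, the dictionary names, and the helper `build_label_map` are all hypothetical; the actual layout of shared/const.py and the argument parsing in run_entity.py and run_relation.py may differ, so check the repo before copying anything.

```python
import argparse

# Hypothetical label registry, standing in for whatever shared/const.py defines.
# "my_zh_dataset" and its labels are made up for illustration.
task_ner_labels = {
    "my_zh_dataset": ["PER", "ORG", "LOC"],
}
task_rel_labels = {
    "my_zh_dataset": ["works_in", "located_in"],
}


def build_label_map(labels):
    """Map each label to a positive id, leaving 0 free for 'no label'."""
    label2id = {label: i + 1 for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


def build_parser():
    """Argument parser whose --task choices come from the label registry."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--task", type=str, required=True, choices=sorted(task_ner_labels)
    )
    return parser


# Simulate `python run_entity.py --task my_zh_dataset`.
args = build_parser().parse_args(["--task", "my_zh_dataset"])
ner_label2id, ner_id2label = build_label_map(task_ner_labels[args.task])
rel_label2id, rel_id2label = build_label_map(task_rel_labels[args.task])
```

Deriving the `--task` choices from the registry means a new dataset only has to be registered in one place, though the scripts in the repo may list the choices explicitly instead.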

If your data is in Chinese, you may want to use a pre-trained language model that supports Chinese (e.g., bert-base-multilingual-uncased).
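
For the data conversion step with Chinese text, one possible approach is to treat every character as a token, since BERT-style tokenizers split CJK text into single characters anyway. The sketch below assumes a JSON-lines record with `doc_key`, `sentences`, `ner`, and `relations` fields and inclusive token indices; these field names and the helper functions are assumptions for illustration, so verify them against the format described in the repo.

```python
import json


def char_span(text, mention):
    """Return inclusive (start, end) character indices of the first match."""
    begin = text.index(mention)
    return begin, begin + len(mention) - 1


def to_record(doc_key, text, entities, relations):
    """Build one single-sentence record using one token per character.

    entities:  list of (mention, label) pairs
    relations: list of (head_mention, tail_mention, label) triples
    """
    tokens = list(text)
    spans = {}
    ner = []
    for mention, label in entities:
        start, end = char_span(text, mention)
        spans[mention] = (start, end)
        ner.append([start, end, label])
    rels = [
        [*spans[head], *spans[tail], label] for head, tail, label in relations
    ]
    # Each top-level list has one entry per sentence; here there is only one.
    return {
        "doc_key": doc_key,
        "sentences": [tokens],
        "ner": [ner],
        "relations": [rels],
    }


record = to_record(
    "doc-0",
    "小明在北京工作",
    entities=[("小明", "PER"), ("北京", "LOC")],
    relations=[("小明", "北京", "works_in")],
)
# ensure_ascii=False keeps the Chinese characters readable in the output file.
line = json.dumps(record, ensure_ascii=False)
```

Writing one such `line` per document produces a JSON-lines file. Documents with several sentences would need the second and later sentences' indices offset by the preceding token count if the target format uses document-level indices, which this sketch does not handle.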

Hope this helps!

BearRedy commented 3 years ago

Thanks a lot!

YaoXinZhi commented 3 years ago

Great job. So, if I make these changes, will I be able to use your code on any dataset for the same task?

a3616001 commented 3 years ago

@YaoXinZhi Yes, the code should work on other datasets of the entity-relation extraction task.

nlper01 commented 9 months ago

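Hello, I am running comparison experiments. Could you share a version of the code that runs on Chinese datasets, along with a few sample data entries (so that I can use ChatGPT to convert my data format quickly)? My email is 2674053421@qq.com. Many thanks!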