jiesutd / LatticeLSTM

Chinese NER using Lattice LSTM. Code for ACL 2018 paper.

Request the code for preprocessing OntoNotes 4 #9

Closed: wangruicn closed this issue 6 years ago

wangruicn commented 6 years ago

Hello, I am trying to reproduce your work on OntoNotes 4. Could you please provide the code or scripts for preprocessing that dataset? Specifically, I need to split it into train/dev/test sets and to convert the original OntoNotes format to CoNLL format (BMES).

I have downloaded OntoNotes 4 from LDC using my license and tried to split the dataset according to the paper Named Entity Recognition with Bilingual Constraints, as mentioned in your ACL18 paper. However, some statistics are not consistent with those shown in your paper. It would help a lot if you could provide the preprocessing code. Thanks!

jiesutd commented 6 years ago

Hi @wangruicn, I have written a script to convert tags to the BMES (BIOES) format. You can find it here: https://github.com/jiesutd/NCRFpp/blob/master/utils/tagSchemeConverter.py

For the OntoNotes 4 data, please leave your email address. I will email you some necessary scripts or the data once I confirm that you have the LDC license.
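
For readers who only need the tag-scheme step, a BIO-to-BMES conversion can be sketched roughly as follows. This is a minimal illustration and not the contents of the linked tagSchemeConverter.py; the function name `bio_to_bmes` is hypothetical, and the sketch assumes each sentence is already a list of well-formed BIO tags such as `B-PER`, `I-PER`, and `O`.

```python
def bio_to_bmes(tags):
    """Convert one sentence's BIO tags to BMES tags (hypothetical helper).

    Assumes well-formed BIO input: every I-X follows a B-X or I-X.
    """
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # Look one tag ahead; treat the end of the sentence as "O".
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            # A B- tag with no continuation is a single-token entity.
            converted.append(("B-" if continues else "S-") + label)
        elif prefix == "I":
            # An I- tag is the entity's end unless it continues.
            converted.append(("M-" if continues else "E-") + label)
        else:
            raise ValueError("unexpected tag: " + tag)
    return converted


example = ["B-PER", "I-PER", "I-PER", "O", "B-LOC"]
print(bio_to_bmes(example))
```

The one-tag lookahead is what separates `M-` from `E-` and `B-` from `S-`; splitting OntoNotes 4 into train/dev/test sets is a separate step that this sketch does not cover.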

wangruicn commented 6 years ago

ruicn@bit.edu.cn. Looking forward to your email @jiesutd

jiesutd commented 6 years ago

Sent!

zhanghaok commented 2 years ago

Hi jiesutd, I have encountered a problem in data preprocessing. Could you share the preprocessing script? My email: 3250514239@qq.com. Looking forward to your email @jiesutd