pitrack / incremental-coref

Code for "Moving on from OntoNotes: Coreference Resolution Model Transfer" and "Incremental Neural Coreference Resolution in Constant Memory"
Apache License 2.0

How to convert SemEval data to jsonlines? #6

Closed sm354 closed 2 years ago

sm354 commented 2 years ago

The README.md has detailed steps on how to convert OntoNotes to jsonlines. Could you please provide the steps for SemEval as well?

pitrack commented 2 years ago

Hi, sorry for the delayed response. I can update the instructions (along with uploading any additional scripts, if there were any) tomorrow (Friday).

(I'll try to respond to #7 tomorrow, too.)

pitrack commented 2 years ago

I was missing a couple of files. I've included them in this branch, along with some additional instructions in domain/README.md. It turns out the SemEval file format is very similar to OntoNotes (maybe a couple of column indices were a bit different?), so the preprocessing files are largely the same.

However, I wouldn't be surprised if several minutes of hacking on the minimize scripts are needed to get them to run properly on all the data files. Still, this should be a starting point, and once I confirm this works for SemEval, I'll merge the PR into main.
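
To illustrate why only the column indices should need to change between the two formats, here is a minimal sketch of a CoNLL-style to jsonlines conversion. It is not the repository's minimize script. The function name `conll_to_record`, the `word_col` and `coref_col` parameters and their defaults, and the output field names (`doc_key`, `sentences`, `clusters`) are all assumptions for illustration. It also assumes whitespace-separated columns, blank lines between sentences, and a coreference column using the bracket notation `(3`, `3)`, `(3)` with `|` between multiple labels.

```python
import json
from collections import defaultdict


def conll_to_record(lines, doc_key, word_col=3, coref_col=-1):
    """Build one jsonlines-style record from CoNLL-style rows.

    word_col and coref_col are hypothetical parameters: adjust them to
    match the column layout of the files being converted.
    """
    sentences, current = [], []
    clusters = defaultdict(list)     # cluster id -> list of [start, end] spans
    open_starts = defaultdict(list)  # cluster id -> stack of open start indices
    token_index = 0                  # document-level token offset

    for line in lines:
        line = line.strip()
        if line.startswith("#"):     # e.g. "#begin document" / "#end document"
            continue
        if not line:                 # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append(cols[word_col])
        label = cols[coref_col]
        if label not in ("-", "_"):  # assumed placeholders for "no mention"
            for part in label.split("|"):
                cluster_id = int(part.strip("()"))
                if part.startswith("("):
                    open_starts[cluster_id].append(token_index)
                if part.endswith(")"):
                    start = open_starts[cluster_id].pop()
                    clusters[cluster_id].append([start, token_index])
        token_index += 1

    if current:
        sentences.append(current)
    return {
        "doc_key": doc_key,
        "sentences": sentences,
        "clusters": [sorted(spans) for _, spans in sorted(clusters.items())],
    }


def to_jsonline(record):
    """Serialize one record as a single line of JSON."""
    return json.dumps(record, ensure_ascii=False)
```

Under these assumptions, switching formats would mean passing different `word_col` and `coref_col` values and writing one `to_jsonline(record)` string per document to the output file. Real data files may need extra handling, for example multi-part documents.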

sm354 commented 2 years ago

Thanks for adding the required steps and files for SemEval. I didn't face any issues in converting the ca, it, es, and nl languages to jsonlines.