saramoeini20 closed this issue 1 year ago
Hi,
The data format in FCGEC is different from the M2 format. We use an operation-oriented paradigm to annotate the dataset. More details and examples of our data can be found in Appendix B of our paper (https://aclanthology.org/2022.findings-emnlp.137.pdf) and in the data folder.
Besides, the M2 format is only used to compute the model's performance (precision, recall, and F0.5), with a scorer we borrow from MuCGEC (ChERRANT). You can find more details in the scorer folder and in Section 4.1 of our paper.
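For readers new to these metrics, here is a minimal sketch of how precision, recall, and F0.5 can be computed once the predicted and reference edits are available as sets. It is not the ChERRANT scorer: the `prf` function, the `(start, end, correction)` edit tuples, and the choice to return 1.0 when a denominator is empty are all assumptions made for illustration.

```python
def prf(hyp_edits, ref_edits, beta=0.5):
    """Precision, recall and F-beta over two collections of edits."""
    hyp, ref = set(hyp_edits), set(ref_edits)
    tp = len(hyp & ref)   # edits proposed by the model that are in the reference
    fp = len(hyp - ref)   # edits proposed by the model that are not
    fn = len(ref - hyp)   # reference edits the model missed
    # Assumed convention: an empty denominator yields 1.0.
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    denom = beta ** 2 * p + r
    f = (1 + beta ** 2) * p * r / denom if denom else 0.0
    return p, r, f


# Hypothetical edits: (start token, end token, correction).
ref = {(1, 2, "is"), (4, 4, "the")}
hyp = {(1, 2, "is"), (3, 4, "a")}
p, r, f = prf(hyp, ref)
print(p, r, f)
```

With beta = 0.5, precision is weighted more heavily than recall, which is why F0.5 is the usual choice in GEC: a wrong correction is considered worse than a missed one.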
If you want to create data in the FCGEC format, you can use our convert_seq2seq_to_operation.py script. The algorithm is described in the README of the scripts folder and in Algorithm 1 of our paper. The script makes it convenient to convert ordinary seq2seq data to our operation format.
If you have any more questions, feel free to comment here!
Is the convert_seq2seq_to_operation.py script only for Chinese? If I want to use it for another language, should I modify it, or can it not be used at all?
And for computing the model's performance, is the M2 format the only one usable in GEC tasks?
> Is the convert_seq2seq_to_operation.py script only for Chinese? If I want to use it for another language, should I modify it, or can it not be used at all?

Yes, as written the convert_seq2seq_to_operation.py script can only convert Chinese data, but you can adapt it to other languages (e.g., for English, you can treat each word as a character when matching).
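The "treat each word as a character" idea can be sketched as follows. This is not the script's Algorithm 1; it only shows that the same matching code can run over characters or over words once tokenization is separated out. The `tokenize` and `token_edits` helpers and the `lang` codes are hypothetical, and Python's standard `difflib` stands in for the matching step.

```python
from difflib import SequenceMatcher


def tokenize(sentence, lang):
    # Chinese: one token per character. Other languages: whitespace-separated words.
    return list(sentence) if lang == "zh" else sentence.split()


def token_edits(source, target, lang):
    """Return the non-matching spans between source and target as (tag, src, tgt)."""
    src, tgt = tokenize(source, lang), tokenize(target, lang)
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [
        (tag, src[i1:i2], tgt[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


en = token_edits("This are a sentence .", "This is a sentence .", "en")
zh = token_edits("我我喜欢你", "我喜欢你", "zh")
print(en)  # [('replace', ['are'], ['is'])]
print(zh)  # [('delete', ['我'], [])]
```

Only `tokenize` depends on the language; the rest operates on token lists, so adapting to a new language mostly means choosing a sensible tokenizer.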
> And for computing the model's performance, is the M2 format the only one usable in GEC tasks?

Yes. For the precision, recall, and F0.5 metrics in GEC, the predictions and ground truths are first put into parallel form and then converted to M2 format, from which ChERRANT computes the metrics.
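To make the parallel-to-M2 step concrete, here is a rough sketch that turns one (source, target) pair into M2-style lines: an `S` line with the tokenized source, followed by one `A` line per edit. It is not what ChERRANT does internally. The `to_m2` helper is hypothetical, the edit types are just `difflib` opcode names rather than real error-type labels, and writing `-NONE-` for a deletion is an assumption (tools differ on this, so check the scorer you use).

```python
from difflib import SequenceMatcher


def to_m2(source_tokens, target_tokens, annotator=0):
    """Render one parallel pair as M2-style text (placeholder error types)."""
    lines = ["S " + " ".join(source_tokens)]
    opcodes = SequenceMatcher(
        a=source_tokens, b=target_tokens, autojunk=False
    ).get_opcodes()
    edits = [op for op in opcodes if op[0] != "equal"]
    if not edits:
        # Conventional "no edit" annotation for an already-correct sentence.
        lines.append(f"A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||{annotator}")
    for tag, i1, i2, j1, j2 in edits:
        # Assumption: an empty correction (a deletion) is written as -NONE-.
        correction = " ".join(target_tokens[j1:j2]) or "-NONE-"
        lines.append(
            f"A {i1} {i2}|||{tag}|||{correction}|||REQUIRED|||-NONE-|||{annotator}"
        )
    return "\n".join(lines)


m2 = to_m2("This are a sentence .".split(), "This is a sentence .".split())
print(m2)
# S This are a sentence .
# A 1 2|||replace|||is|||REQUIRED|||-NONE-|||0
```

The two numbers after `A` are token offsets into the source sentence, so the parallel file carries the same information implicitly while the M2 file makes each edit explicit for scoring.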
Thank you so much for your complete and timely response.
You're welcome :)
Hi, I'm a beginner at GEC and have a question about the structure of the dataset, because I want to create one myself for my work. I see that your data is in JSON format, and elsewhere I sometimes see the M2 format or a parallel file format. Are they different from each other, and where should each one be used? I would be thankful for your help.