xlxwalex / FCGEC

The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC中文语法纠错语料及STG模型
https://aclanthology.org/2022.findings-emnlp.137
Apache License 2.0

approaches and evaluation #31

Closed AramTaravat closed 7 months ago

AramTaravat commented 8 months ago

Hello, I have some questions regarding GEC and your approach. First, I think you used Seq2Edit and Seq2Seq models for the grammar correction part and modified some models such as GECToR for Chinese, and then you proposed your own model based on Switch-Tagger-Generator. Is STG somehow a combination of Seq2Edit and Seq2Seq? Also, aren't Seq2Edit models useful for grammar classification and detection? And does this repository contain the classification part?

About evaluation I also have some questions. You used ChERRANT. Does its performance differ from m2score in any way? Also, ChERRANT wants the M2 format, but I don't see what the M2 format is actually needed for :) . I mean, an example like:

S The cat sat at mat .
A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0
A 4 4|||ArtOrDet|||the||a|||REQUIRED|||-NONE-|||0

could be written as:

S The cat sat at mat .
A 3 4|||on|||
A 4 4|||the||a|||

because, if I'm not wrong, the only parts that matter for evaluation are the error indexes and their corrections.

xlxwalex commented 8 months ago

Hi,

I apologize for the late response as I missed seeing your issue in my previous emails. Here are the point-by-point answers to your questions:

Is STG somehow a combination of Seq2Edit and Seq2Seq?

  1. I think our model is still fundamentally based on editing, which is the core of our work. The combination you're referring to, I presume, is related to the Generator part. However, we used Cloze completion instead of a Seq2Seq approach, so I think there's still a difference.
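To give a rough idea of what "Cloze completion" means here (an illustrative sketch only, not our Generator code; the model name is just a common Chinese BERT checkpoint): insertion slots are filled with [MASK] tokens and predicted in parallel by a masked language model, rather than decoded token by token as in Seq2Seq.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

# A sentence with two insertion slots to be filled, Cloze-style
text = "他是我最好的[MASK][MASK]之一。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = mlm(**inputs).logits

# Locate the masked positions and fill all of them in one forward pass
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))  # every slot is predicted in parallel
```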

Also, aren't Seq2Edit models useful for grammar classification and detection? And does this repository contain the classification part?

  2. For the error classification and detection tasks, we used a basic classification model: an Encoder-Only model (BERT/RoBERTa, etc.) with a fully connected layer on top, serving as a baseline for scoring. The editing model is aimed specifically at the correction task. Since the implementation is very straightforward, we have not provided that code in the repository.
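For reference, a rough sketch of such a baseline (not the code we used; the checkpoint name `hfl/chinese-roberta-wwm-ext`, the class name, and the label count are placeholders) could look like this:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ErrorTypeClassifier(nn.Module):
    """Encoder-only model (e.g. BERT/RoBERTa) with a fully connected head."""
    def __init__(self, encoder_name="hfl/chinese-roberta-wwm-ext", num_labels=7):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # Use the [CLS] representation for sentence-level classification
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return self.classifier(hidden[:, 0])

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = ErrorTypeClassifier()
batch = tokenizer(["他是我最好的朋友之一。"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, num_labels)
```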

ChERRANT with M2 format

  3. ChERRANT is also based on m2score. The process involves: a) using m2score to obtain the edit operations between the original sentence and the ground truth; b) applying m2score to get the edit operations between the original sentence and the model output; c) as you mentioned, comparing the edit operations to derive the evaluation metrics. You can find more detailed information here: compareEdits

Therefore, it seems reasonable to borrow the M2 format in order to evaluate editing-based models and to obtain the editing operations. Perhaps I haven't fully grasped your points; if my response is still confusing, please feel free to provide additional information.
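To make the comparison step concrete, here is a toy sketch (not the actual compareEdits implementation; the function name and tuple layout are illustrative) that scores hypothesis edits against reference edits using only spans and corrections, with the F0.5 commonly used in GEC evaluation:

```python
def compare_edits(hyp_edits, ref_edits, beta=0.5):
    """Toy illustration: score hypothesis edits against reference edits.

    Each edit is a (start, end, correction) tuple; only span and correction
    matter for matching, which is why the extra M2 fields can be ignored here.
    """
    hyp, ref = set(hyp_edits), set(ref_edits)
    tp = len(hyp & ref)
    p = tp / len(hyp) if hyp else 1.0
    r = tp / len(ref) if ref else 1.0
    f = (1 + beta**2) * p * r / (beta**2 * p + r) if (p + r) else 0.0
    return p, r, f

# "The cat sat at mat ." with two gold edits vs. a system that only fixed the preposition
ref = [(3, 4, "on"), (4, 4, "the")]
hyp = [(3, 4, "on")]
print(compare_edits(hyp, ref))  # (1.0, 0.5, ~0.833)
```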

Please let me know if you need further clarification or additional details.

AramTaravat commented 7 months ago

Thanks for your reply, and sorry for my late reply. About point 3 I still have a question :)

If ChERRANT is like m2score, the input edits for the compareEdits function are lines like the one below, right (if we consider the example "The cat sat at mat .")?

A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0

Then why can't we use this line instead, since only these parts are important:

A 3 4|||on|||

In other words, why is the m2 file format:

S <tokenized system output for sentence 1>
A <start offset> <end offset>|||<error type>|||<correction1>||<correction2>||..||<correctionN>|||<required or optional>|||<comment>|||<annotator id>

instead of just:

S <tokenized system output for sentence 1>
A <start offset> <end offset>|||<correction1>||<correction2>||..||<correctionN>|||

AramTaravat commented 7 months ago

And about point 2: for classification in such tasks, do we just need Encoder-Only models with a fully connected layer?

xlxwalex commented 7 months ago

And about point 2: for classification in such tasks, do we just need Encoder-Only models with a fully connected layer?

Yes, in our work we employ the most basic methods for the classification tasks (error detection and identification), because for this corpus we only intend to provide a baseline for reference.

Thanks for your reply, and sorry for my late reply. About point 3 I still have a question :)

If ChERRANT is like m2score, the input edits for the compareEdits function are lines like the one below, right (if we consider the example "The cat sat at mat .")?

A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0

Then why can't we use this line instead, since only these parts are important:

A 3 4|||on|||

In other words, why is the m2 file format:

S <tokenized system output for sentence 1>
A <start offset> <end offset>|||<error type>|||<correction1>||<correction2>||..||<correctionN>|||<required or optional>|||<comment>|||<annotator id>

instead of just:

S <tokenized system output for sentence 1>
A <start offset> <end offset>|||<correction1>||<correction2>||..||<correctionN>|||

m2score was originally designed as an auxiliary evaluation tool: it provides fine-grained assessment information and is broadly applicable to various correction models and evaluation needs. Therefore, during evaluation, the information it generates can be used for customized assessments.

In ChERRANT, as you mentioned, only the edit information is actually used, so indeed the other parts are not necessary. Since m2score is a third-party tool, ChERRANT simply uses it directly during evaluation.
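To illustrate that point, here is a hypothetical helper (not part of m2score or ChERRANT; the function name is made up) that extracts only the fields needed for matching from a full M2 annotation line:

```python
def parse_m2_edit(line: str):
    """Pull (start, end, correction) out of a full M2 'A' line.

    e.g. 'A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0' -> (3, 4, 'on')
    The error type, REQUIRED flag, comment and annotator id are carried
    along by the M2 format but are not needed for matching edits.
    """
    span, error_type, correction, *_ = line[2:].split("|||")
    start, end = map(int, span.split())
    return start, end, correction

print(parse_m2_edit("A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0"))  # (3, 4, 'on')
```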

If you have any more questions, feel free to reply :)

AramTaravat commented 7 months ago

Actually, I wanted to create my own test dataset and use these tools for evaluation, so I wanted to check whether the edits alone are enough and the other parts are unnecessary. Based on your reply, I think they are. Thank you for your help.

xlxwalex commented 7 months ago

Actually, I wanted to create my own test dataset and use these tools for evaluation, so I wanted to check whether the edits alone are enough and the other parts are unnecessary. Based on your reply, I think they are. Thank you for your help.

I'm glad to be of help.