xlxwalex / FCGEC

The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC Chinese grammatical error correction corpus and STG model
https://aclanthology.org/2022.findings-emnlp.137
Apache License 2.0

Incorrect Labeling in Dataset: "SC" Replaced with "SD" in Two Instances of the Train Dataset #22

Closed: KingiLuther closed this issue 1 year ago

KingiLuther commented 1 year ago

Issue Details (incorrect labeling): While inspecting the dataset file FCGEC_train.json, I found two entries in which the intended label "SC" was written as "SD". This inconsistency can confuse anyone who uses the dataset to train models or conduct research. I have checked both entries carefully and am confident they should be labeled "SC", not "SD".

Affected Entries: The two mislabeled data points are:

{
    "1544600e0a7c45bdcba6ce5525855ee2": {
        "sentence": "青年作家残雪认为鲁迅的作品实现了一种“突破”,而《故事新编》中的《铸剑》则将这种创造达到了登峰造极。",
        "error_flag": 1,
        "error_type": "CM;SD",
        "operation": "[{\"Insert\":[{\"pos\":48,\"tag\":\"INS_3\",\"label\":\"的境界\"}]},{\"Delete\":[42,43,44],\"Modify\":[{\"pos\":37,\"tag\":\"MOD_1\",\"label\":\"使\"}]}]",
        "version": "FCGEC EMNLP 2022"
    },
    "aad7a950e2853b271eba684c84f81b55": {
        "sentence": "“非典”期间,在白衣天使们身上,都无不闪耀着舍身忘我、奋不顾身的光辉。",
        "error_flag": 1,
        "error_type": "CM;SD",
        "operation": "[{\"Delete\":[7,16],\"Insert\":[{\"pos\":12,\"tag\":\"INS_1\",\"label\":\"的\"}]},{\"Delete\":[7,17,18],\"Insert\":[{\"pos\":12,\"tag\":\"INS_1\",\"label\":\"的\"}]}]",
        "version": "FCGEC EMNLP 2022"
    }
}
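To check a local copy of FCGEC_train.json for other entries like these, a minimal sketch follows. It assumes the file is a single JSON object keyed by entry ID (as in the excerpt of the two entries), that `error_type` holds semicolon-separated codes, and that the valid codes are the seven FCGEC error types IWO, IWC, CM, CR, SC, ILL and AM. That label set is an assumption here, so verify it against the dataset documentation. `find_unknown_types` and `scan_file` are hypothetical helpers, not part of the FCGEC repository.

```python
import json
from typing import Dict, List

# ASSUMPTION: the seven FCGEC error-type codes; verify against the dataset docs.
VALID_TYPES = {"IWO", "IWC", "CM", "CR", "SC", "ILL", "AM"}


def find_unknown_types(data: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each entry ID to the error-type codes that are not in VALID_TYPES."""
    bad: Dict[str, List[str]] = {}
    for uid, entry in data.items():
        # Only erroneous sentences (error_flag == 1) are expected to carry types.
        if entry.get("error_flag") != 1:
            continue
        # Split "CM;SD" into ["CM", "SD"], ignoring empty tokens.
        codes = [c.strip() for c in entry.get("error_type", "").split(";") if c.strip()]
        unknown = [c for c in codes if c not in VALID_TYPES]
        if unknown:
            bad[uid] = unknown
    return bad


def scan_file(path: str) -> Dict[str, List[str]]:
    """Load a dataset file (hypothetical local path) and report unknown codes."""
    with open(path, encoding="utf-8") as f:
        return find_unknown_types(json.load(f))
```

On a copy that predates the fix, `scan_file("FCGEC_train.json")` would be expected to report the two IDs listed in this issue with `["SD"]`, plus any other entry carrying an unrecognized code. Comparing whole tokens rather than substrings avoids false matches between codes that share letters.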
xlxwalex commented 1 year ago

Hi,

Thank you very much for pointing out this issue, and we apologize for the inconvenience. We have committed the corrected dataset!
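Re-downloading the corrected file from the repository is the simplest way to pick up the fix. For a local copy obtained before the fix, a minimal sketch of the same correction is shown here. It assumes the file layout shown in the issue (a JSON object keyed by entry ID with a semicolon-separated `error_type`). `relabel`, `patch_file` and `AFFECTED_IDS` are hypothetical names, and the written file's indentation may not match the upstream formatting byte for byte.

```python
import json
from typing import Dict, Iterable

# The two entry IDs reported in this issue.
AFFECTED_IDS = (
    "1544600e0a7c45bdcba6ce5525855ee2",
    "aad7a950e2853b271eba684c84f81b55",
)


def relabel(data: Dict[str, dict], ids: Iterable[str],
            old: str = "SD", new: str = "SC") -> int:
    """Replace the code `old` with `new` in error_type for the given IDs.

    Codes are compared token by token, so only an exact "SD" code is
    replaced. Returns the number of entries that were changed.
    """
    changed = 0
    for uid in ids:
        entry = data.get(uid)
        if entry is None:  # ID not present in this file; skip it
            continue
        codes = entry["error_type"].split(";")
        if old in codes:
            entry["error_type"] = ";".join(new if c == old else c for c in codes)
            changed += 1
    return changed


def patch_file(src: str, dst: str) -> int:
    """Read src, relabel the affected entries, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    n = relabel(data, AFFECTED_IDS)
    with open(dst, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps the Chinese sentences human-readable.
        json.dump(data, f, ensure_ascii=False, indent=4)
    return n
```

Restricting the change to the listed IDs, instead of replacing every "SD" in the file, keeps the patch limited to the entries the maintainers confirmed.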