xlxwalex / FCGEC

The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC中文语法纠错语料及STG模型
https://aclanthology.org/2022.findings-emnlp.137
Apache License 2.0
104 stars 12 forks

Reproducing the Experiment Results with the Provided Checkpoint #3

Closed GMago-LeWay closed 1 year ago

GMago-LeWay commented 1 year ago

Hello! I downloaded the trained checkpoint linked in the README and ran inference on the test set to reproduce the results. The results given in the README are (EM / F0.5 : 34.10 / 45.48), but my results (using run_stg_joint.sh) are (EM / F0.5 : 50.5 / 37.9). This difference is too large to ignore. Note that I had to adjust some code to run inference:

  1. In line 16 of FCGEC/model/STG-correction/Model/tagger_model.py, I had to change self.max_token = args.max_generate + 1 to self.max_token = args.max_generate. Otherwise, the parameter shape of self._hidden2t in the checkpoint does not match the constructed model.
  2. In line 46 of FCGEC-main/model/STG-correction/preprocess_data.py, some additional code is needed, because the "uid" of every sentence is required at test time. I therefore added an extra key column to test.csv and copied it into stg_joint_test.xlsx, which I used to form the final submission (a rough sketch follows below). My results are in the row GMago on the Codalab results page.
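
A rough sketch of that uid work-around, assuming a pandas environment; the file paths, the column name "uid", and the shared row order are illustrative guesses rather than the repository's exact layout:

```python
# Hypothetical illustration of attaching the uid column to the exported
# predictions -- paths, column names and the identical-row-order assumption
# are guesses, not part of the official pipeline.
import pandas as pd

test_df = pd.read_csv("dataset/stg_joint/test.csv")   # test set with the added uid column
pred_df = pd.read_excel("stg_joint_test.xlsx")        # output produced by run_stg_joint.sh

pred_df["uid"] = test_df["uid"].values                # assumes both files keep the same row order
pred_df.to_excel("stg_joint_test.xlsx", index=False)  # re-export with uid attached
```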
xlxwalex commented 1 year ago

Hi, thanks for your feedback! The responses to your questions are as follows:

  1. There is a mistake here: we actually set max_generate to 5 when training the released checkpoint. However, after the rebuttal we computed the distribution of the data and concluded that 6 would be more appropriate, so we recommend re-running with max_generate = 6 (a toy illustration of why the value must match the checkpoint follows this list). Thank you for pointing out the problem; I will update the README accordingly.
  2. This result looks a bit odd. I will download your submission from Codalab and check it to find the problem. It may take some time; I will reply here once I have checked it.
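
To make point 1 concrete, here is a toy sketch (not the repository's actual tagger_model.py) of why max_generate has to match the released checkpoint when the generation head has max_generate + 1 outputs:

```python
# Toy reproduction of the size mismatch: a head built with max_generate + 1
# outputs cannot load a checkpoint trained with a different max_generate.
# The shapes mirror the ones reported in this issue ([6, 768] vs [7, 768]).
import torch.nn as nn

HIDDEN_SIZE = 768

def build_head(max_generate: int) -> nn.Linear:
    max_token = max_generate + 1             # the "+ 1" in tagger_model.py
    return nn.Linear(HIDDEN_SIZE, max_token)

saved = build_head(max_generate=5).state_dict()        # weight [6, 768], bias [6]

try:
    build_head(max_generate=6).load_state_dict(saved)  # this head expects [7, 768]
except RuntimeError as err:
    print(err)  # size mismatch between checkpoint and model
```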
xlxwalex commented 1 year ago

Hello, I have identified the reason for the performance difference in our Codalab system. We are very sorry for the error in our scoring program. More details are given below:

For the correction metrics, we only compute scores on erroneous sentences, so the correct sentences have to be filtered out first (based on the error_flag attribute in the golden label file). While developing the scoring program, I mistakenly used the error_flag of the prediction file instead of the golden label file, which corrupted the two correction metrics (corr_ex and corr_f0.5).
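
A minimal sketch of the intended filtering (this is not the actual scoring program; the field names and the simple exact-match check are simplified assumptions):

```python
# Simplified sketch: correction metrics are computed only over sentences whose
# *golden* error_flag marks them as erroneous. The bug was selecting sentences
# by the prediction file's error_flag instead of the golden one.
def correction_exact_match(golden: dict, predicted: dict) -> float:
    erroneous_uids = [uid for uid, item in golden.items() if item["error_flag"] == 1]
    hits = sum(
        predicted[uid]["correction"] in golden[uid]["corrections"]  # any reference counts
        for uid in erroneous_uids
    )
    return hits / len(erroneous_uids) if erroneous_uids else 0.0
```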

We have fixed the bug, and you can resubmit your previous predict.zip file for re-scoring; the results will be:

[screenshot: codalab_exp]

Meanwhile, I have updated the Python and bash files to add the uid to the output file.

Thank you for your feedback! If you cannot reproduce our performance on Codalab, feel free to add a comment here.

GMago-LeWay commented 1 year ago

I made a submission, and now the results of the given checkpoint are consistent with the README (EM / F0.5 : 34.10 / 45.48). Thanks for your reply!

kingfan1998 commented 1 year ago

Hello! I downloaded the checkpoint and the pretrained model and changed the paths to my own, but I still get an error: "joint_evaluate.py: error: argument --lm_path: expected one argument". How can I solve this? Thanks!

xlxwalex commented 1 year ago

> Hello! I downloaded the checkpoint and the pretrained model and changed the paths to my own, but I still get an error: "joint_evaluate.py: error: argument --lm_path: expected one argument". How can I solve this? Thanks!

Hi, it looks like --lm_path did not receive exactly one value (lm_path is the path to the pre-trained language model). Can you share the complete bash script or the command you ran?
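
For reference, this argparse message typically appears when the flag is passed with no value at all, for example because the shell variable behind it expands to nothing. A minimal, purely illustrative reproduction:

```python
# Minimal reproduction of "error: argument --lm_path: expected one argument".
import argparse

parser = argparse.ArgumentParser(prog="joint_evaluate.py")
parser.add_argument("--lm_path", type=str, help="path to the pre-trained language model")

print(parser.parse_args(["--lm_path", "/path/to/plm"]))  # works: one value follows the flag
parser.parse_args(["--lm_path"])                         # prints the error above and exits
```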

kingfan1998 commented 1 year ago

```bash
#!/bin/bash
# Copyright 2022 The ZJU MMF Authors (Lvxiaowei Xu, Jianwang Wu, Jiawei Peng, Jiayu Fu and Ming Cai *).
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Train and Test for STG-Joint
# Global Variable (!!! SHOULD ADAPT TO YOUR CONFIGURATION !!!)
CUDA_ID=1
SEED=2022
EPOCH=50
BATCH_SIZE=32
MAX_GENERATE=5        # MAX T
SPECIAL_MAPPING=false # More details can be found in ISSUE 10
CHECKPOINT_DIR=checkpoints

# Roberta-base-chinese can be downloaded at https://github.com/ymcui/Chinese-BERT-wwm
# PLM_PATH=/datadisk2/xlxw/Resources/pretrained_models/roberta-base-chinese/ # pretrained-model path
PLM_PATH= /pretrained_models/chinese-roberta-wwm-ext/
OUTPUT_PATH=stg_joint_test.xlsx
JOINT_CHECK_DIR=1021_jointmodel_stg

# STEP 1 - PREPROCESS DATASET
DATA_BASE_DIR=dataset
DATA_OUT_DIR=stg_joint
DATA_TRAIN_FILE=FCGEC_train.json
DATA_VALID_FILE=FCGEC_valid.json
DATA_TEST_FILE=FCGEC_test.json
python preprocess_data.py --mode normal --err_only True \
    --data_dir ${DATA_BASE_DIR} --out_dir ${DATA_OUT_DIR} \
    --train_file ${DATA_TRAIN_FILE} --valid_file ${DATA_VALID_FILE} --test_file ${DATA_TEST_FILE}
#
# STEP 2 - TRAIN STG-Joint MODEL
python joint_stg.py --mode train \
    --gpu_id ${CUDA_ID} \
    --seed ${SEED} \
    --checkpoints ${CHECKPOINT_DIR} \
    --checkp ${JOINT_CHECK_DIR} \
    --data_base_dir ${DATA_BASE_DIR}/${DATA_OUT_DIR} \
    --lm_path ${PLM_PATH} \
    --batch_size ${BATCH_SIZE} \
    --epoch ${EPOCH} \
    --max_generate ${MAX_GENERATE}
#
# STEP 3 - TRAIN STG-Joint MODEL
python joint_evaluate.py --mode test --gpu_id ${CUDA_ID} --seed ${SEED} \
    --checkpoints ${CHECKPOINT_DIR} --checkp ${JOINT_CHECK_DIR} \
    --export ${OUTPUT_PATH} \
    --data_base_dir ${DATA_BASE_DIR}/${DATA_OUT_DIR} \
    --max_generate ${MAX_GENERATE} \
    --lm_path ${PLM_PATH} \
    --batch_size ${BATCH_SIZE} \
    --sp_map ${SPECIAL_MAPPING}
```

run: sh run_stg_joint.sh

xlxwalex commented 1 year ago

> ```bash
> #!/bin/bash
> # Copyright 2022 The ZJU MMF Authors (Lvxiaowei Xu, Jianwang Wu, Jiawei Peng, Jiayu Fu and Ming Cai *).
> # Licensed under the Apache License, Version 2.0 (the "License");
> # you may not use this file except in compliance with the License.
> # You may obtain a copy of the License at
> #     http://www.apache.org/licenses/LICENSE-2.0
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
>
> # Train and Test for STG-Joint
> # Global Variable (!!! SHOULD ADAPT TO YOUR CONFIGURATION !!!)
> CUDA_ID=1 SEED=2022 EPOCH=50 BATCH_SIZE=32 MAX_GENERATE=5 # MAX T SPECIAL_MAPPING=false # More details can be found in ISSUE 10 CHECKPOINT_DIR=checkpoints
>
> # Roberta-base-chinese can be downloaded at https://github.com/ymcui/Chinese-BERT-wwm
> PLM_PATH=/datadisk2/xlxw/Resources/pretrained_models/roberta-base-chinese/ # pretrained-model path PLM_PATH= /pretrained_models/chinese-roberta-wwm-ext/ OUTPUT_PATH=stg_joint_test.xlsx
> JOINT_CHECK_DIR=1021_jointmodel_stg
>
> # STEP 1 - PREPROCESS DATASET
> DATA_BASE_DIR=dataset #DATA_OUT_DIR=stg_joint #DATA_TRAIN_FILE=FCGEC_train.json #DATA_VALID_FILE=FCGEC_valid.json #DATA_TEST_FILE=FCGEC_test.json
> python preprocess_data.py --mode normal --err_only True #--data_dir ${DATA_BASE_DIR} --out_dir ${DATA_OUT_DIR} #--train_file ${DATA_TRAIN_FILE} --valid_file ${DATA_VALID_FILE} --test_file ${DATA_TEST_FILE}
>
> # STEP 2 - TRAIN STG-Joint MODEL
> python joint_stg.py --mode train #--gpu_id ${CUDA_ID} #--seed ${SEED} #--checkpoints ${CHECKPOINT_DIR} #--checkp ${JOINT_CHECK_DIR} #--data_base_dir ${DATA_BASE_DIR}/${DATA_OUT_DIR} #--lm_path ${PLM_PATH} #--batch_size ${BATCH_SIZE} #--epoch ${EPOCH} #--max_generate ${MAX_GENERATE}
>
> # STEP 3 - TRAIN STG-Joint MODEL
> python joint_evaluate.py --mode test --gpu_id ${CUDA_ID} --seed ${SEED} --checkpoints ${CHECKPOINT_DIR} --checkp ${JOINT_CHECK_DIR} --export ${OUTPUT_PATH} --data_base_dir ${DATA_BASE_DIR}/${DATA_OUT_DIR} --max_generate ${MAX_GENERATE} --lm_path ${PLM_PATH} --batch_size ${BATCH_SIZE} --sp_map ${SPECIAL_MAPPING}
> ```
>
> run: sh run_stg_joint.sh

The error I can currently find is that you have commented out two parameters, DATA_BASE_DIR and DATA_OUT_DIR. This will prevent joint_evaluate.py from running properly, but the configuration of lm_path looks correct. Have you considered taking the joint_evaluate.py line out of the bash script and running it directly from the command line?
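
For instance, something along these lines (values substituted from the script above; this is only an illustrative invocation, adjust the paths to your environment):

```bash
python joint_evaluate.py --mode test --gpu_id 1 --seed 2022 \
    --checkpoints checkpoints --checkp 1021_jointmodel_stg \
    --export stg_joint_test.xlsx \
    --data_base_dir dataset/stg_joint \
    --max_generate 5 \
    --lm_path /pretrained_models/chinese-roberta-wwm-ext/ \
    --batch_size 32 \
    --sp_map false
```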

kingfan1998 commented 1 year ago

Sorry!

In evaluate_joint_config.py, I forgot to modify the parameters:

```python
# Pretrained Model Params
pretrained_args = ArgumentGroup(parser, 'pretrained', 'Pretrained Model Settings')
pretrained_args.add_arg('use_lm', bool, True, 'Whether Model Use Language Models')
############################
# pretrained_args.add_arg('lm_path', str, '/datadisk2/xlxw/Resources/pretrained_models/roberta-base-chinese', 'Bert Pretrained Model Path')
pretrained_args.add_arg('lm_path', str, './pretrained_models/chinese-roberta-wwm-ext/', 'Bert Pretrained Model Path')
############################
pretrained_args.add_arg('lm_hidden_size', int, 768, 'HiddenSize of PLM')
pretrained_args.add_arg('output_hidden_states', bool, True, 'Output PLM Hidden States')
pretrained_args.add_arg('finetune', bool, True, 'Finetune Or Freeze')
```

But I encountered a new problem:

```
Some weights of the model checkpoint at ./pretrained_models/chinese-roberta-wwm-ext/ were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

RuntimeError: Error(s) in loading state_dict for JointModel:
    size mismatch for tagger._hidden2t.linear.weight: copying a param with shape torch.Size([6, 768]) from checkpoint, the shape in current model is torch.Size([7, 768]).
    size mismatch for tagger._hidden2t.linear.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([7]).
```

The pre-trained language model I am using is https://huggingface.co/hfl/chinese-roberta-wwm-ext/.
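
For reference, one way to check which max_generate a given checkpoint was trained with is to inspect the stored shape of the tagger head. The checkpoint path and state-dict layout in this sketch are assumptions, so print the available keys first if they do not match:

```python
# Infer max_generate from the saved tagger head (output size = max_generate + 1).
# Path and key names are guesses based on this thread; adjust to your checkpoint.
import torch

state = torch.load("checkpoints/1021_jointmodel_stg/checkpoint.pt", map_location="cpu")
if not any(k.startswith("tagger.") for k in state):  # some checkpoints nest the weights
    state = state.get("model", state)

weight = state["tagger._hidden2t.linear.weight"]
print(weight.shape)                            # e.g. torch.Size([6, 768])
print("max_generate =", weight.shape[0] - 1)   # 6 -> trained with max_generate = 5
```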

kingfan1998 commented 1 year ago


Thank you, I solved it. In the config .py file, I set the Number of Tagger Classes to 5 and the Number of Max Token Generation to 5. Thank you very much!

xlxwalex commented 1 year ago


You're welcome.