yzhangcs / crfpar

[ACL'20, IJCAI'20] Code for "Efficient Second-Order TreeCRF for Neural Dependency Parsing" and "Fast and Accurate Neural CRF Constituency Parsing".
https://www.aclweb.org/anthology/2020.acl-main.302
MIT License

Reproduce results with pretrained model #3

Closed tungngthanh closed 3 years ago

tungngthanh commented 4 years ago

Hi, your work is exciting. I can reproduce the non-pretrained parsing results quite quickly and efficiently. However, I cannot reproduce the constituency parsing results with BERT. For BERT constituency parsing, I run python run.py train --device 0 --feat bert --file exp/ptb.bert and get the following result:

max score of dev is 94.18% at epoch 183
the score of test at epoch 183 is 94.09%

With pretrained embeddings, the expected score is around 95.59. Can you guide me on reproducing the results?

yzhangcs commented 4 years ago

You can execute the following command:

python -u run.py train  \
  -p  \
  -d 0  \
  --feat bert -f exp/ptb.bert.crf  \
  --mbr  \
  --fembed data/glove.6B.100d.txt  \
  --unk unk

tungngthanh commented 4 years ago

Thanks for your reply. I just checked the file "config.ini"; the default values are

bert_model = 'bert-base-cased'
n_bert_layers = 4

So with this setting, what result should I expect? The experiments in the paper use "bert-large-cased", right?

yzhangcs commented 4 years ago

Yeah, I use bert-large-cased to be consistent with Kitaev et al. (2019).
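For anyone reproducing this, the large model can presumably be selected via the config keys quoted above (an assumed config.ini fragment; check the repo's actual file for the exact section and key names):

```ini
bert_model = 'bert-large-cased'
n_bert_layers = 4
```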

tungngthanh commented 4 years ago

Thank you for the fast reply. Do you think we should allow gradients in bert_embedding? I am quite sure that Kitaev et al. (2019) use gradients. Your models only learn a scalar mix over the last 4 layers. If you allow gradients (kind of fine-tuning, I think), the results could be better.
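For reference, the scalar-mix computation mentioned above can be sketched as follows. This is a minimal NumPy illustration of softmax-weighted layer averaging, not the repo's actual module; the names `scalar_mix`, `weights`, and `gamma` are hypothetical:

```python
import numpy as np

def scalar_mix(layers, weights, gamma=1.0):
    """Softmax-weighted sum of hidden states from several BERT layers.

    layers:  list of arrays, each of shape (seq_len, hidden_size)
    weights: one scalar per layer (learnable in a real model; plain floats here)
    gamma:   global scaling factor
    """
    w = np.exp(weights) / np.sum(np.exp(weights))
    return gamma * sum(wi * h for wi, h in zip(w, layers))

# Four "layers" of a 5-token sentence with hidden size 8.
layers = [np.random.randn(5, 8) for _ in range(4)]
mixed = scalar_mix(layers, weights=[0.0, 0.0, 0.0, 0.0])
print(mixed.shape)  # (5, 8)
```

With all mixing weights equal, the softmax is uniform and the result is simply the mean of the layers; training moves the weights away from uniform. In the frozen-BERT setting discussed here, only these mixing scalars receive gradients, not the BERT parameters themselves.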

yzhangcs commented 4 years ago

Actually, using BERT with frozen parameters is enough. I have conducted some experiments on BERT fine-tuning and did not obtain considerable gains.

tungngthanh commented 4 years ago

Sorry for disturbing you again. I ran exactly the command you gave me but cannot get the expected result.

max score of dev is 94.28% at epoch 193
the score of test at epoch 193 is 94.09%

The only thing I did that may differ from you is that I used the dataset from Kitaev et al. (2019) directly, with train/dev/test: 02-21.10way.clean, 22.auto.clean, 23.auto.clean. Do you think that could be the problem?

yzhangcs commented 4 years ago

Sorry for the late reply, and thanks for reporting this issue. I'm working to troubleshoot the problem but still haven't figured it out... Another implementation of the CRF constituency parser is integrated into this repo, and that code behaves normally when following the training instructions. You can check it out.

tungngthanh commented 4 years ago

After checking the parser repo, I see that the main difference between the BERT embedding in that repo and this one is the dropout function. I ran the experiments with that repo and it works very well. Sorry for troubling you one more time. Can you share the CTB datasets with me and explain how to reproduce the experimental results?

yzhangcs commented 4 years ago

Yeah, thanks. I have fixed the bug there, but forgot about it here :(. I can't share the data with you, as that may raise copyright issues. However, you can extract the CTB files by following this repo.

tungngthanh commented 4 years ago

Yeah, I followed the instructions there. I want to check with you about the stats for CTB 5.1: the syntactic distance parser uses the split from Liu and Zhang (2017), which has 17544/352/348 sentences in the train/dev/test sets. These are a bit different from yours: your training set has 18104 sentences. Can you guide me to reproduce your dataset? They produce the data from CTB 8.0 and split it based on article ids as follows:

training = list(range(1, 270 + 1)) + list(range(440, 1151 + 1))
development = list(range(301, 325 + 1))
test = list(range(271, 300 + 1))

Do you happen to know the id split used to generate your data? I am so sorry for troubling you again and again.
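The id-based split quoted above can be sketched as a simple filter over article ids (illustrative only; the helper `split_of` is an assumption, since CTB file naming and article-id extraction vary by release):

```python
# Article-id split quoted from the syntactic distance parser setup (CTB 8.0).
training = list(range(1, 270 + 1)) + list(range(440, 1151 + 1))
development = list(range(301, 325 + 1))
test = list(range(271, 300 + 1))

def split_of(article_id):
    """Return the split name for a CTB article id, or None if unused."""
    if article_id in training:
        return 'train'
    if article_id in development:
        return 'dev'
    if article_id in test:
        return 'test'
    return None

print(len(training), len(development), len(test))  # 982 25 30
```

Note that the id ranges leave gaps (e.g. 326-439 are assigned to no split), which is consistent with the remark below that some articles are missing and the ids are discontinuous.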

yzhangcs commented 4 years ago

I remember that some articles were missing in that dump; the article ids are discontinuous.