raspberryice / gen-arg

Code for paper "Document-Level Argument Extraction by Conditional Generation". NAACL 21'
MIT License

Only 10 F1 score on wikievent dataset #14

Open Changhy1996 opened 2 years ago

Changhy1996 commented 2 years ago

Hi, I tried to follow scripts/train_kairos.sh and scripts/test_kairos.sh but only got low performance, as follows:

Role identification: P: 16.88, R: 4.456, F: 7.18
Role: P: 15.58, R: 4.21, F: 6.63
Coref Role identification: P: 19.48, R: 5.26, F: 8.29
Coref Role: P: 15.58, R: 4.21, F: 6.63

Even when I train for more epochs, I can only get an F1 score of around 10. Is something going wrong?

By the way, I failed to download the ckpt you shared on S3 due to a network error. Is there any other way to get these files?

Thanks.

raspberryice commented 2 years ago

Hi Changhy, sorry about this. I checked the code and it seems the problem is in the test_kairos.sh script: --mark_trigger is an essential argument. (Note that if you have a preprocessed_KAIROS directory, the model reads directly from that directory and this option no longer matters.)

raspberryice commented 2 years ago

Btw, the fixed scripts have been uploaded.

SapaePhyu commented 1 year ago

Hi, even though I have a preprocessed_KAIROS directory, I still get an F1 score of around 10. How can I fix it? Here's my test_kairos.sh script.

#!/usr/bin/env bash

set -e
set -x
CKPT_NAME=gen-KAIROS
MODEL=constrained-gen

rm -rf checkpoints/${CKPT_NAME}-pred
python train.py --model=$MODEL --ckpt_name=${CKPT_NAME}-pred \
    --load_ckpt=checkpoints/${CKPT_NAME}/epoch=2.ckpt \
    --dataset=KAIROS \
    --eval_only \
    --mark_trigger \
    --train_file=data/wikievents/train.jsonl \
    --val_file=data/wikievents/dev.jsonl \
    --test_file=data/wikievents/test.jsonl \
    --coref_dir=data/wikievents/coref \
    --train_batch_size=4 \
    --eval_batch_size=4 \
    --learning_rate=3e-5 \
    --accumulate_grad_batches=4 \
    --num_train_epochs=3

python src/genie/scorer.py --gen-file=checkpoints/$CKPT_NAME-pred/predictions.jsonl \
    --test-file=data/wikievents/test.jsonl \
    --dataset=KAIROS \
    --coref-file=data/wikievents/coref/test.jsonlines \
    --coref

raspberryice commented 1 year ago

A quick comparison shows that you are missing the --head-only keyword in the scoring script.

Can you double-check the checkpoints/$CKPT_NAME-pred/predictions.jsonl file to see if the output looks normal? (You can also post a few lines here for me to check.)
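A quick way to eyeball the file is sketched below. It assumes only what the thread shows: one JSON object per line with "doc_key", "predicted" and "gold" fields. The function names are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Read one JSON record per non-empty line of a predictions.jsonl file."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(records, n=5):
    """Print the first n predicted/gold pairs and return the exact-match count."""
    for rec in records[:n]:
        # repr() keeps leading spaces and special tokens visible.
        print("pred:", repr(rec["predicted"]))
        print("gold:", repr(rec["gold"]))
    return sum(rec["predicted"] == rec["gold"] for rec in records)
```

For example, `summarize(load_predictions("checkpoints/gen-KAIROS-pred/predictions.jsonl"))` would print the first five pairs for the checkpoint name used in the script above. The exact-match count is only a rough sanity signal, not the metric the scorer reports.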

SapaePhyu commented 1 year ago

Hi, thank you for your reply!

According to the results, adding or omitting the --head-only flag does not affect the F1 score much.

The output in the checkpoints/$CKPT_NAME-pred/predictions.jsonl file also looks normal.

Below are my results and the first 10 lines of the predictions.jsonl file.

My results

Evaluation by matching head words only....
Role identification: P: 29.17, R: 4.99, F: 8.52
Role: P: 26.04, R: 4.46, F: 7.61
Coref Role identification: P: 31.25, R: 5.35, F: 9.13
Coref Role: P: 28.12, R: 4.81, F: 8.22

Without --head-only...
Role identification: P: 27.08, R: 4.63, F: 7.91
Role: P: 25.00, R: 4.28, F: 7.31
Coref Role identification: P: 31.25, R: 5.35, F: 9.13
Coref Role: P: 28.12, R: 4.81, F: 8.22
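As a reading aid for these numbers: assuming F is the usual harmonic mean of precision and recall (the thread does not show how scorer.py computes it), the low F1 comes almost entirely from the low recall. A minimal sketch, with a hypothetical helper name:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Head-only role identification numbers reported above, in percent.
print(round(f1(29.17, 4.99), 2))  # 8.52
```

With recall near 5, even perfect precision would give an F1 below 10, so the thing to track down is why so few gold arguments are being matched.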

Outputs of checkpoints/KAIROS-pred/predictions.jsonl

{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " soldiers died at place from medical issue, killed by killer", "gold": " members died at place from medical issue, killed by The Taliban killer"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " soldiers detonated or exploded explosive device using to attack target at place", "gold": " detonated or exploded explosive device using to attack target at training center place"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " soldiers died at place from medical issue, killed by killer", "gold": " people died at place from medical issue, killed by killer"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " detonated or exploded explosive device using to attack target at place", "gold": " detonated or exploded explosive device using to attack target at place"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " detonated or exploded explosive device using to attack target at place", "gold": " detonated or exploded explosives explosive device using to attack campus target at place"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " soldiers detonated or exploded explosive device using to attack target at place", "gold": " gunmen detonated or exploded explosive device using to attack soldiers target at campus place"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " soldiers died at place from medical issue, killed by killer", "gold": " members died at complex place from medical issue, killed by killer"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " soldiers identified as at place", "gold": " he identified bodies as at place"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " detonated or exploded explosive device using to attack target at place", "gold": " detonated or exploded explosive device using to attack target at southeastern place"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " was injured by using in body part with medical issue at place", "gold": " 10 was injured by using in body part with medical issue at place"}

raspberryice commented 1 year ago

The predictions should include the special <arg> token, which is used for matching the filled arguments:

```
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " <arg>  detonated or exploded  <arg>  explosive device using  <arg>  to attack  <arg>  target at training center place", "gold": " <arg>  detonated or exploded  <arg>  explosive device using  <arg>  to attack  <arg>  target at training center place"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " people died at  <arg>  place from  <arg>  medical issue, killed by  <arg>  killer", "gold": " people died at  <arg>  place from  <arg>  medical issue, killed by  <arg>  killer"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " attackers detonated or exploded  <arg>  explosive device using  <arg>  to attack  <arg>  target at campus place", "gold": " <arg>  detonated or exploded  <arg>  explosive device using  <arg>  to attack  <arg>  target at  <arg>  place"}
```

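One rough way to test a predictions file for this is sketched below. It assumes the records are already parsed into dicts with "predicted" and "gold" keys; the helper name and the two sample records are hypothetical and shortened for illustration. A fully filled template legitimately contains no placeholder, so this is only a warning sign when almost every record lacks the token.

```python
ARG_TOKEN = "<arg>"


def records_without_token(records, token=ARG_TOKEN):
    """Return records where the token appears in neither predicted nor gold."""
    return [
        rec
        for rec in records
        if token not in rec["predicted"] and token not in rec["gold"]
    ]


# Hypothetical, shortened records in the shape shown above.
sample = [
    {"predicted": " people died at <arg> place", "gold": " people died at <arg> place"},
    {"predicted": " soldiers died at place", "gold": " members died at place"},
]
missing = records_without_token(sample)
print(f"{len(missing)} of {len(sample)} records contain no {ARG_TOKEN} token")
```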
SapaePhyu commented 1 year ago

My bad, I don't know why the special token disappeared after I pasted it into the GitHub comment. The predictions look exactly as they are supposed to.

SapaePhyu commented 1 year ago

@Changhy1996 Hi, did you manage to solve this issue? If so, please let me know how you solved it.

raspberryice commented 1 year ago

I suspect something is wrong in scorer.py. Which spaCy version are you using?

SapaePhyu commented 1 year ago

Hi! I am using spaCy 3.5.1. The versions of the other packages are as follows.

torch 1.11.0+cu113
spacy 3.5.1
transformers 4.26.1
pytorch-lightning 1.9.4
torch-struct 0.5
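A list like this can be collected in one step with the standard library, which makes it easier to compare environments. The sketch below assumes Python 3.8 or newer; the package names are the ones listed above and the function name is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["torch", "spacy", "transformers", "pytorch-lightning", "torch-struct"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(name, ver if ver is not None else "(not installed)")
```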

SapaePhyu commented 1 year ago

Hi, can we use en_core_web_trf instead of en_core_web_sm?

raspberryice commented 1 year ago

I've uploaded a copy of my prediction results to outputs/wikievents-pointer-pred/predictions.jsonl. Try running scorer.py locally on that file and see if you get the results in Table 5 of the paper.

SapaePhyu commented 1 year ago

It works, thank you!