memray / OpenNMT-kpg-release

Keyphrase Generation

about reproduction #17

Closed · johncs999 closed this issue 4 years ago

johncs999 commented 4 years ago

Hi, memray, thanks for your code, it really helps! I still have a few questions:

1. How can I reproduce the CatSeq results from the paper "One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases"? I tried commenting out the following lines in 'config-rnn-keyphrase-one2seq-diverse.yml' while keeping everything else unchanged:

...
#orth_reg: 'true'
#lambda_orth_reg: 0.1
#sem_cov: 'true'
#lambda_sem_cov: 0.1

tgt_enc: 'rnn'
detach_tgt_enc: 'true'
num_negsample: 16
use_ending_state: 'true'
...

The F1@5 on SemEval is 0.281, while the paper reports 0.302. Is there any difference between CatSeq and CatSeqD besides the two losses above? Does CatSeq use the vanilla attention coverage mechanism by default?

2. How can I reproduce the CatSeqD results? After running your provided pretrained verbatim_append model, the F1@5 on SemEval is 0.268, while the paper reports 0.327. I thought this might be due to a mismatch between the given model's parameters and the description in the paper, so I tried modifying 'config-rnn-keyphrase-one2seq-diverse.yml' with the parameters from Table 7:

lambda_orth_reg: 1
lambda_sem_cov: 0.03


But the results are even worse:

[screenshot: evaluation results]

Could you give me some suggestions for reproducing CatSeq and CatSeqD?
memray commented 4 years ago

Sorry for the late reply. Can you take a look at the beam_size used during inference? We used 50 in the paper, but it might not be set in config/test/config-test-keyphrase-one2seq.yml.
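
For example, you can pass it explicitly on the command line rather than editing the config. A minimal sketch, assuming the command-line flags override the yml values (the output directory name here is arbitrary; the other paths are the ones used in this repo):

# pass --beam_size explicitly so the test config value (if any) is overridden
python kp_gen_eval.py -tasks pred eval report -config config/test/config-test-keyphrase-one2seq.yml \
-data_dir data/keyphrase/meng17/ -ckpt_dir models/keyphrase/meng17-one2seq/ \
-output_dir output/beam50-check/ \
-testsets semeval -gpu 0 --beam_size 50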

johncs999 commented 4 years ago

Thanks for your reply. I tried beam_size 50, but the results are similar when using --eval_topbeam. The log at the last step of CatSeqD training is:

Step 100000/100000; acc:  55.31; ppl:  7.76; xent: 2.05; lr: 0.00002; 13308/1128 tok/s;  48167 sec

The full CatSeqD training config file is:

model_type: keyphrase
tgt_type: verbatim_append

data: data/keyphrase/meng17/kp20k
save_checkpoint_steps: 10000
keep_checkpoint: 20
seed: 3435
train_steps: 100000
valid_steps: 200000 # no validation
report_every: 100

encoder_type: brnn
rnn_type: GRU
word_vec_size: 100
rnn_size: 150
layers: 1

optim: adam
learning_rate: 1e-3
max_grad_norm: 2

batch_size: 32
valid_batch_size: 128
dropout: 0.1

global_attention: mlp

tensorboard: 'true'
log_file_level: DEBUG

copy_attn: 'true'
reuse_copy_attn: 'true'
coverage_attn: 'true'

context_gate: 'both'
input_feed: 1
share_embeddings: 'true'
bridge: 'true'

orth_reg: 'true'
lambda_orth_reg: 1
sem_cov: 'true'
lambda_sem_cov: 0.03

tgt_enc: 'rnn'
detach_tgt_enc: 'true'
num_negsample: 16
use_ending_state: 'true'

exp: kp20k-one2seq-birnn-GRU150-EMB100-ATTNmlp-Dropout00-OR1-SC03
save_model: models/keyphrase/meng17-one2seq/kp20k.one2seq.birnn.Dropout00-OR1-SC03
log_file: output/keyphrase/meng17-one2seq/kp20k.one2seq.birnn.Dropout00-OR1-SC03.log
tensorboard_log_dir: runs/keyphrase/meng17-one2seq/kp20k.one2seq.birnn.Dropout00-OR1-SC03/

world_size: 1
gpu_ranks:
- 0
# 1
master_port: 5000
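
For reference, I launch training with the usual OpenNMT-style entry point; a sketch, since only the config file name is given and I am assuming it lives under config/train/:

# config/train/ is assumed by analogy with the config/test/ path above
python train.py -config config/train/config-rnn-keyphrase-one2seq-diverse.yml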

And the translation command is:

python kp_gen_eval.py -tasks pred eval report -config config/test/config-test-keyphrase-one2seq.yml \
-data_dir data/keyphrase/meng17/ -ckpt_dir models/keyphrase/meng17-one2seq/ \
-output_dir output/meng17-one2seq-topbeam-selfterminating/kp20k-one2seq-birnn-GRU150-EMB100-ATTNmlp-Dropout00-OR1-SC03/ \
-testsets semeval -gpu 0 --verbose --beam_size 50 --batch_size 32 \
--max_length 40 --onepass --beam_terminate topbeam --eval_topbeam
memray commented 4 years ago

You raised a very good point. If --eval_topbeam is enabled, it only evaluates the top-scored sequence from beam search, which is not very useful in the KP setting. Also, please use --beam_terminate full rather than topbeam to reproduce the results; topbeam can be much faster, but at the cost of some performance degradation. Sorry about the confusion.

python kp_gen_eval.py -tasks pred eval report -config config/test/config-test-keyphrase-one2seq.yml \
-data_dir data/keyphrase/meng17/ -ckpt_dir models/keyphrase/meng17-one2seq/ \
-output_dir output/meng17-one2seq-topbeam-selfterminating/kp20k-one2seq-birnn-GRU150-EMB100-ATTNmlp-Dropout00-OR1-SC03/ \
-testsets semeval -gpu 0 --verbose --beam_size 50 --batch_size 32 \
--max_length 40 --onepass --beam_terminate full

johncs999 commented 4 years ago

Thanks a lot!