rooshenas / ebr_mt


On the validity of EBR-Reranking #5

Closed Hannibal046 closed 1 year ago

Hannibal046 commented 1 year ago
Hi, after running experiments on the IWSLT14 De-En dataset, I got the results below. It seems that re-ranking doesn't help — could you please explain this?

| IWSLT14 De-En | Reported | My experiment |
| --- | --- | --- |
| BaseNMT + Beam(5) | 33.87 | 36.36 |
| Marginal-EBR | 35.68 | |
| Conditional-EBR | 37.58 | |
| BaseNMT + Beam(100) | | 37.64 |

The commands I used are:

## training
CUDA_VISIBLE_DEVICES=0 fairseq-train \
    data-bin/iwslt14.tokenized.de-en \
    --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --no-last-checkpoints \
    --max-epoch 64

## Beam 100 inference
fairseq-generate 'data-bin/iwslt14.tokenized.de-en' \
    --gen-subset test --path 'checkpoints/checkpoint_best.pt' \
    --beam 100 --batch-size 16 --remove-bpe @@ --scoring sacrebleu 

# Generate test with beam=100: BLEU = 37.65 67.3/45.2/32.7/24.0 (BP = 0.958 ratio = 0.959 hyp_len = 154042 ref_len = 160636)
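For re-ranking, the 100 hypotheses per sentence need to be collected from the `fairseq-generate` output, where each hypothesis is printed on a line of the form `H-<id>\t<score>\t<text>`. A minimal sketch of gathering the n-best lists (the parsing helper below is illustrative, not part of the repo):

```python
# Sketch: group fairseq-generate "H-" lines into per-sentence n-best lists.
from collections import defaultdict

def parse_nbest(lines):
    """Map sentence id -> list of (model log-prob, hypothesis text)."""
    nbest = defaultdict(list)
    for line in lines:
        if line.startswith("H-"):
            head, score, text = line.rstrip("\n").split("\t")
            nbest[int(head[2:])].append((float(score), text))
    return nbest

# toy example mimicking two sentences' output
sample = [
    "H-0\t-0.35\ta small test",
    "H-0\t-1.20\tsome test",
    "H-1\t-0.50\thello",
]
nbest = parse_nbest(sample)
```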
rooshenas commented 1 year ago

First, the numbers you report here are inconsistent with prior work on this dataset. Second, EBR learns to re-rank samples drawn from a trained autoregressive NMT model, so to measure its boost you have to retrain EBR on samples generated by your own base model.
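The re-ranking step being described can be sketched as follows: given a source sentence and its n-best candidates, an energy model scores each candidate and the lowest-energy one is selected. The energy function below is a toy length-based stand-in purely for illustration; the actual EBR model and its interface are not shown in this thread:

```python
# Sketch: energy-based re-ranking of n-best candidates (lower energy = better).
def rerank(source, candidates, energy_fn):
    """Return candidates sorted by ascending energy under energy_fn."""
    scored = sorted(candidates, key=lambda c: energy_fn(source, c))
    return scored

# Toy energy: penalize length mismatch with the source (illustration only,
# NOT the learned EBR energy).
def toy_energy(src, cand):
    return abs(len(cand.split()) - len(src.split()))

candidates = ["a small test", "a very long wrong output here"]
best = rerank("ein kleiner test", candidates, toy_energy)[0]
```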