texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

Reproduce Condenser Result on MSMARCO passage ranking #21

Closed · Albert-Ma closed this issue 2 years ago

Albert-Ma commented 2 years ago

Hi, wonderful work on this toolkit! I really like it!

Following the README here, I used the following command to train the retriever with Condenser on 2 GPUs, which gives a total batch size of 64, the same setting as in the paper:

python -m tevatron.driver.train \
  --output_dir ./output_model \
  --model_name_or_path Luyu/condenser \
  --save_steps 20000 \
  --fp16 \
  --train_dir ../marco/bert/train \
  --per_device_train_batch_size 32 \
  --learning_rate 5e-6 \
  --num_train_epochs 3 \
  --dataloader_num_workers 2

The result I got is 0.331:

##################### MRR @ 10: 0.3308558466366481 QueriesRanked: 6980 #####################

Is there any parameter I forgot to set? Thanks!

luyug commented 2 years ago

A few things,

  • Technically we don't officially support DP with PyTorch, only DDP. You may run into undefined behaviors.
  • With DDP, you should have --negatives_x_device set.
  • What you currently have is effectively a batch size of 2 (gpu) x 32 (qry) x 8 (psg) = 512.

Moreover, the performance is roughly the same as what we have in the Condenser paper with BM25 negatives. You'd need a round of hard negative mining to get better performance. The best performance would also require replacing the Condenser initializer with a coCondenser initializer.
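
For reference, a minimal sketch of how the same run could be launched under DDP with cross-device negatives enabled. The torchrun launcher and its --nproc_per_node option are standard PyTorch tooling and are an assumption here (the thread itself doesn't show a launch command); all other flags are taken from the command above, plus --negatives_x_device as noted:

# assumption: torchrun is available (PyTorch >= 1.10); older versions use python -m torch.distributed.launch
torchrun --nproc_per_node=2 -m tevatron.driver.train \
  --output_dir ./output_model \
  --model_name_or_path Luyu/condenser \
  --save_steps 20000 \
  --fp16 \
  --train_dir ../marco/bert/train \
  --per_device_train_batch_size 32 \
  --negatives_x_device \
  --learning_rate 5e-6 \
  --num_train_epochs 3 \
  --dataloader_num_workers 2

With 2 processes at 32 queries each and 8 passages per query, this is the effective batch of 2 x 32 x 8 = 512 passages mentioned above, with in-batch negatives shared across devices.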

Albert-Ma commented 2 years ago


Thanks for your reply! I used DDP to run the command but forgot to set --negatives_x_device.