Hi @xuanricheng,
Reducing the number of GPUs can negatively affect the performance because we use in-batch negatives across GPUs, i.e., batch instances on other GPUs are treated as negative instances.
The effective batch size can be computed as num_gpus * train_batch_size (train_batch_size is set to 16 by default). If you train the model with 2 GPUs using the same hyper-parameter settings as our paper, you need to set train_batch_size=64.
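To make the in-batch-negatives point concrete, here is a minimal PyTorch sketch (not the actual code of this repository) of how passage embeddings can be gathered across GPUs so that each query is scored against num_gpus * train_batch_size passages. With the paper's settings this gives an effective batch of 128 candidates, which is why 2 GPUs need train_batch_size=64 (2 * 64 = 128), while 2 GPUs at the default 16 only give 32.

```python
# Minimal sketch of in-batch negatives across GPUs; not the repository's exact code.
# Each GPU holds train_batch_size query/passage pairs; gathering passage embeddings
# from all GPUs lets every query score against num_gpus * train_batch_size passages.
import torch
import torch.distributed as dist

def in_batch_negative_loss(query_emb, passage_emb):
    # query_emb, passage_emb: (train_batch_size, hidden_dim) on the local GPU
    if dist.is_available() and dist.is_initialized():
        world_size = dist.get_world_size()
        gathered = [torch.empty_like(passage_emb) for _ in range(world_size)]
        dist.all_gather(gathered, passage_emb)
        # all_gather returns detached tensors; keep the local tensor so gradients flow.
        gathered[dist.get_rank()] = passage_emb
        all_passages = torch.cat(gathered, dim=0)  # (world_size * batch, hidden_dim)
        offset = dist.get_rank() * query_emb.size(0)
    else:
        all_passages = passage_emb
        offset = 0

    scores = query_emb @ all_passages.t()          # (batch, world_size * batch)
    labels = torch.arange(query_emb.size(0), device=scores.device) + offset
    return torch.nn.functional.cross_entropy(scores, labels)
```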
Due to the limitation of GPU memory size, we can only set train_batch_size to 32 for training, but this still does not seem to reach good precision. Is there really no way to get the precision reported in the paper with 2 GPUs?
Unfortunately, as also shown in the DPR paper (Table 3), the batch size does affect the performance. We haven't tested it ourselves, but you may be able to reduce GPU memory usage with techniques such as gradient checkpointing, which would allow a larger per-GPU batch size.
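For reference, if the query and passage encoders are standard Hugging Face BERT models (an assumption, not something confirmed in this thread), recent versions of transformers let you turn on gradient checkpointing like this, trading extra compute in the backward pass for lower activation memory:

```python
# Sketch only: assumes the encoders are Hugging Face BertModel instances.
from transformers import BertModel

query_encoder = BertModel.from_pretrained("bert-base-uncased")
passage_encoder = BertModel.from_pretrained("bert-base-uncased")

# Recompute activations during the backward pass instead of storing them,
# which may make room for a larger train_batch_size per GPU.
query_encoder.gradient_checkpointing_enable()
passage_encoder.gradient_checkpointing_enable()
```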
I'm closing this issue as there has been no activity.
The experimental results are far lower than those in the paper. My environment is as follows:
Ubuntu 18.04.5 LTS
Python 3.8.10
CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
GPU: 2 * TITAN RTX 24GB
MEM: 125GB
Other dependencies follow requirements.txt. The steps we ran are:
python train_biencoder.py --gpus=2 --distributed_backend=ddp --train_file=/downloads/data/retriever/nq-train.json --eval_file=/downloads/data/retriever/nq-dev.json --gradient_clip_val=2.0 --max_epochs=40 --binary
After training, there are two more folders (version_2 & version_3) in the "./biencoder" folder. We found that only version_3 has a checkpoints folder, so the checkpoint is "./biencoder/version_3/checkpoints/last.ckpt".
CUDA_VISIBLE_DEVICES=0,1 python generate_embeddings.py --biencoder_file=./biencoder/version_3/checkpoints/last.ckpt --output_file=./biencoder/embedding/em_my --passage_db_file=./passage_db --batch_size=2048 --parallel
We only changed the batch_size from 4096 to 2048, and building the embeddings takes more time than training!