Hi @xuanricheng,
Reducing the number of GPUs can negatively affect the performance because we use in-batch negatives across GPUs, i.e., batch instances on other GPUs are treated as negative instances.
The effective batch size can be computed as num_gpus * train_batch_size (train_batch_size is set to 16 by default). If you train the model with 2 GPUs using the same hyper-parameter settings as our paper, you need to set train_batch_size=64.
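To make the in-batch-negatives point concrete, here is a minimal PyTorch sketch (not the actual code of this repository) of how passage embeddings can be gathered across GPUs so that each query is scored against num_gpus * train_batch_size passages. With the paper's settings this gives an effective batch of 128 candidates, which is why 2 GPUs need train_batch_size=64 (2 * 64 = 128), while 2 GPUs at the default 16 only give 32.

```python
# Minimal sketch of in-batch negatives across GPUs; not the repository's exact code.
# Each GPU holds train_batch_size query/passage pairs; gathering passage embeddings
# from all GPUs lets every query score against num_gpus * train_batch_size passages.
import torch
import torch.distributed as dist

def in_batch_negative_loss(query_emb, passage_emb):
    # query_emb, passage_emb: (train_batch_size, hidden_dim) on the local GPU
    if dist.is_available() and dist.is_initialized():
        world_size = dist.get_world_size()
        gathered = [torch.empty_like(passage_emb) for _ in range(world_size)]
        dist.all_gather(gathered, passage_emb)
        # all_gather returns detached tensors; keep the local tensor so gradients flow.
        gathered[dist.get_rank()] = passage_emb
        all_passages = torch.cat(gathered, dim=0)  # (world_size * batch, hidden_dim)
        offset = dist.get_rank() * query_emb.size(0)
    else:
        all_passages = passage_emb
        offset = 0

    scores = query_emb @ all_passages.t()          # (batch, world_size * batch)
    labels = torch.arange(query_emb.size(0), device=scores.device) + offset
    return torch.nn.functional.cross_entropy(scores, labels)
```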
Due to the limitation of GPU memory size, we can only set train_batch_size to 32 for training, but this still does not seem to reach good precision. Is there really no way to get the precision reported in the paper with 2 GPUs?
Unfortunately, as also shown in the DPR paper (Table 3), the batch size does affect the performance. We haven't tested it ourselves, but you may be able to reduce GPU memory usage with techniques such as gradient checkpointing, which would allow a larger per-GPU batch size.
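For reference, if the query and passage encoders are standard Hugging Face BERT models (an assumption, not something confirmed in this thread), recent versions of transformers let you turn on gradient checkpointing like this, trading extra compute in the backward pass for lower activation memory:

```python
# Sketch only: assumes the encoders are Hugging Face BertModel instances.
from transformers import BertModel

query_encoder = BertModel.from_pretrained("bert-base-uncased")
passage_encoder = BertModel.from_pretrained("bert-base-uncased")

# Recompute activations during the backward pass instead of storing them,
# which may make room for a larger train_batch_size per GPU.
query_encoder.gradient_checkpointing_enable()
passage_encoder.gradient_checkpointing_enable()
```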
I'm closing this issue as there has been no activity.
The experimental results are far lower than those in the paper. My environment is as follows:
Ubuntu 18.04.5 LTS
Python 3.8.10
CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
GPU: 2 * TITAN RTX 24GB
MEM: 125GB
Other dependencies follow requirements.txt. The steps we ran are:
python train_biencoder.py --gpus=2 --distributed_backend=ddp --train_file=/downloads/data/retriever/nq-train.json --eval_file=/downloads/data/retriever/nq-dev.json --gradient_clip_val=2.0 --max_epochs=40 --binary
After training, there are two more folders (version_2 & version_3) in the "./biencoder" folder. We found that only version_3 has a checkpoints folder, so the checkpoint is "./biencoder/version_3/checkpoints/last.ckpt".
CUDA_VISIBLE_DEVICES=0,1 python generate_embeddings.py --biencoder_file=./biencoder/version_3/checkpoints/last.ckpt --output_file=./biencoder/embedding/em_my --passage_db_file=./passage_db --batch_size=2048 --parallel
We only changed the batch_size from 4096 to 2048, and building the embeddings takes more time than training!