studio-ousia / bpr

Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering

Evaluation result #2

Closed: xuanricheng closed this issue 3 years ago

xuanricheng commented 3 years ago

The experimental results are far lower than those reported in the paper. My environment is as follows:

  - Ubuntu 18.04.5 LTS
  - Python 3.8.10
  - CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  - GPU: 2 × TITAN RTX (24GB)
  - Memory: 125GB
  - Other dependencies follow requirements.txt.

Our evaluation steps are:

  1. Building the passage database
  2. Training BPR
  3. Building passage embeddings
  4. Evaluating BPR


ikuyamada commented 3 years ago

Hi @xuanricheng,

Reducing the number of GPUs can negatively affect performance because we use in-batch negatives across GPUs, i.e., batch instances on other GPUs are treated as negative instances.
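To make cross-GPU in-batch negatives concrete, here is a minimal pure-Python sketch. It is a generic DPR-style in-batch negative cross-entropy, not BPR's actual training code; the function names (`in_batch_negative_loss`, `gather`, `negatives_per_query`) are hypothetical, and `gather` only simulates an all-gather by concatenating per-GPU lists. It assumes one positive passage per query (passage i is the positive for query i) and ignores any extra hard negatives, so every other passage in the pooled batch acts as a negative.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def in_batch_negative_loss(queries: List[Vector], passages: List[Vector]) -> float:
    """Mean cross-entropy where passage i is the positive for query i and
    every other passage in the (gathered) batch serves as a negative."""
    assert len(queries) == len(passages)
    total = 0.0
    for i, q in enumerate(queries):
        scores = [dot(q, p) for p in passages]
        # log-sum-exp with max subtraction for numerical stability
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(queries)


def gather(per_gpu_batches: List[List[Vector]]) -> List[Vector]:
    """Hypothetical stand-in for an all-gather: concatenates each GPU's batch."""
    return [v for batch in per_gpu_batches for v in batch]


def negatives_per_query(num_gpus: int, per_gpu_batch_size: int, cross_gpu: bool) -> int:
    """Number of other passages each query is contrasted against."""
    pool = num_gpus * per_gpu_batch_size if cross_gpu else per_gpu_batch_size
    return pool - 1
```

For example, `negatives_per_query(8, 16, True)` gives 127, while `negatives_per_query(2, 32, True)` gives 63: halving the pooled batch roughly halves the negatives each query is trained against.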

The effective batch size is num_gpus * train_batch_size (train_batch_size defaults to 16). To train the model with 2 GPUs using the same hyper-parameter settings as in our paper, you need to set train_batch_size=64.
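The batch-size arithmetic above can be sketched as two small helpers (the names are hypothetical, not part of the BPR code). Since 2 × 64 = 128, the paper's setting corresponds to an effective batch size of 128, which with the default train_batch_size of 16 implies 8 GPUs.

```python
def effective_batch_size(num_gpus: int, train_batch_size: int) -> int:
    """Total number of instances pooled across GPUs for in-batch negatives."""
    return num_gpus * train_batch_size


def required_train_batch_size(target_effective: int, num_gpus: int) -> int:
    """Per-GPU train_batch_size needed to reach a target effective batch size."""
    if target_effective % num_gpus != 0:
        raise ValueError("target effective batch size must be divisible by num_gpus")
    return target_effective // num_gpus
```

With train_batch_size=32 on 2 GPUs, `effective_batch_size(2, 32)` is 64, half of the 128 used in the paper's setting.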

xuanricheng commented 3 years ago


Due to the GPU memory limit, we can only set train_batch_size to 32 for training, but the precision still does not seem good. Is there really no way to reach the precision reported in the paper with 2 GPUs?

ikuyamada commented 3 years ago

Unfortunately, as also reported in the DPR paper (Table 3), batch size does affect performance. Although we haven't tested it, you may be able to reduce GPU memory usage with techniques such as gradient checkpointing.
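Gradient checkpointing trades extra compute for memory: instead of keeping every intermediate activation for the backward pass, it stores only a few checkpoints and recomputes the rest segment by segment, which could let a larger train_batch_size fit on each GPU. The toy sketch below illustrates the idea on a scalar chain of tanh layers in pure Python. It is not BPR code, the function names are hypothetical, and the activation counts are rough; for real PyTorch models the corresponding utility is `torch.utils.checkpoint`, and its effect on BPR training has not been tested.

```python
import math
from typing import List, Tuple


def layer(w: float, x: float) -> float:
    return math.tanh(w * x)


def layer_grad(w: float, x: float) -> Tuple[float, float]:
    """Returns (d out/d x, d out/d w) for out = tanh(w * x)."""
    d = 1.0 - math.tanh(w * x) ** 2
    return d * w, d * x


def backward_full(weights: List[float], x0: float) -> Tuple[List[float], int]:
    """Standard backprop: keeps every layer input in memory."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    grads = [0.0] * len(weights)
    upstream = 1.0  # d(loss)/d(final output), with loss = final output
    for i in reversed(range(len(weights))):
        dx, dw = layer_grad(weights[i], inputs[i])
        grads[i] = upstream * dw
        upstream *= dx
    return grads, len(inputs)


def backward_checkpointed(weights: List[float], x0: float,
                          segment: int) -> Tuple[List[float], int]:
    """Keeps only the input of every `segment`-th layer and recomputes
    each segment's inputs from its checkpoint during the backward pass."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(w, x)
    grads = [0.0] * len(weights)
    upstream = 1.0
    peak = len(checkpoints)  # rough count of activations held at once
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(weights))
        # Recompute this segment's layer inputs from its checkpoint.
        seg_inputs = []
        x = checkpoints[start]
        for i in range(start, end):
            seg_inputs.append(x)
            x = layer(weights[i], x)
        peak = max(peak, len(checkpoints) + len(seg_inputs))
        for i in reversed(range(start, end)):
            dx, dw = layer_grad(weights[i], seg_inputs[i - start])
            grads[i] = upstream * dw
            upstream *= dx
    return grads, peak
```

For a 16-layer chain with `segment=4`, both functions return the same gradients, but the checkpointed version holds about 8 activations at a time instead of 16, at the cost of a second forward pass.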

ikuyamada commented 3 years ago

I'm closing this issue due to inactivity.