microsoft / TAP

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
MIT License

Validation Accuracy different from paper #20

Open soonchangAI opened 2 years ago

soonchangAI commented 2 years ago

Hi, the validation accuracy I calculated for the fine-tuned models is different from the paper. Command:
```shell
python -m torch.distributed.launch --nproc_per_node 2 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split \
    --config $config \
    --save_dir $folder \
    --run_type val \
    --resume_file $finetuned_model \
    training_parameters.distributed True
```
I observed that changing the batch size results in different values:

| TextVQA model | Val acc (batch size 32) | Val acc (batch size 128) | In paper |
| --- | --- | --- | --- |
| TAP (base) | 49.87 | 49.53 | 49.91 |
| TAP (additional data) | 54.31 | 54.13 | 54.71 |
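One way a batch-size-dependent validation score can arise (a hypothetical illustration, not the actual TAP/MMF evaluation code) is when accuracy is computed by averaging per-batch means: a partial final batch then carries the same weight as a full batch, so the result shifts with batch size. A minimal sketch:

```python
# Hypothetical illustration of batch-size-dependent accuracy.
# 100 samples; 1 = answered correctly, 0 = wrong. True accuracy = 0.60.
# (Sorted order exaggerates the effect for clarity.)
flags = [1] * 60 + [0] * 40

def batch_mean_accuracy(flags, batch_size):
    """Average of per-batch accuracies: sensitive to batch size
    because a short final batch is weighted like a full one."""
    batches = [flags[i:i + batch_size] for i in range(0, len(flags), batch_size)]
    per_batch = [sum(b) / len(b) for b in batches]
    return sum(per_batch) / len(per_batch)

def sample_mean_accuracy(flags):
    """Micro-average over samples: independent of batch size."""
    return sum(flags) / len(flags)

print(sample_mean_accuracy(flags))          # 0.60 regardless of batching
print(batch_mean_accuracy(flags, 32))       # differs from 0.60 here
print(batch_mean_accuracy(flags, 128))      # single batch -> 0.60
```

Distributed evaluation can introduce a similar effect: `DistributedSampler` pads the dataset so each rank sees the same number of samples, and the duplicated samples count toward the metric. Averaging over samples (or evaluating on a single process with the padding/duplicates removed) should give a batch-size-independent number.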