wangcunxiang / RFiD

The repository for the paper "RFiD: Towards Rational Fusion-in-Decoder for Open-Domain Question Answering"

Question about hyper-parameters #1

Open Hr0803 opened 11 months ago

Hr0803 commented 11 months ago

Hello, I've been trying to run this model with the provided code. I used the same parameters as in the sample train script, running on multiple GPUs as shown below.

export NGPU=2;
python -m torch.distributed.launch --nproc_per_node=$NGPU train_reader.py \
        --train_data train.json \
        --eval_data dev.json \
        --model_size base \
        --per_gpu_train_batch_size 1 \
        --per_gpu_eval_batch_size 1 \
        --accumulation_steps 64 \
        --total_steps 320000 \
        --eval_freq 20000 \
        --save_freq 20000 \
        --n_context 100 \
        --add_loss binary \
        --cat_emb 

but I am getting a 49.1 EM score on the NQ dev set and 50.6 on the test set. I wonder whether these scores are within the expected margin of error.

Also, I would be grateful if you could clarify whether the hyper-parameters used for the different model sizes (base/large) are the same or differ in any respect (batch size, total steps, optimizer, learning rate, scheduler).

wangcunxiang commented 11 months ago

I have not implemented the multi-GPU setting in this project, since I only have two A100s (one for running experiments, one for development). I suppose that is the reason for the gap, so I recommend using a single GPU. Feel free to reach out if you have more questions.
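
For reference, a single-GPU run would presumably be the same command without the torch.distributed.launch wrapper (a sketch based on the flags above, assuming train_reader.py accepts the same arguments when launched directly):

python train_reader.py \
        --train_data train.json \
        --eval_data dev.json \
        --model_size base \
        --per_gpu_train_batch_size 1 \
        --per_gpu_eval_batch_size 1 \
        --accumulation_steps 64 \
        --total_steps 320000 \
        --eval_freq 20000 \
        --save_freq 20000 \
        --n_context 100 \
        --add_loss binary \
        --cat_emb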