wangcunxiang / RFiD

The repository for the paper "RFiD: Towards Rational Fusion-in-Decoder for Open-Domain Question Answering"

Question about hyper-parameters #1

Open Hr0803 opened 11 months ago

Hr0803 commented 11 months ago

Hello, I've been trying to run this model with the provided code. I used the same parameters as in the sample train script, running on multiple GPUs as shown below.

export NGPU=2;
python -m torch.distributed.launch --nproc_per_node=$NGPU train_reader.py \
        --train_data train.json \
        --eval_data dev.json \
        --model_size base \
        --per_gpu_train_batch_size 1 \
        --per_gpu_eval_batch_size 1 \
        --accumulation_steps 64 \
        --total_steps 320000 \
        --eval_freq 20000 \
        --save_freq 20000 \
        --n_context 100 \
        --add_loss binary \
        --cat_emb 

but I am getting a 49.1 EM score on the NQ dev set and 50.6 on the test set. I wonder whether these scores are within the expected margin of error.

Also, I would be grateful if you could clarify whether the hyper-parameters used for the different model sizes (base/large) are the same or differ in any respect (batch size, total steps, optimizer, learning rate, scheduler).

wangcunxiang commented 11 months ago

I have not implemented the multi-GPU setting in this project, since I only have two A100s (one for running experiments, one for development). I suppose that is the reason for the gap, so I recommend using a single GPU. Feel free to reach out if you have more questions.
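
For reference, a single-GPU run would presumably be the same command without the torch.distributed.launch wrapper (a sketch based on the flags above, assuming train_reader.py accepts the same arguments when launched directly):

python train_reader.py \
        --train_data train.json \
        --eval_data dev.json \
        --model_size base \
        --per_gpu_train_batch_size 1 \
        --per_gpu_eval_batch_size 1 \
        --accumulation_steps 64 \
        --total_steps 320000 \
        --eval_freq 20000 \
        --save_freq 20000 \
        --n_context 100 \
        --add_loss binary \
        --cat_emb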