Open Hr0803 opened 11 months ago
I have not implemented the multi-GPU setting in this project, since I only have two A100s (one for running, one for development). I suspect that is the reason. I recommend using a single GPU. Feel free to reach out if you have more questions.
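One common way to force a single-GPU run on a multi-GPU machine is to hide the other devices from the process with the `CUDA_VISIBLE_DEVICES` environment variable. The snippet below is only a minimal sketch of that general technique, not something taken from this repository: the helper name `restrict_to_single_gpu` is hypothetical, and it assumes the variable is set before any CUDA library is initialised.

```python
import os


def restrict_to_single_gpu(index: int = 0) -> str:
    """Expose only one GPU to this process (hypothetical helper, not part of the repo)."""
    # CUDA reads this variable when it is first initialised, so set it
    # before the training code touches the GPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    visible = restrict_to_single_gpu(0)
    print(f"CUDA_VISIBLE_DEVICES={visible}")
```

The same effect can usually be had from the shell by prefixing the training command with `CUDA_VISIBLE_DEVICES=0`, which avoids changing any code.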
Hello, I've been trying to run this model with the provided code. I used the same parameters as the sample training script, with multiple GPUs, as below.
However, I got an EM score of 49.1 on the NQ dev set and 50.6 on the test set. I wonder whether these scores are acceptable given the margin of error.
Also, I would be grateful if you could clarify whether the hyperparameters used for the different model sizes (base/large) are the same or differ in any respect (batch size, total steps, optimizer, learning rate, scheduler).