kushalj001 / pytorch-question-answering

Important paper implementations for Question Answering using PyTorch
MIT License

Hyperparameters in DrQA - Performance not as described #10

Open gustavhartz opened 2 years ago

gustavhartz commented 2 years ago

Thanks for sharing your work. I tried running the DrQA notebook, which has excellent descriptions, by the way. I spun up an Azure ML Standard_NC6 instance (6 cores, 56 GB RAM, 380 GB disk, 1 x NVIDIA Tesla K80 GPU) to see whether I could replicate the results you list after 5 epochs, but the performance I get is terrible. I suspect you used a different set of hyperparameters for your training.

The notebook contains the following:

HIDDEN_DIM = 128
EMB_DIM = 300
NUM_LAYERS = 3
NUM_DIRECTIONS = 2
DROPOUT = 0.3

optimizer = torch.optim.Adamax(model.parameters())
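
Since no `lr` is passed above, Adamax falls back to its library default. Here is a minimal sketch of how to confirm which learning rate is actually in effect and how to set one explicitly. The `nn.Linear` stand-in model and the `CANDIDATE_LR` value are assumptions for illustration only, not the notebook's model or the setting used for the reported results.

```python
import torch
import torch.nn as nn

# Stand-in model (assumption): the notebook's DrQA model is not reproduced here.
model = nn.Linear(4, 2)

# Same call as in the notebook: no lr argument, so the library default applies.
optimizer = torch.optim.Adamax(model.parameters())
default_lr = optimizer.param_groups[0]["lr"]
print("Adamax default lr:", default_lr)

# Hypothetical value to experiment with; NOT the author's confirmed setting.
CANDIDATE_LR = 1e-3
optimizer = torch.optim.Adamax(model.parameters(), lr=CANDIDATE_LR)
explicit_lr = optimizer.param_groups[0]["lr"]
print("Adamax explicit lr:", explicit_lr)
```

Reading the value from `optimizer.param_groups` rather than from the docs shows what the installed PyTorch version really uses, which helps when comparing runs across environments.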

Could it be that you trained with a learning rate other than the Adamax default? I hope you still remember something about the configuration :)