Hyperparameters in DrQA - Performance not as described

Thanks for sharing your work. I tried to run the DrQA notebook, which has excellent descriptions, by the way. I spun up an Azure ML Standard_NC6 instance (6 cores, 56 GB RAM, 380 GB disk, 1 x NVIDIA Tesla K80 GPU) to see if I could replicate the results you list after 5 epochs, but I get terrible performance. I suspect that you might have used a different set of hyperparameters for your training.

The notebook contains the following:

HIDDEN_DIM = 128
EMB_DIM = 300
NUM_LAYERS = 3
NUM_DIRECTIONS = 2
DROPOUT = 0.3

optimizer = torch.optim.Adamax(model.parameters())

Since the optimizer is created without an explicit learning rate, it uses the Adamax default (2e-3 in PyTorch). Did you perhaps train with a different learning rate? I hope you still remember something about the configuration :)
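
For reference, here is a minimal sketch of how the learning rate in effect can be checked and then pinned explicitly. The `torch.nn.Linear` model is only a stand-in I made up for illustration, not the notebook's DrQA model, and passing back the default value is just a placeholder for whatever rate was actually used in training.

```python
import torch

# Stand-in model for illustration only; the notebook's DrQA model would go here.
model = torch.nn.Linear(4, 2)

# Constructed as in the notebook, without an explicit learning rate.
optimizer = torch.optim.Adamax(model.parameters())

# Every torch optimizer stores its constructor defaults in `defaults`.
default_lr = optimizer.defaults["lr"]
print("Adamax default lr:", default_lr)

# Pinning the rate explicitly makes the configuration visible in the notebook.
# Here the default is simply passed back in; a different value could be tried.
explicit_optimizer = torch.optim.Adamax(model.parameters(), lr=default_lr)
current_lrs = [group["lr"] for group in explicit_optimizer.param_groups]
print("lr per parameter group:", current_lrs)
```

If the original run used a different value, replacing `lr=default_lr` with that value would be the only change needed.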

kushalj001 / pytorch-question-answering
