Demon-JieHao opened this issue 5 years ago
Hi, I ran a grid search over hyperparameters: number of layers {2, 3, 4}, dropout {0.2, 0.3, 0.5}, learning rate {0.00001, 0.0001, 0.001}, number of hidden units {138, 256, 512}, and number of heads {2, 4}. The details are in our paper http://aclweb.org/anthology/D18-1503 under Hyperparameters. Unfortunately, I don't remember the best configuration off the top of my head. A minimal sketch of how such a grid could be enumerated is below.
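For anyone searching later, this is only an illustration of the search loop, not the original training script; `train_and_eval` is a placeholder, and the grid simply mirrors the values listed above:

```python
from itertools import product

def train_and_eval(config):
    # Placeholder: train a model with `config` and return its dev-set accuracy.
    return 0.0

# Hypothetical grid mirroring the values listed above.
grid = {
    "num_layers": [2, 3, 4],
    "dropout": [0.2, 0.3, 0.5],
    "learning_rate": [1e-5, 1e-4, 1e-3],
    "hidden_units": [138, 256, 512],
    "num_heads": [2, 4],
}

best_score, best_config = float("-inf"), None
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = train_and_eval(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_config, best_score)
```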
Hi, my result for the FAN model on the logical inference task is also quite low, even after a grid search over hyperparameters. What are the epochs, batch_size, and param_init in your setting? Thank you very much.
One of my results is here.
Hi there, the hyperparameters are provided in our paper. Parameters are initialized uniformly in (0, 0.1). The batch size is set to 64 (I think). All the models are trained for a maximum of 50 epochs.
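A minimal PyTorch sketch of that uniform initialization; the helper name and the placeholder model are mine, not the original code:

```python
import torch.nn as nn

def init_uniform(model, low=0.0, high=0.1):
    # Initialize every parameter uniformly in (low, high), as described above.
    for p in model.parameters():
        nn.init.uniform_(p, low, high)

# Usage with a placeholder model:
model = nn.Linear(256, 256)
init_uniform(model)
```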
The results of FAN models are usually unstable and very sensitive to hyperparameters unless some heuristic learning rate decay method is used.
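One common heuristic is to reduce the learning rate when the validation loss stops improving. A sketch using PyTorch's ReduceLROnPlateau; the optimizer, model, and loop here are illustrative, not the paper's setup:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(256, 2)                     # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Halve the learning rate if the validation loss has not improved for 2 epochs.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=2)

for epoch in range(50):
    # ... training steps with optimizer.step() would go here ...
    val_loss = 1.0 / (epoch + 1)                    # placeholder for the real validation loss
    scheduler.step(val_loss)
```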
Also, which PyTorch version are you using? There might be a bug if you use PyTorch 1.0.
Hello, thanks! I am using PyTorch 0.4.1. There's a bug in the "add_timing_signal" function. log_inv_inc should be converted to float.
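For context, the timing-signal computation in many Transformer implementations looks roughly like the sketch below, and the fix amounts to making sure the log increment is computed and applied as a float; the names and shapes here are illustrative, not the repo's exact code:

```python
import math
import torch

def add_timing_signal(x, min_timescale=1.0, max_timescale=1.0e4):
    # x: (batch, length, channels). Adds the standard sinusoidal positional signal.
    length, channels = x.size(1), x.size(2)
    num_timescales = channels // 2
    # The log increment must be a float; an integer value here silently breaks the encoding.
    log_inv_inc = math.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    inv_timescales = min_timescale * torch.exp(
        torch.arange(num_timescales, dtype=torch.float) * -log_inv_inc
    )
    position = torch.arange(length, dtype=torch.float)
    scaled_time = position.unsqueeze(1) * inv_timescales.unsqueeze(0)
    signal = torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], dim=1)
    return x + signal.unsqueeze(0)
```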
By the way, are you using skorch to conduct the grid search in PyTorch? I don't know the right way to do a grid search in PyTorch.
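In case it helps, skorch wraps a PyTorch module so it can be plugged into scikit-learn's GridSearchCV. A minimal sketch with a toy module and random data, just to show the wiring:

```python
import numpy as np
import torch.nn as nn
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV

class ToyNet(nn.Module):
    def __init__(self, hidden_units=128, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, hidden_units),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_units, 2),
        )

    def forward(self, x):
        return self.net(x)

# CrossEntropyLoss so the module can return raw logits.
net = NeuralNetClassifier(ToyNet, criterion=nn.CrossEntropyLoss,
                          max_epochs=10, lr=1e-3, verbose=0)

# "module__" routes a parameter to the wrapped module's constructor.
params = {
    "lr": [1e-4, 1e-3],
    "module__dropout": [0.2, 0.5],
}
gs = GridSearchCV(net, params, cv=3, scoring="accuracy")

X = np.random.randn(200, 20).astype(np.float32)       # placeholder data
y = np.random.randint(0, 2, size=200).astype(np.int64)
gs.fit(X, y)
print(gs.best_params_, gs.best_score_)
```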
Hi, I am trying to reproduce the FAN model result on the logical inference task, but my results are far from the original results in the paper. What are the hyperparameters of the FAN model for this task? Thanks!