ketranm / fan_vs_rnn

The Importance of Being Recurrent for Modeling Hierarchical Structure

Hyperparameters of Fan model on Logical inference task #1

Open Demon-JieHao opened 5 years ago

Demon-JieHao commented 5 years ago

Hi, I am trying to reproduce the FAN model result on logical inference, but my results seem far from those reported in the paper. What are the hyperparameters of the FAN model for this task? Thanks!

ketranm commented 5 years ago

Hi, I ran a grid search over hyperparameters: number of layers {2, 3, 4}, dropout {0.2, 0.3, 0.5}, learning rate {0.00001, 0.0001, 0.001}, number of hidden units {138, 256, 512}, and number of heads {2, 4}. The details are in our paper (http://aclweb.org/anthology/D18-1503) under the hyperparameters section. Unfortunately, I don't remember the best configuration off the top of my head.
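For reference, the grid loop I have in mind looks roughly like the sketch below. It is illustrative only, not the actual script in this repo; `train_and_eval` is a placeholder for training a FAN with one configuration and returning its validation accuracy.

```python
from itertools import product

def train_and_eval(layers, dropout, lr, hidden, heads):
    # placeholder: train the FAN with this config and return validation accuracy
    return 0.0

# the grid reported above
grid = {
    'layers':  [2, 3, 4],
    'dropout': [0.2, 0.3, 0.5],
    'lr':      [1e-5, 1e-4, 1e-3],
    'hidden':  [138, 256, 512],
    'heads':   [2, 4],
}

best = None
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    acc = train_and_eval(**config)
    if best is None or acc > best[0]:
        best = (acc, config)

print('best config:', best)
```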

hsing-wang commented 5 years ago

Hi, my result for the FAN model on the logical inference task is quite low as well, even after a grid search over hyperparameters. What are the epochs, batch_size, and param_init in your setting? Thank you very much.

hsing-wang commented 5 years ago

One of my results is shown in the attached screenshot.

ketranm commented 5 years ago

Hi there, the hyperparameters are provided in our paper. Parameters are initialized uniformly in (0, 0.1). The batch size is set to 64 (I think). All models are trained for a maximum of 50 epochs.
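Something along these lines for the initialization; this is only a sketch, not the exact code in the repo, and `init_params` / `param_init` are illustrative names:

```python
import torch.nn as nn

def init_params(model, param_init=0.1):
    # uniform init in (0, param_init), matching the setting described above
    for p in model.parameters():
        nn.init.uniform_(p, 0.0, param_init)

model = nn.Linear(10, 3)   # stand-in for the actual model
init_params(model)
```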

The results of FAN models are usually unstable and very sensitive to hyperparameters unless some heuristic learning rate decay method is used.
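One such heuristic could look like the sketch below (not the method used in the paper, just an example): halve the learning rate whenever validation accuracy stops improving. `train_one_epoch` is a placeholder for the real training and validation pass.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, optimizer):
    # placeholder for the real training + validation pass; returns val accuracy
    return 0.0

model = nn.Linear(10, 3)   # stand-in for the FAN model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# halve the lr when validation accuracy has not improved for 2 epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.5, patience=2)

for epoch in range(50):
    val_acc = train_one_epoch(model, optimizer)
    scheduler.step(val_acc)   # decay lr when val_acc plateaus
```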

Also, which PyTorch version are you using? There might be a bug if you use PyTorch 1.0.

hsing-wang commented 5 years ago

Hello, thanks! I am using PyTorch 0.4.1. There's a bug in the add_timing_signal function: log_inv_inc should be converted to float (see the sketch below).
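Roughly, the fix looks like this. This is a sketch of a standard sinusoidal timing signal, not the exact function from the repo; the point is the explicit float cast when computing log_inv_inc.

```python
import math
import torch

def add_timing_signal(x, min_timescale=1.0, max_timescale=1.0e4):
    # x: (batch, length, channels); sinusoidal position signal as in the Transformer
    length, channels = x.size(1), x.size(2)
    num_timescales = channels // 2
    # the fix discussed above: force float here, otherwise the division can be
    # integer-valued on older PyTorch and the timescales collapse
    log_inv_inc = math.log(max_timescale / min_timescale) / float(num_timescales - 1)
    inv_timescales = min_timescale * torch.exp(
        torch.arange(num_timescales, dtype=torch.float) * -log_inv_inc)
    position = torch.arange(length, dtype=torch.float)
    scaled_time = position.unsqueeze(1) * inv_timescales.unsqueeze(0)
    signal = torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], dim=1)
    return x + signal.unsqueeze(0).to(x.device)

x = add_timing_signal(torch.zeros(2, 5, 8))   # (batch, length, channels)
```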

By the way, are you using skorch to conduct the grid search in PyTorch? I don't know how to do a grid search properly in PyTorch.