Open ThinkingSlow opened 7 years ago
Hi, the mask is used to deal with the sentences with difference lengths in one minibatch. Actually, I did not try the experiments about mark/no-mask and ortho_weight/no ortho_weight. I will test it. Thanks for your questions.
I try some simple experiments. Baseline is 88.0% on test set; if no-mask, the accuracy is 87.7%; if no ortho_weight, the accuracy is 87.5%. I hope that answered your question.
Thank you soso much~~
Hello, I'm following your work, and try to reimplement ESIM by tensorflow. I noticed in you lstm_layer() , you masked c and h, I'm wondering how much will the mask improve the model compared with the no-mask(just basic LSTM). And how much will the ortho_weight help? Thank you so much.