jind11 opened this issue 3 years ago
Thank you so much for your interest, and sorry for our late reply.
1. In the original "NLI" task, the order matters (or is expected to), because the task is directional: premise -> hypothesis. The model is also order-sensitive because of its cross-attention.
For the few-shot DNN training, from a data-augmentation viewpoint, we wanted to make the best use of the limited number of training examples. Given two examples A and B, keeping both directions (A -> B) and (B -> A) lets us simulate both cases: 1) A is the input, and 2) B is the input.
Regarding the number of positive training examples, the description is correct; the "K(K-1)" count accounts for the order.
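To make the counting concrete, here is a minimal sketch (not the repository's actual code) of building the directional positive pairs described above: with K examples of one class, keeping both orders yields K(K-1) ordered pairs.

```python
# Sketch: directional positive pairs from K same-class examples.
# Both (A, B) and (B, A) are kept, so we get K * (K - 1) ordered pairs.
from itertools import permutations

def build_positive_pairs(examples):
    """Return all ordered (premise, hypothesis) pairs from one class."""
    return list(permutations(examples, 2))

pairs = build_positive_pairs(["A", "B", "C"])  # K = 3
print(len(pairs))  # 3 * 2 = 6 ordered pairs, including both directions
```

Note that `permutations` (unlike `combinations`) preserves order, which is exactly what the directional premise -> hypothesis setup needs.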
2. As in the brief comment, the role of the "NLI validation examples" is to let us check whether the model is at least fitting the synthetic/artificial task. We observed that overfitting the training set (i.e., achieving very high accuracy on it) is crucial in the few-shot setup, so we decided to simply monitor accuracy on a small subset of the training set. This may not be ideal, but it is unrealistic to assume a sufficient amount of separate validation examples.
We also tried avoiding the overlap between "nli_train_examples" and "nli_dev_examples", but as I recall this hurt the model's accuracy, because removing even a small number of examples is significant in the few-shot setup.
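A hypothetical sketch of the monitoring scheme described above: rather than holding out separate validation data, sample a small subset of the NLI training pairs and track accuracy on it. The names (`nli_train_examples`, `make_dev_subset`, `dev_size`) mirror the discussion, not the actual repository code.

```python
# Sketch of monitoring overfitting on a subset of the training set,
# as discussed above. The dev subset deliberately overlaps with the
# training data, since removing examples hurts in the few-shot setup.
import random

def make_dev_subset(nli_train_examples, dev_size=32, seed=0):
    """Draw a small monitoring subset that may overlap with training data."""
    rng = random.Random(seed)
    k = min(dev_size, len(nli_train_examples))
    return rng.sample(nli_train_examples, k)

nli_train_examples = [("premise %d" % i, "hypothesis %d" % i, 1)
                      for i in range(100)]
nli_dev_examples = make_dev_subset(nli_train_examples)
# Training can be stopped once accuracy on nli_dev_examples is near 100%,
# i.e., once the model has (over)fit the synthetic task.
```

The design trade-off is explicit here: the subset gives no generalization estimate, only a cheap signal that the model has fit the artificial task.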
Thanks, Kazuma
Hi, thank you for providing this source code. I have two questions after reading it: