sarahwie closed this issue 5 years ago.
Hi Sarah, thanks for pointing this out! You are correct that we used the test data to select our model. This was a deliberate choice on our part. In this paper we are not presenting new predictive models, so we wanted to run our analysis on the best possible version of each model for the test data (we believe this biases the results in favor of attention rather than against it).
Second, we also ran all our models using the dev set for selection and found that it didn't actually make a difference to our results. I will push the code and results for this to the repository by tomorrow or Thursday.
Again, thanks for bringing this to our attention. We point out this choice in the project's main README.
Hi Sarthak, Thanks for the response and clarification.
Hi, it seems that you've used the test data to select the best model to save during training. Am I misreading the code, or is there no evaluation set? Thanks.
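The distinction raised in the question, choosing the checkpoint to save on a held-out dev set rather than on the test set, can be sketched as below. This is a generic illustration, not the repository's actual training code: `train_with_dev_selection`, `train_one_epoch`, and `evaluate` are hypothetical names, and the scores in the example are made-up toy numbers.

```python
from typing import Callable, Tuple


def train_with_dev_selection(
    num_epochs: int,
    train_one_epoch: Callable[[int], object],  # hypothetical: trains one epoch, returns a model snapshot
    evaluate: Callable[[object, str], float],  # hypothetical: scores a snapshot on "dev" or "test"
) -> Tuple[int, float, float]:
    """Keep the snapshot with the best dev score; evaluate on test only once, at the end."""
    if num_epochs < 1:
        raise ValueError("need at least one epoch")
    best_epoch, best_dev, best_model = -1, float("-inf"), None
    for epoch in range(num_epochs):
        model = train_one_epoch(epoch)
        dev_score = evaluate(model, "dev")  # the selection signal comes from dev only
        if dev_score > best_dev:
            best_epoch, best_dev, best_model = epoch, dev_score, model
    test_score = evaluate(best_model, "test")  # single final test evaluation
    return best_epoch, best_dev, test_score


# Toy example: each "model" is just its epoch index; scores are invented.
DEV = [0.70, 0.78, 0.75]
TEST = [0.72, 0.74, 0.79]

print(train_with_dev_selection(3, lambda e: e, lambda m, s: (DEV if s == "dev" else TEST)[m]))
```

With these toy numbers, dev-based selection keeps epoch 1 and reports its test score of 0.74, whereas selecting on test would keep epoch 2 and report 0.79. That gap is the optimistic bias the question is about.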