Hi, I am currently reimplementing the SSE model, and I am confused about how you pre-process quora_duplicate_questions.tsv:
I wonder how you generate /pytorch/DeepPairWiseWord/data/quora/a.tok and b.tok. What tokenization method do you use?
How do you split the train/dev/test sets from quora_duplicate_questions.tsv? Do you use the same split as "Zhiguo Wang, Wael Hamza, and Radu Florian. 2017. Bilateral multi-perspective matching for natural language sentences. In Proceedings of IJCAI." and "Neural Paraphrase Identification of Questions with Noisy Pretraining"?
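For reference, this is the naive preprocessing I am currently using to produce the two tokenized files. The tokenizer (a simple lowercase word/punctuation regex) and the TSV column names are my own assumptions, so it is only a sketch and likely differs from your actual pipeline:

```python
import csv
import re

def tokenize(text):
    # Naive tokenizer: lowercase, then split into word runs and
    # single punctuation marks. A stand-in for whatever you used.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def preprocess(tsv_path, a_path, b_path):
    # Assumed columns of quora_duplicate_questions.tsv:
    # id, qid1, qid2, question1, question2, is_duplicate
    with open(tsv_path, newline="", encoding="utf-8") as f, \
         open(a_path, "w", encoding="utf-8") as fa, \
         open(b_path, "w", encoding="utf-8") as fb:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            # One tokenized question per line, tokens joined by spaces.
            fa.write(" ".join(tokenize(row["question1"])) + "\n")
            fb.write(" ".join(tokenize(row["question2"])) + "\n")
```

With this scheme, a line like "How do I learn Python?" would come out as `how do i learn python ?` in a.tok, but I am not sure whether you lowercase or how you handle punctuation.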
I would appreciate it if you could answer my questions. Thank you!