Open haziyevv opened 4 years ago
Do you have a dataset for this code?
Please make a repository.
What do you mean? My point is that when you train on a dataset, you represent it with numbers using word indices. At test time you should use the same word indices, but that is not the case here.
https://www.kaggle.com/c/quora-question-pairs/data
I used this data
thank you
I have the same problem. In my opinion, the way to solve it is to build a custom mapping table so that the word indices of the training set and the test set are the same. What do you think?
During prediction, make_w2v_embeddings creates a new vector for each sentence, so the indices are not consistent with those used in training. They should be the same as in the training data. For example, if the word "play" is mapped to 1 in testing but to 22 in training, how can the prediction be correct?
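To make the mismatch and the suggested mapping table concrete, here is a minimal sketch. It is not the repository's make_w2v_embeddings; the helper names build_vocab and encode, the reserved `<pad>`/`<unk>` indices, and the toy sentences are all assumptions made for illustration. The idea is to build the word-to-index table once from the training text and reuse that same table when encoding test sentences.

```python
from typing import Dict, List

# Hypothetical reserved indices: 0 for padding, 1 for unseen words.
PAD, UNK = 0, 1


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign each new word the next free index, in order of first appearance."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Look words up in an existing table; unseen words map to UNK."""
    return [vocab.get(word, UNK) for word in sentence.lower().split()]


# Toy data, invented for this example.
train = ["do you play football", "what games do you play"]
test = ["play chess with me"]

# Build the table once, from the training data only.
vocab = build_vocab(train)

# Encode both splits with the SAME table.
train_ids = [encode(s, vocab) for s in train]
test_ids = [encode(s, vocab) for s in test]

# The problem from this thread: a table rebuilt from the test data
# gives "play" a different index than the training table does.
rebuilt = build_vocab(test)
print(vocab["play"], rebuilt["play"])
```

With a shared table, "play" keeps its training index at test time, and words the model never saw fall back to the unknown index instead of colliding with a trained word. In practice the table would also need to be saved next to the model (for example as a JSON file) so that a separate prediction script can load it.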