likejazz / Siamese-LSTM

Siamese LSTM for evaluating semantic similarity between sentences of the Quora Question Pairs Dataset.

prediction is wrong #6

Open · haziyevv opened this issue 4 years ago

haziyevv commented 4 years ago

At prediction time, make_w2v_embeddings builds a new word-to-index vocabulary for the test sentences, so the indices are not consistent with the ones used during training. They should be the same as in the training data. For example, if the word "play" is mapped to 1 in testing but to 22 in training, how can the prediction be correct?
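A toy sketch of what I mean (not the repository's actual code; build_vocab here just stands in for the vocabulary that make_w2v_embeddings builds internally):

```python
# Calling the same helper separately on training and test data produces two
# different vocabularies, so the same word ends up with different indices.
def build_vocab(sentences):
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

train_vocab = build_vocab(["i like to play chess", "do you play chess"])
test_vocab = build_vocab(["play the piano"])
print(train_vocab["play"])  # 4 -- index assigned when building the training vocab
print(test_vocab["play"])   # 1 -- a different index for the same word at test time
```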

sasidhar13 commented 4 years ago

Do you have a dataset for this code?

sasidhar13 commented 4 years ago

please make a repository

haziyevv commented 4 years ago

What do you mean? I'm saying that when you train on a dataset, its words are represented as numbers via word indices. When you test, you should use the same word indices, but that is not the case here.

haziyevv commented 4 years ago

https://www.kaggle.com/c/quora-question-pairs/data

I used this data

sasidhar13 commented 4 years ago

thank you

uRENu commented 3 years ago

I have the same problem. In my opinion, the way to solve it is to build a custom mapping table that guarantees the training set and the test set use the same indices. What do you think? Something like the sketch below.
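A minimal sketch of the idea, assuming the mapping table is built once from the training data and saved to a file (build_vocab, sentence_to_ids, and vocab.pkl are just illustrative names, not the repo's API):

```python
import pickle

def build_vocab(sentences):
    """Assign each word a fixed index once, on the training data only."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def sentence_to_ids(sentence, vocab):
    """Reuse the saved mapping; unknown words fall back to index 0."""
    return [vocab.get(word, 0) for word in sentence.split()]

# Training time: build the table once and persist it next to the model.
train_vocab = build_vocab(["i like to play chess", "do you play chess"])
with open("vocab.pkl", "wb") as f:
    pickle.dump(train_vocab, f)

# Prediction time: load the same table instead of rebuilding it, so
# "play" keeps the index it had during training.
with open("vocab.pkl", "rb") as f:
    vocab = pickle.load(f)
print(sentence_to_ids("do you play the piano", vocab))  # [6, 7, 4, 0, 0]
```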