lanwuwei / SPM_toolkit

Neural network toolkit for sentence pair modeling.

How can I use the sentence pair model to get the similarity between two sentences? #27

Closed: BruceLee66 closed this issue 5 years ago.

BruceLee66 commented 5 years ago

I now have 1,000,000 sentence pairs, and the two sentences in each pair express the same meaning. I used this data to train the sentence pair model and saved the trained model as a .pkl file. But when I use the trained model to evaluate new sentence pairs, almost all of them get a score of 1.0. What should I do? Can you give me some advice?

lanwuwei commented 5 years ago

All positive training examples? No negative ones?

BruceLee66 commented 5 years ago

Yes, all sentence pairs are similar. When I use this trained model to predict other sentence pairs whose sentences differ from each other, the score is still very close to 1. I'm really confused.

lanwuwei commented 5 years ago

You need negative samples for training; otherwise the model will be biased towards the positive case.
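A toy illustration of why the scores collapse toward 1.0 (this is not the SPM_toolkit model): a one-feature logistic regression trained only on positive labels. The feature, a hypothetical word-overlap ratio per pair, is an assumption made for the sketch. With every label equal to 1, the cross-entropy gradient on the bias is always negative, so gradient descent keeps pushing the bias up, and even a pair with zero overlap ends up scored near 1.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, epochs=300, lr=0.5):
    """Batch gradient descent on binary cross-entropy (toy logistic regression)."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of BCE w.r.t. the logit; always < 0 when y == 1
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical single feature per pair: word-overlap ratio in [0, 1].
overlaps = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
w, b = train_logistic([[x] for x in overlaps], [1] * len(overlaps))

# A completely unrelated pair (overlap 0.0) still gets a score close to 1.
print(round(predict(w, b, [0.0]), 3))
```

Adding label-0 examples gives the gradient something to push against, which is what lets the model learn a boundary instead of a constant.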

BruceLee66 commented 5 years ago

I've decided to select negative examples randomly, with 5 times as many negative samples as positive ones. Would that be OK?
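A minimal sketch of random negative sampling, assuming the data is a list of (sentence_a, sentence_b) paraphrase pairs. The function name and the (sent_a, sent_b, label) output layout are illustrative assumptions, not SPM_toolkit's input format. A negative pairs sentence A of pair i with sentence B of a different pair j; `neg_ratio=5` corresponds to the 5:1 proposal and `neg_ratio=1` to a balanced 1:1 set.

```python
import random


def build_training_set(positive_pairs, neg_ratio=1, seed=0):
    """Return shuffled (sent_a, sent_b, label) triples.

    Each positive pair gets label 1. For each pair i we also draw `neg_ratio`
    random negatives by pairing sentence A of pair i with sentence B of a
    different pair j != i (label 0).
    """
    n = len(positive_pairs)
    if n < 2:
        raise ValueError("need at least two pairs to draw negatives")
    rng = random.Random(seed)
    data = [(a, b, 1) for a, b in positive_pairs]
    for i, (a, _) in enumerate(positive_pairs):
        for _ in range(neg_ratio):
            j = rng.randrange(n - 1)  # uniform over the n - 1 other pairs
            if j >= i:
                j += 1  # skip pair i itself
            data.append((a, positive_pairs[j][1], 0))
    rng.shuffle(data)
    return data


pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
print(len(build_training_set(pairs, neg_ratio=1)))  # 4 positives + 4 negatives -> 8
```

Random negatives tend to be easy, since two unrelated sentences usually share few words. If the corpus contains several pairs with the same meaning, a random "negative" can also occasionally be a true paraphrase.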

lanwuwei commented 5 years ago

1:1 should be enough. More importantly, you need to make sure the negative examples are meaningful: pairs that share many n-gram words but are not paraphrases.
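One possible way to mine such "hard" negatives, sketched under assumptions: for each sentence A, pick the sentence B from another pair that shares the most word n-grams with A, giving a 1:1 ratio. The scoring (Jaccard similarity over whitespace-token 1- and 2-grams) and all function names here are hypothetical choices. The brute-force scan is O(n²), so for 1,000,000 pairs you would first need an inverted index or another retrieval step to shortlist candidates.

```python
def ngrams(text, n):
    """Set of word n-grams (as tuples) from lowercased whitespace tokens."""
    tokens = text.lower().split()
    return {tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1)}


def overlap(a, b, max_n=2):
    """Jaccard similarity over the union of 1..max_n-gram sets."""
    ga = set().union(*(ngrams(a, n) for n in range(1, max_n + 1)))
    gb = set().union(*(ngrams(b, n) for n in range(1, max_n + 1)))
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def hard_negatives(positive_pairs, max_n=2):
    """For each pair i, pair A_i with the B_j (j != i) of highest n-gram overlap."""
    if len(positive_pairs) < 2:
        raise ValueError("need at least two pairs to draw negatives")
    negatives = []
    for i, (a, _) in enumerate(positive_pairs):
        best_j = max(
            (j for j in range(len(positive_pairs)) if j != i),
            key=lambda j: overlap(a, positive_pairs[j][1], max_n),
        )
        negatives.append((a, positive_pairs[best_j][1], 0))
    return negatives


pairs = [
    ("how do i reset my password", "what are the steps to recover a forgotten password"),
    ("how do i reset my router", "how do i reset my wifi router"),
    ("where can i buy a router", "which shop sells wifi routers"),
]
for a, b, label in hard_negatives(pairs):
    print(label, "|", a, "|", b)
```

Here "how do i reset my password" is paired with "how do i reset my wifi router": many shared n-grams, different meaning. Because the highest-overlap candidate can itself be a paraphrase, it is worth spot-checking a sample before labeling all of them 0.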

BruceLee66 commented 5 years ago

Okay! I will try a 1:1 ratio. Thank you very much.