ryankiros / skip-thoughts

Sent2Vec encoder and training code from the paper "Skip-Thought Vectors"

Can I train my own model using this implementation? #29

Open budhiraja opened 8 years ago

danielricks commented 7 years ago

Yeah you can. I've written a method that does just that in penseur_utils.py. Check out the code here: https://github.com/danielricks/penseur

pcg108 commented 7 years ago

If I had a corpus of documents, each document consisting of some number of sentences, should I put all of these sentences into the matrix X for training the encoder?

The instructions in the README say that the (i+1)-th entry of X is the sentence that follows the i-th sentence. But if we use multiple documents, how do we indicate that one set of contiguous sentences belongs to one document and the next set belongs to another? Otherwise the last sentence of one document would be treated as if it were followed by the first sentence of the next.

Or should we train the encoder on one document at a time, with each matrix corresponding to one document?
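To make the boundary concern concrete, here is a minimal sketch. It is not code from skip-thoughts or penseur, and the helper names `naive_neighbor_pairs` and `per_document_triples` are hypothetical. It assumes the objective pairs each sentence with its immediate predecessor and successor, as in the skip-thought model, where the encoded sentence is used to predict the previous and next sentences. Flattening every document into one list X produces neighbour pairs that span documents. Building (previous, current, next) triples inside each document avoids that. How the repository's own training script batches X is not shown here.

```python
from typing import List, Tuple

Pair = Tuple[str, str]
Triple = Tuple[str, str, str]


def naive_neighbor_pairs(docs: List[List[str]]) -> List[Pair]:
    """Hypothetical helper: flatten all documents into one list X and pair
    each entry with the one after it (the README's X[i+1]-follows-X[i] rule).
    Pairs that span two documents are included, which is the problem."""
    flat = [sentence for doc in docs for sentence in doc]
    return list(zip(flat[:-1], flat[1:]))


def per_document_triples(docs: List[List[str]]) -> List[Triple]:
    """Hypothetical helper: build (previous, current, next) triples within
    each document only, so no triple spans a document boundary.
    Documents with fewer than three sentences contribute no triples."""
    triples: List[Triple] = []
    for doc in docs:
        for i in range(1, len(doc) - 1):
            triples.append((doc[i - 1], doc[i], doc[i + 1]))
    return triples


if __name__ == "__main__":
    docs = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
    print(naive_neighbor_pairs(docs))
    # [('a1', 'a2'), ('a2', 'a3'), ('a3', 'b1'), ('b1', 'b2'), ('b2', 'b3')]
    print(per_document_triples(docs))
    # [('a1', 'a2', 'a3'), ('b1', 'b2', 'b3')]
```

The spurious pair `('a3', 'b1')` shows what happens when documents are simply concatenated. With many sentences per document, such pairs are a small fraction of the data. Whether they matter in practice depends on the corpus.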