Get sentence representation from customized dataset

ludens11 commented 4 years ago

Hello, I'm currently trying to reproduce your model with a dataset in a different language, for learning purposes. My dataset contains only a large number of sentences, separated by newlines. There are no target sentences in my dataset.

My questions, given my own dataset:

Is it possible to train sent2vec the way you did in this project? As far as I understand, you build the sent2vec model from source and target sentences, so it seems my data doesn't meet the requirements.
In the Sent2Vec model class, you load previously trained RNN weights, which means I would need to use your pretrained model. Is there a way to reproduce those weights with my own dataset?

Thanks in advance

wasiahmad commented 4 years ago

You closed the issue. Did you find answers to your questions?

ludens11 commented 4 years ago

Not yet, but I'm pretty sure that my data won't fit this implementation. Maybe?

wasiahmad commented 4 years ago

Yes, you are right. We train the sentence encoder with supervised learning, and your dataset doesn't meet that requirement. I think you should consider GenSen, USE, or LASER.

wasiahmad / transferable_sent2vec