klb3713 / sentence2vec

Tools for mapping a sentence with arbitrary length to vector space

Vector values are not the same for a duplicated sentence #6

Open stray-leone opened 9 years ago

stray-leone commented 9 years ago

I copied a sentence into sent.txt, so that sentence appears twice. But after running demo.py, the two identical sentences get different vector values.

My sent.txt file is below:

Harbin Institute of Technology (HIT) was founded in 1920.                                                                                                                                      
Harbin Institute of Technology (HIT) was founded in 1920.
After nearly 100 years, HIT has developed into a large nationally renowned multi-disciplinary university with science, engineering and research as its core.
HIT is consistently on the forefront in making innovations in research. For years, HIT has continued to undertake large-scale and highly sophisticated national projects.
HIT students study humanities and social sciences along with basic engineering and science courses for a strong comprehensive base. 
HIT is famous for its original style of schooling: 'Being strict in qualifications for graduates; making every endeavor in educating students.'  
HIT has remained an international university since its foundation. Courses at HIT used to be conducted exclusively in Russian and Japanese.         
Today, all the faculty, students and staff of HIT, are dedicating, with full confidence   

The vector for the first 'Harbin Institute of Technology (HIT) was founded in 1920.' and the vector for the second, identical 'Harbin Institute of Technology (HIT) was founded in 1920.' are different.
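A likely mechanism, stated as an assumption rather than checked against the sentence2vec source: in the paragraph-vector approach, each input line gets its own randomly initialized vector. Two copies of the same sentence therefore start from different points, and a finite amount of training moves them closer without making them equal. The toy sketch here uses a deliberately simplified update (pulling each line's vector toward the mean of its word vectors), not sentence2vec's actual training objective; `toy_train` and `cosine` are hypothetical names. It also shows why cosine similarity is a more useful comparison than exact equality of the values.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_train(lines, dim=8, lr=0.1, steps=50, seed=1):
    """Hypothetical toy model with one vector per input *line*.

    Row i belongs to line i whatever its text, so duplicate lines get
    independent random starting rows. Each step pulls a row toward the mean
    of its sentence's word vectors; this simplified update stands in for
    real training only to show the effect of per-line initialization.
    """
    rng = random.Random(seed)
    # Shared word vectors (vocab sorted so the random draws are reproducible).
    vocab = sorted({w for line in lines for w in line.split()})
    word_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    # One independent random starting vector per line.
    rows = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in lines]
    initial = [row[:] for row in rows]
    for _ in range(steps):
        for i, line in enumerate(lines):
            tokens = line.split()
            target = [sum(word_vecs[tok][d] for tok in tokens) / len(tokens)
                      for d in range(dim)]
            # Move a fraction `lr` of the way toward the target.
            rows[i] = [v + lr * (goal - v) for v, goal in zip(rows[i], target)]
    return initial, rows


lines = [
    "Harbin Institute of Technology (HIT) was founded in 1920.",
    "Harbin Institute of Technology (HIT) was founded in 1920.",
]
initial, trained = toy_train(lines)
print(lines[0] == lines[1])                # True: same text
print(trained[0] == trained[1])            # False: still different values
print(cosine(trained[0], trained[1]))      # nearly parallel: close to 1.0
```

In this toy, the gap between the two rows shrinks by a factor of `1 - lr` per step but never reaches zero, which mirrors the observation in the issue: identical text, close but unequal vectors.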

geovedi commented 9 years ago

I'm guessing Sentence2Vec preserves sentence order information. Not sure if it's a good thing.
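Whether or not order matters, if the goal is for identical sentences to share one vector, a possible workaround (an assumption, not something sentence2vec is known to provide) is to deduplicate the input before training and map each original line back to its sentence's ID. `dedupe_lines` is a hypothetical helper written for this sketch; it strips surrounding whitespace, since a stray trailing space would otherwise make a copy look like a different line.

```python
def dedupe_lines(lines):
    """Hypothetical preprocessing: one ID per distinct sentence.

    Returns (unique, line_ids): `unique` lists each distinct sentence once,
    and line_ids[i] is the index in `unique` for input line i. Leading and
    trailing whitespace is ignored when comparing lines.
    """
    index = {}
    unique = []
    line_ids = []
    for line in lines:
        key = line.strip()
        if key not in index:
            index[key] = len(unique)
            unique.append(key)
        line_ids.append(index[key])
    return unique, line_ids


raw = [
    "Harbin Institute of Technology (HIT) was founded in 1920.   ",
    "Harbin Institute of Technology (HIT) was founded in 1920.",
    "HIT has remained an international university since its foundation.",
]
unique, line_ids = dedupe_lines(raw)
print(len(unique))  # 2
print(line_ids)     # [0, 0, 1]
```

Training on `unique` instead of the raw file would yield one vector per distinct sentence, and original line i would read its vector at position `line_ids[i]`. This makes duplicates share a vector by construction; it does not change how sentence2vec itself treats repeated lines.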