medallia / Word2VecJava

Word2Vec Java Port
MIT License

fix incorrect vocabs when loading from utf-8 binary #38

Open thangntt2 opened 8 years ago

thangntt2 commented 8 years ago

Hi. I'm very grateful for your work porting Word2vec to Java; I think it's a great job. However, I ran into trouble parsing a UTF-8 binary model file (trained on a Vietnamese corpus): the vocabulary words came out incorrect. I fixed it with some ugly lines of code :D. Please review it and turn it into your clean code.

P.S. If you want a UTF-8 binary example, email me at thangtq@seespace.co. Sorry for my English :D I look forward to hearing from you.
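The patch itself is not reproduced here, so the following is only a hedged illustration of a common cause of this kind of corruption: decoding a word one byte at a time instead of decoding its bytes together as UTF-8. It is a minimal sketch that assumes the usual word2vec binary layout, where each vocabulary word is stored as raw bytes terminated by a space and followed by its float vector (with an optional newline before the next word). The class and method names (`Utf8WordReader`, `readWord`, `readWordPerByte`) are hypothetical and are not part of Word2VecJava's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Word2VecJava's API: reads vocabulary words
// from a word2vec-style binary stream (assumed layout: word bytes, space, vector).
public class Utf8WordReader {

    /**
     * Collects raw bytes up to the terminating space and only then decodes
     * them as UTF-8, so a multi-byte character is never split apart.
     * A newline left before the word (after the previous vector) is skipped.
     */
    public static String readWord(InputStream in) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (b == ' ') {
                    break; // a space terminates the word
                }
                if (b == '\n' && buf.size() == 0) {
                    continue; // skip the separator newline before the word
                }
                buf.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Naive variant for comparison: turns every byte into its own char. */
    public static String readWordPerByte(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1 && b != ' ') {
                sb.append((char) b); // effectively Latin-1 decoding
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "tiếng" written with a Unicode escape so the source file encoding does not matter
        byte[] data = "ti\u1EBFng ".getBytes(StandardCharsets.UTF_8);
        String good = readWord(new ByteArrayInputStream(data));
        String bad = readWordPerByte(new ByteArrayInputStream(data));
        System.out.println(good.length() + " chars vs " + bad.length() + " chars");
    }
}
```

Running `main` prints `5 chars vs 7 chars`: "tiếng" occupies 7 bytes in UTF-8 because "ế" (U+1EBF) takes 3 bytes, and the per-byte reader turns each of those bytes into a separate, wrong character. Splitting on the space byte before decoding is safe because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so 0x20 can never appear inside a character.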

woidda commented 8 years ago

This should also be fixed in https://github.com/medallia/Word2VecJava/pull/34.