Hi. Thank you very much for porting Word2vec to Java; I think it's a great job. However, I ran into a problem when parsing a UTF-8 binary file (from a Vietnamese corpus): the vocabulary was read incorrectly. I fixed it with a few ugly lines of code :D. Please review the change and rework it into your clean code.
P.S. If you would like an example UTF-8 binary file, email me at thangtq@seespace.co. Sorry for my English :D
I look forward to hearing from you.
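
For context, here is a minimal sketch of the general idea, not the exact patch. It assumes the usual word2vec binary layout, where each vocabulary word is stored as raw bytes terminated by a space, and it assumes (not confirmed here) that the bad vocabulary came from converting bytes to characters one at a time. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8VocabReader {

    /**
     * Reads one vocabulary word: collects raw bytes up to the next space
     * (or end of stream) and decodes them together as UTF-8, so that
     * multi-byte characters are not split.
     */
    static String readWord(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != ' ') {
            // Skip a newline left over from the previous entry.
            // 0x0A and 0x20 never occur inside a multi-byte UTF-8 sequence.
            if (b != '\n') {
                buf.write(b);
            }
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Decoding the whole byte run at once, instead of casting each byte to `char`, is what keeps Vietnamese characters such as "ệ" (three bytes in UTF-8) intact.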