stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Use GloVe as classifier and Support sub-word learning #78

Closed hfxunlp closed 7 years ago

hfxunlp commented 7 years ago

Hi, I'm trying to use GloVe as a classifier in a semi-supervised way. These changes also let the model learn sub-word-level information. I'm not good with C: the code works, but it is ugly, and it would be wonderful if you could clean it up.

By default, a line of supervised classifier data has the following format:

__label__someclass there is a line of data of someclass

and a line of sub-word information learning data has the following format:

__combine__ manysubwordunits ___cinfo_many ___cinfo_sub ___cinfo_word ___cinfo_units

The ___cinfo_ prefix is not strictly needed; it only serves to distinguish sub-words from words.
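To make the two line formats above concrete, here is a minimal sketch in C of how such lines could be told apart and how the marker prefixes could be stripped from a token. It is not the code from this pull request: the names `classify_line`, `strip_marker`, and `LineKind` are hypothetical, and the sketch assumes tokens are separated by spaces or tabs and that the marker is always the first token on the line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marker strings taken from the data formats described above. */
#define LABEL_PREFIX   "__label__"
#define COMBINE_MARKER "__combine__"
#define CINFO_PREFIX   "___cinfo_"

/* Hypothetical classification of an input line (not part of GloVe). */
typedef enum { LINE_PLAIN, LINE_LABEL, LINE_COMBINE } LineKind;

/* Returns nonzero if s begins with prefix. */
static int starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Classify a line by its first whitespace-delimited token:
 * "__combine__" on its own marks a sub-word line,
 * "__label__<class>" marks a labeled (supervised) line,
 * anything else is treated as ordinary corpus text. */
LineKind classify_line(const char *line) {
    assert(line != NULL);
    line += strspn(line, " \t");              /* skip leading blanks */
    size_t len = strcspn(line, " \t\r\n");    /* length of first token */
    if (len == strlen(COMBINE_MARKER) &&
        strncmp(line, COMBINE_MARKER, len) == 0) {
        return LINE_COMBINE;
    }
    if (len > strlen(LABEL_PREFIX) && starts_with(line, LABEL_PREFIX)) {
        return LINE_LABEL;
    }
    return LINE_PLAIN;
}

/* Return a pointer past a "___cinfo_" or "__label__" prefix, giving the
 * bare sub-word or class name; other tokens are returned unchanged. */
const char *strip_marker(const char *token) {
    assert(token != NULL);
    if (starts_with(token, CINFO_PREFIX)) {
        return token + strlen(CINFO_PREFIX);
    }
    if (starts_with(token, LABEL_PREFIX)) {
        return token + strlen(LABEL_PREFIX);
    }
    return token;
}
```

With these helpers, the first example line would be classified as `LINE_LABEL` and the second as `LINE_COMBINE`, while `strip_marker("___cinfo_many")` yields `many`. How the actual patch feeds such lines into co-occurrence counting is not described here, so that step is left out.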

The current code is messy; it would be better if these changes were tidied up and merged into cooccur.c.