stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Allocating 8 extra bytes per word? #53

Closed: jfrattarola closed this issue 7 years ago

jfrattarola commented 7 years ago

In glove.c, you increment vector_size at line 67 to account for the bias value. Additionally, at line 70 you allocate space for (vector_size + 1). Is this extra 8-byte allocation redundant?

If so, it needlessly increases the memory allocated for the word-vector, context-word-vector, and gradsq arrays by roughly 60 MB for a corpus with 2M unique words.

ghost commented 7 years ago

It may not be, but we've had a lot of segmentation faults in this code base caused by subtle off-by-one issues like this one, which we've slowly fixed, so I don't think it's a good idea to change this. We're more in maintenance mode than development mode on this repo :). Other parts of the code may implicitly expect that the extra space has been malloc'd.