stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Gigaword5 license #57

Closed (rw closed this issue 4 years ago)

rw commented 7 years ago

According to https://catalog.ldc.upenn.edu/LDC2011T07, the Gigaword5 corpus is under a non-commercial license.

Does this 'pollute' the legal status of GloVe vectors derived from Gigaword5?

ghost commented 7 years ago

IANAL, so I can't give you official advice here, but my observation has been that the weights of machine learning systems trained on data are not subject to the same license as the data itself. A lot of thought went into this question around the ImageNet competition, which you can look up. If you do consult a lawyer on this question, please update us below!