stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0
6.86k stars 1.51k forks

How to train a model with case information #105

Closed. sriram-c closed this issue 4 years ago

sriram-c commented 6 years ago

Hi,

I want to train a model on my corpus while keeping case information (that is, without converting the text to lowercase). I can't find any option for this in build/glove. I can see that a pretrained cased model is available at this link:

Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): glove.840B.300d.zi

Please help me in training a cased model.

Thanks, sriram

akanshajainn commented 6 years ago

Prepare your input corpus so that each token is separated by a space. Training works regardless of the case of the tokens.
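To illustrate the preparation step described above, here is a minimal Python sketch that writes a space-separated corpus without lowercasing anything. The helper names (`prepare_line`, `prepare_corpus`) and the regex tokenizer are assumptions made for illustration, not part of the GloVe tools; any tokenizer that leaves case untouched should serve the same purpose.

```python
import re

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
# Nothing here calls .lower(), so "Apple" and "apple" stay distinct tokens.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def prepare_line(line: str) -> str:
    """Return the line as space-separated tokens, preserving case."""
    return " ".join(_TOKEN_RE.findall(line))


def prepare_corpus(src_path: str, dst_path: str) -> None:
    """Write a tokenized, case-preserving copy of src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            tokens = prepare_line(line)
            if tokens:  # skip lines that contain no tokens
                dst.write(tokens + "\n")


if __name__ == "__main__":
    print(prepare_line("Apple released the iPhone; an apple fell."))
    # prints: Apple released the iPhone ; an apple fell .
```

The resulting file can then be used as the input corpus for training. Because case is kept, `Apple` and `apple` become separate vocabulary entries, so the vocabulary will be larger than for a lowercased copy of the same text.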