stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Is vocab ID in cooccurrence file the order of words in vocab.txt? #43

Open armintabari opened 8 years ago

armintabari commented 8 years ago

In the co-occurrence file, each line has the format (word1_ID word2_ID count). What are these IDs? Do they correspond to the order of words in the vocab count file (vocab.txt)?

ghost commented 8 years ago

We need to store words as integers rather than strings in some data structures. The particular mapping is not crucial, but some consistent mapping is required. I believe we sort the list of words and take each word's position in that list as its word_id.
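If the IDs are indeed positions in the word list, a lookup could be sketched roughly as below. This is only an illustration, not the GloVe code itself: `load_id_to_word` and `decode_triple` are hypothetical helper names, the vocab lines and counts are made-up toy data, and it assumes that each vocab line is "word count", that the list in question is vocab.txt, and that IDs start at 1 (adjust `first_id` if they turn out to start at 0). It also assumes the triples are available as text, as described in the question.

```python
def load_id_to_word(vocab_lines, first_id=1):
    """Map word IDs to words, ASSUMING an ID is the word's line position.

    vocab_lines: iterable of "word count" strings, in file order.
    first_id: ID of the first line (assumed 1 here; not confirmed in this thread).
    """
    # Drop blank lines first so they do not shift the positions.
    rows = [line.split() for line in vocab_lines if line.strip()]
    return {position: row[0] for position, row in enumerate(rows, start=first_id)}


def decode_triple(line, id_to_word):
    """Turn a text triple "word1_ID word2_ID count" into (word1, word2, count)."""
    w1, w2, count = line.split()
    return id_to_word[int(w1)], id_to_word[int(w2)], float(count)


# Toy data for illustration only (not real GloVe output).
vocab_lines = ["the 10", "of 7", "and 5"]
id_to_word = load_id_to_word(vocab_lines)

print(decode_triple("1 3 2.5", id_to_word))  # ('the', 'and', 2.5)
```

For a real file, the lines could come from `open("vocab.txt", encoding="utf-8")`. Checking a few decoded pairs against the corpus is a quick way to confirm whether the assumed starting ID is right.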

IlterOnatKorkmaz commented 3 years ago

Is the list you mention vocab.txt? How can we find or use that list?