stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0
6.86k stars 1.51k forks

Error, duplicate entry located #83

Open RahulKulhari opened 7 years ago

RahulKulhari commented 7 years ago

While running the GloVe model, it prints the output below:

BUILDING VOCABULARY
Processed 2271254108 tokens.
Counted 5682077 unique words.
Truncating vocabulary at min count 5.
Using vocabulary of size 1351693.

COUNTING COOCCURRENCES
window size: 15
context: symmetric
max product: 62985709
overflow length: 190141781
Reading vocab from file "vocab27may_aug.txt"...Error, duplicate entry located: 38
loaded 1351693 words.
Building lookup table...table contains 708830043 elements.
Processing token: 2052500000
Processed 2271254018 tokens.
Writing cooccurrences to disk...........11 files in total.
Merging cooccurrence files: processed 719306072 lines.

SHUFFLING COOCCURRENCES
array size: 1275068416
Shuffling by chunks: processed 719306072 lines.
Wrote 1 temporary file(s).
Merging temp files: processed 719306072 lines.

TRAINING MODEL
Read 719306072 lines.

One line in the output is:

Reading vocab from file "vocab27may_aug.txt"...Error, duplicate entry located: 38

Why is this message shown?
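
One way to investigate is to check the vocab file itself for repeated entries. The sketch below is only a rough diagnostic, not part of GloVe: it assumes the message means the token `38` appears more than once in the vocab file, and that each line of that file has the form `word count`. The function name `find_duplicate_entries` and the sample lines are made up for illustration.

```python
from collections import Counter


def find_duplicate_entries(lines):
    """Return {word: occurrences} for every word listed more than once.

    Assumption: each non-blank line looks like "word count", so the word
    is the first whitespace-separated field.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts[fields[0]] += 1
    return {word: n for word, n in counts.items() if n > 1}


# Illustrative, made-up vocab lines (not taken from the real file).
sample = ["the 100", "38 12", "of 90", "38 7"]
print(find_duplicate_entries(sample))  # {'38': 2}
```

To run it on the real file, pass an open file handle instead of the sample list, for example `find_duplicate_entries(open("vocab27may_aug.txt", encoding="utf-8", errors="replace"))`. If a word is reported, looking at those lines for stray whitespace or control characters may explain why the same token ends up listed twice.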