stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0
6.86k stars 1.51k forks

glove.twitter.27B.25d, item 38522 #159

Closed: yzhou1122 closed this issue 4 years ago

yzhou1122 commented 4 years ago

Item 38522: the token "0.065581" has only 24 dimensions in 'glove.twitter.27B.25d.txt'.

AngledLuffa commented 4 years ago

Can you give us access to the training data you used?

On Sat, Dec 14, 2019, 1:00 PM Yueshen666 notifications@github.com wrote:

Item 38522: the token "0.065581" has only 24 dimensions in 'glove.twitter.27B.25d.txt'.

yzhou1122 commented 4 years ago

Sorry, I mean the pre-trained vectors in glove.twitter.27B.25d.txt (from http://nlp.stanford.edu/data/glove.twitter.27B.zip), in which the GloVe vector for the token "0.065581" has only 24 dimensions (it should have 25).

pengowray commented 4 years ago

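The entry name is the control character "NEL" (0x85; Next Line), followed by 25 numbers. A parser that splits on any whitespace drops that character, so the first number, 0.065581, looks like the token and only 24 values remain.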

… 0.065581 0.39605 -0.96669 0.23706 -0.41379 -0.97006 0.16601 -1.292 -0.58989 0.11632 -1.365 -0.27939 -0.57222 -0.97108 -0.56319 -0.015263 -0.70465 -0.13867 1.0702 -0.25557 0.25122 -0.87755 0.70999 0.9118 -0.30077
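
A minimal Python sketch of how the miscount can arise, assuming the loader splits each line with `str.split()` and that the token decodes to U+0085. Python treats U+0085 as whitespace, so a bare `split()` discards it. The helper `parse_glove_line` is a hypothetical name for illustration, not part of the GloVe code; it splits on the ASCII space only, which keeps the token intact.

```python
def parse_glove_line(line, dim=25):
    """Split one GloVe text line into (token, vector) on ASCII spaces only."""
    parts = line.rstrip("\n").split(" ")
    token, values = parts[0], parts[1:]
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    return token, [float(v) for v in values]


# The entry quoted above, with the token written explicitly as U+0085 (NEL).
# Assumption: this is how the line looks after decoding the file as text.
line = (
    "\x85 0.065581 0.39605 -0.96669 0.23706 -0.41379 -0.97006 0.16601 "
    "-1.292 -0.58989 0.11632 -1.365 -0.27939 -0.57222 -0.97108 -0.56319 "
    "-0.015263 -0.70465 -0.13867 1.0702 -0.25557 0.25122 -0.87755 0.70999 "
    "0.9118 -0.30077\n"
)

# Whitespace split: NEL counts as whitespace and vanishes, leaving 25 fields,
# so "0.065581" is taken as the token and only 24 numbers follow it.
naive = line.split()
print(naive[0], len(naive) - 1)

# ASCII-space split: the token is the NEL character and all 25 numbers survive.
token, vector = parse_glove_line(line)
print(repr(token), len(vector))
```

Note that `str.splitlines()` also treats U+0085 as a line boundary, so a loader that reads the whole file and calls `splitlines()` would cut this entry in two; iterating over the file object, or splitting on "\n" explicitly, avoids that.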