stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0
6.89k stars 1.52k forks

Issues unzipping the Common Crawl Pretrained Vectors #208

Open ahfretheim opened 1 year ago

ahfretheim commented 1 year ago

Hey Stanford,

I'm able to download the Common Crawl Pretrained Vectors fine, but when I try to unzip the file, I get an "unspecified error" that stops the extraction. I've tried repeating the download and unzipping again, but I'm still having the same issue. I also adjusted my power settings so the computer doesn't go to sleep, and that doesn't seem to help. Is there any other way you could send the vectors over? I'm developing in Python, so any file that will open properly in Python is fine.

Sincerely,

Alexander Fretheim

AngledLuffa commented 1 year ago

There are two versions, each with two mirrors. I was able to download each of them and extract them. If nothing's working for you, would you be more specific about what the problem is? You could also report the file size and/or md5sum of the archives (which we should probably put on the page itself).
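Since the original poster is working in Python, here is a minimal sketch of how the file size and MD5 digest could be collected there instead of with the `md5sum` command. The function name `describe_archive` and the example file name are illustrative assumptions, not part of the GloVe repository.

```python
import hashlib
import os


def describe_archive(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex) for the file at `path`.

    The file is read in 1 MiB chunks so a multi-gigabyte archive
    never has to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


# Example call; the file name is an assumption about what was downloaded:
# size, md5_hex = describe_archive("glove.840B.300d.zip")
# print(size, md5_hex)
```

If two downloads of the same archive report different sizes or digests, the download itself is the likely culprit; if they match, the problem is more likely on the extraction side.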

ahfretheim commented 1 year ago

Hey John, thanks for the quick response! I was able to successfully extract the file to the hard disk of this computer. I think maybe it was just some kind of issue with the stick drive I was trying to extract it to, because the file is so big. Thanks for the help anyways! For the purpose of specificity, the file I am trying to use is the 840 billion token/300-dimensional one.

Sincerely,

Alexander
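For anyone hitting a similar "unspecified error", a rough pre-flight check along these lines may help separate a damaged archive from a destination drive that cannot hold the result. This is only a sketch under assumptions: the thread never says how the stick drive was formatted, so the FAT32 single-file limit checked here is a guess at one possible cause, and `check_extraction` is a hypothetical helper name.

```python
import shutil
import zipfile

# Largest single file a FAT32 volume can store (4 GiB minus one byte).
# Whether the drive in this thread was FAT32 is an assumption.
FAT32_MAX_FILE_BYTES = 2**32 - 1


def check_extraction(archive_path, dest_dir):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    with zipfile.ZipFile(archive_path) as archive:
        # testzip() reads every member and returns the first name with a bad CRC.
        bad_member = archive.testzip()
        if bad_member is not None:
            problems.append(f"corrupt member in archive: {bad_member}")
        members = archive.infolist()

    # Compare the total uncompressed size with free space at the destination.
    needed = sum(member.file_size for member in members)
    free = shutil.disk_usage(dest_dir).free
    if needed > free:
        problems.append(f"need {needed} bytes but only {free} are free")

    # Flag members too large for a FAT32-formatted destination.
    for member in members:
        if member.file_size > FAT32_MAX_FILE_BYTES:
            problems.append(f"{member.filename} exceeds the FAT32 file size limit")
    return problems
```

An empty result does not guarantee that extraction will succeed, but a non-empty one points at what to fix before retrying.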
