stanfordnlp / stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
https://stanfordnlp.github.io/stanza/
License: Other

AssertionError with stanza.download("hr") #741

Closed: kjaksic closed this issue 3 years ago

kjaksic commented 3 years ago

Describe the bug

Unable to download the Croatian pipeline using stanza.download("hr"): an AssertionError is thrown. It appears that the "default_md5" value for "hr" in "https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.2.1.json" differs from the MD5 of "http://nlp.stanford.edu/software/stanza/1.2.1/hr/default.zip".

To Reproduce

import stanza
stanza.download("hr")

Expected behavior

The pipeline binaries are downloaded.

Environment (please complete the following information)

AngledLuffa commented 3 years ago

That is not happening for me. Perhaps something is being downloaded incorrectly? These are the stats for the file I got when I ran download("hr"):

[john@localhost ~]$ md5sum stanza_resources/hr/default.zip
ddfe11092feb67fe4a30039b86828705  stanza_resources/hr/default.zip
[john@localhost ~]$ ls -l stanza_resources/hr/default.zip
-rw-rw-r--. 1 john john 206928215 Jul  4 11:57 stanza_resources/hr/default.zip
kjaksic commented 3 years ago

The download of the language file stops at some point (always after passing 50%, but the exact stopping point is not reproducible) and throws an AssertionError from assert(not md5 or file_exists(path, md5)).

I also tried to download the English module, and the same thing happened. However, if I download the modules in Google Colab, everything works fine.

I did a fresh installation of stanza, but it didn't help.
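The assertion quoted in this comment compares the downloaded file against an expected checksum, so a transfer that stops partway will always fail it. Below is a minimal sketch of that check, assuming `file_exists(path, md5)` means "the file is present and its MD5 equals `md5`". The functions `md5_of`, `file_exists`, and `check_download` are hypothetical re-implementations for illustration, not stanza's actual code.

```python
import hashlib
import os


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading in chunks so large zips are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_exists(path: str, md5: str) -> bool:
    """Hypothetical stand-in for stanza's helper: file is present AND its MD5 matches."""
    return os.path.exists(path) and md5_of(path) == md5


def check_download(path: str, md5: str) -> None:
    # Mirrors the quoted assertion: an empty md5 skips verification,
    # otherwise the file must exist with exactly the expected checksum.
    assert not md5 or file_exists(path, md5), f"checksum mismatch for {path}"
```

Writing only part of a file (as a dropped connection would) and calling `check_download` with the complete file's MD5 raises the AssertionError, while an empty `md5` skips the check entirely.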

AngledLuffa commented 3 years ago

This seems more like a problem with the connection. There should be a URL printed out when it downloads, such as

http://nlp.stanford.edu/software/stanza/1.2.1/hr/default.zip

Can you download that URL directly, with your browser or some other tool, and then put it in the appropriate directory? For example,

~/stanza_resources/hr
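The manual workaround suggested here can also be scripted so that a dropped connection never leaves a truncated file behind. This is a minimal sketch, assuming the model should end up at `~/stanza_resources/<lang>/default.zip` as in the example path above. The function name `fetch_model`, the retry count, and the write-to-temp-then-rename approach are illustrative choices, not part of stanza. The optional `expected_md5` would be the `default_md5` value from the resources JSON mentioned in the bug report.

```python
import hashlib
import os
import shutil
import tempfile
import urllib.request


def fetch_model(url: str, lang: str, resources_dir: str = "~/stanza_resources",
                expected_md5: str = "", attempts: int = 3) -> str:
    """Hypothetical helper: download url to <resources_dir>/<lang>/default.zip with retries."""
    target_dir = os.path.join(os.path.expanduser(resources_dir), lang)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "default.zip")
    last_error = None
    for _ in range(attempts):
        # Download into a temporary ".part" file first, so an interrupted
        # transfer never overwrites or leaves behind a partial default.zip.
        fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
        try:
            with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
                shutil.copyfileobj(resp, out)
            digest = hashlib.md5()
            with open(tmp_path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            if expected_md5 and digest.hexdigest() != expected_md5:
                raise ValueError(f"MD5 mismatch: got {digest.hexdigest()}")
            # Only a complete, verified file replaces the target.
            os.replace(tmp_path, target)
            return target
        except (OSError, ValueError) as err:  # URLError is a subclass of OSError
            last_error = err
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
    raise RuntimeError(f"could not download {url}: {last_error}")
```

For example, `fetch_model("http://nlp.stanford.edu/software/stanza/1.2.1/hr/default.zip", "hr")` would retry up to three times and only keep a file that finished downloading (and, if an MD5 is given, matches it).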

kjaksic commented 3 years ago

Thank you, John. I was able to download the file and put it in the directory manually. Now everything is working fine.