piskvorky / gensim-data

Data repository for pretrained NLP models and NLP corpora.
https://rare-technologies.com/new-api-for-pretrained-nlp-models-and-datasets-in-gensim/
GNU Lesser General Public License v2.1

what is the right way to resume model loading? #38

Closed by hyzhak 4 years ago

hyzhak commented 4 years ago

After loading for some time, I got:

ConnectionResetError: [Errno 104] Connection reset by peer

What is the right way to resume a model download? For the moment I just loop indefinitely and try to load again and again, but that doesn't look very efficient.

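import time

import gensim.downloader as api

wv_from_bin = None
while wv_from_bin is None:
    try:
        # Each call restarts the download from the very beginning.
        wv_from_bin = api.load("word2vec-google-news-300")
    except Exception as e:
        print('failed:', e)
        print('trying again')
        time.sleep(5)  # short pause before the next attempt
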
piskvorky commented 4 years ago

The download ultimately calls urllib.urlretrieve here: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/downloader.py#L374

Not sure how to avoid your Connection reset though. Maybe if the target server supports range queries, we could download each file in multiple parts (multiple HTTPS requests) and re-assemble at the end… sounds complicated though. CC @mpenkov
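A minimal sketch of what a range-based resume could look like, using only the standard library. This is not how gensim downloads files. `resume_download` and its `opener` parameter are hypothetical names chosen for illustration, and the sketch assumes the server answers a `Range` request with status 206. If the server ignores the header and replies 200, the file is rewritten from scratch.

```python
import os
import urllib.error
import urllib.request


def resume_download(url, dest, opener=urllib.request.urlopen, chunk_size=1 << 20):
    """Fetch `url` into `dest`, continuing from the bytes already on disk.

    Hypothetical helper: `opener` is injectable so the logic can be
    exercised without network access.
    """
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask only for the part of the file we do not have yet.
        request.add_header("Range", f"bytes={offset}-")
    try:
        response = opener(request)
    except urllib.error.HTTPError as err:
        # 416 (Range Not Satisfiable): assume the local file is already complete.
        if err.code == 416 and offset:
            return offset
        raise
    with response:
        # 206 means the range was honoured, so append; otherwise start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest)
```

Calling the function again after a `ConnectionResetError` would pick up from the current size of `dest` instead of from byte zero, provided the server supports range requests.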

mpenkov commented 4 years ago

Currently, there's no way to resume a download from the point that it failed. You can only try again from the very beginning.

In theory, yes, you could probably download the file more intelligently, but it isn't something that's been a problem for many gensim users, so we don't have a "right way" to handle this scenario. The sample you provided seems like it will work, eventually.

If your connection is super-flaky, then try downloading the URL with another tool that supports resuming downloads, and then put the downloaded file wherever gensim expects it to go. Unfortunately, this suggestion is hand-wavy and completely undocumented.
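As a rough illustration of that manual route, the sketch below checks a file fetched with another tool against a known MD5 checksum and copies it into a per-model folder. Everything here is an assumption rather than documented gensim behaviour: `install_downloaded_file` and `md5_of` are hypothetical helpers, and the `<data_dir>/<name>/` layout and the source of the expected checksum would need to be confirmed against the gensim version in use.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_downloaded_file(src, data_dir, name, expected_md5):
    """Verify `src` and copy it to `<data_dir>/<name>/` (assumed layout)."""
    actual = md5_of(src)
    if actual != expected_md5:
        raise ValueError(f"checksum mismatch: expected {expected_md5}, got {actual}")
    target_dir = os.path.join(data_dir, name)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, os.path.basename(src))
    shutil.copyfile(src, target)
    return target
```

Verifying the checksum first avoids installing a truncated file, which is the likely result of an interrupted transfer.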

hyzhak commented 4 years ago

@piskvorky and @mpenkov, thank you for the responses. I ran the code above and got the model after a few tries.