Closed piskvorky closed 4 years ago
Let me clarify: even if the user has already downloaded a model, the internet connection is still used to fetch `lists.json` (we retrieve part of the path to the local file from it). If we have no connection, we can only "guess" the path (but the structure is very regular, so in this case it should work fine). I agree about the check: it should be optional (but `True` by default; we must be sure that the data is correct, but the user should be able to disable this check at their own risk).
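To make the "guess the path, keep the check optional" idea concrete, here is a minimal sketch. It is not the current downloader code: the `<base_dir>/<name>/<name>.gz` layout, the function names, and the `verify` and `expected_md5` parameters are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def guess_local_path(name, base_dir):
    """Guess the file location, ASSUMING a <base_dir>/<name>/<name>.gz layout."""
    return Path(base_dir) / name / f"{name}.gz"


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def locate_dataset(name, base_dir, expected_md5=None, verify=True):
    """Return the guessed local path; hypothetical helper, not a gensim API.

    The checksum is compared only when `verify` is True (the default) and
    an expected checksum is available, e.g. from a cached copy of the metadata.
    """
    path = guess_local_path(name, base_dir)
    if not path.is_file():
        raise FileNotFoundError(f"no local copy of {name!r} at {path}")
    if verify and expected_md5 is not None and md5_of(path) != expected_md5:
        raise ValueError(f"checksum mismatch for {path}")
    return path
```

Passing `verify=False` skips the comparison entirely, which is the "at their own risk" switch described above.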
I also encountered this problem. I was going on a trip where I would have no or only a very weak internet connection. I preloaded all the models before the trip, hoping to still work on my project. It was a big surprise when I realised I couldn't work without internet!! My Easter holidays are over before they have even started... I have to find something to do without my laptop :)
I agree that consistency is important, but a possible solution would be: 1) check whether there is an internet connection, 2) if 1 fails, try to load from the default location with some default model name, 3) if 2 fails, raise an exception saying that the model cannot be found. I am very new to this package, but I guess the default location doesn't change for most users?
It would also be great to have some custom exceptions that say what went wrong. Otherwise it is not really obvious why it fails. If you need help, I could look into the source code and try to fix it when I am back.
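The three steps plus a custom exception could look roughly like the sketch below. Everything in it is hypothetical: `fetch_remote_path` stands in for whatever call needs the network, the exception names are made up, and the `<base_dir>/<name>/<name>.gz` fallback location is an assumption, not the package's confirmed layout.

```python
from pathlib import Path


class ModelLoadError(Exception):
    """Base class for loading problems (hypothetical name)."""


class ModelNotFoundError(ModelLoadError):
    """Raised when the model is neither reachable online nor present locally."""


def resolve_model_path(name, base_dir, fetch_remote_path):
    """Return (path, source), where source is "online" or "offline".

    `fetch_remote_path` is a caller-supplied callable that needs the network
    and raises OSError (ConnectionError, URLError, ...) when it is unavailable.
    """
    # Step 1: try the online route first.
    try:
        return Path(fetch_remote_path(name)), "online"
    except OSError as err:
        network_error = err

    # Step 2: fall back to the assumed default location on disk.
    candidate = Path(base_dir) / name / f"{name}.gz"
    if candidate.is_file():
        return candidate, "offline"

    # Step 3: fail with a message that explains both attempts.
    raise ModelNotFoundError(
        f"could not reach the network ({network_error}) and found no "
        f"local copy of {name!r} at {candidate}"
    ) from network_error
```

Chaining the original network error with `from` keeps the root cause visible in the traceback, so the user can tell a connectivity problem from a missing file.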
@DSamuylov I agree, we definitely need to add a special flag for this case. Feel free to contribute (we need to add a "persistence" flag to https://github.com/RaRe-Technologies/gensim/blob/10a3dab8d00c0523ff871af75fb0badcff14848b/gensim/downloader.py#L357)
I agree with @DSamuylov. I didn't realize `gensim-data` depends on an internet connection; that's bad design. The way I see it, we need two things:
1) Fix the design so that internet is not mandatory for already-downloaded models.
2) Better, clearer progress/error messages, so users know what's going on. The errors we saw during the workshop were really terrible. Nobody knew what was going on.
As seen during our workshop yesterday, various network issues can appear during live or even offline events.
Once a user has downloaded a dataset onto their machine (`~/gensim-data`), they shouldn't require any internet access to use it. If the API needs to do some "online checking", this checking should be optional.