Currently a model can fail to load for a number of different reasons. However, the error raised seems to always be a general "failed to load" error. It would be useful if different errors could be raised for:
404 model file not found ( -> inform user to check the URL they provided)
Server isn't responding (HuggingFace is down again.. -> load from backup server)
Couldn't load model because there is no internet connection (-> suggest loading another model that is cached)
Failed to load from cache ( -> clear cache and retry downloading)
Couldn't load model because the file doesn't actually seem to be a valid .gguf file ( -> inform user to check the URL they provided)
Couldn't load model because it doesn't fit into memory ( -> recommend trying a smaller one)
Etc
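For instance, the failure modes above could map to distinct error classes. A minimal sketch of what that could look like (all class and function names here are hypothetical, not part of any existing API):

```typescript
// Hypothetical error hierarchy -- none of these classes exist in the library today.
class ModelLoadError extends Error {}
class ModelNotFoundError extends ModelLoadError {}     // HTTP 404: check the URL
class ServerUnreachableError extends ModelLoadError {} // no response: try a backup server
class OfflineError extends ModelLoadError {}           // no internet: suggest a cached model
class InvalidGgufError extends ModelLoadError {}       // bad magic bytes: check the URL
class OutOfMemoryError extends ModelLoadError {}       // model too big: recommend a smaller one

// Example classifier for the network-related cases
// (status === null stands for "no response at all").
function classifyFetchFailure(status: number | null, online: boolean): ModelLoadError {
  if (!online) return new OfflineError("No internet connection; try a cached model.");
  if (status === null) return new ServerUnreachableError("Server not responding; consider a backup server.");
  if (status === 404) return new ModelNotFoundError("Model file not found; check the URL you provided.");
  return new ModelLoadError(`GET failed with HTTP status ${status}`);
}
```

With something like this, callers could branch on `instanceof` instead of string-matching a generic "failed to load" message.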
This is because in my project users can enter their own URL to a .gguf file (or provide a list of shards), so failure can come in many forms.
I can see in the debug console that the worker has precise information (e.g. GET failed), but that precision isn't passed on yet.
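One way to pass that precision on would be for the worker to post a structured error instead of a generic one, and for the main thread to translate it into an actionable message. A rough sketch, where the message shape and error codes are made up purely for illustration:

```typescript
// Made-up message shape for illustration; the real worker protocol may differ.
interface LoadFailure { type: "LOAD_FAILED"; code: string; detail: string; }

// Worker side could include the precise cause it already knows, e.g.:
// self.postMessage({ type: "LOAD_FAILED", code: "HTTP_404", detail: "GET <url> returned 404" });

// Main-thread side: translate the code into an actionable message for the user.
function toUserMessage(failure: LoadFailure): string {
  switch (failure.code) {
    case "HTTP_404":      return "Model file not found; please check the URL you provided.";
    case "NO_RESPONSE":   return "Server is not responding; trying a backup server may help.";
    case "OFFLINE":       return "No internet connection; consider loading a cached model.";
    case "CACHE_ERROR":   return "Failed to load from cache; clearing the cache and re-downloading may help.";
    case "INVALID_GGUF":  return "The file does not look like a valid .gguf file; check the URL.";
    case "OUT_OF_MEMORY": return "The model does not fit into memory; try a smaller one.";
    default:              return `Failed to load model: ${failure.detail}`;
  }
}
```

The `default` branch keeps the worker's raw detail visible, so even unmapped failures stay more informative than a bare "failed to load".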
Another example I just ran into while trying to load the new version of Phi 3 128K (Q2). I suspect this error is because of a Llama.cpp version mismatch?