huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

No timeout downloading model card data from hub api when loading pretrained model in disconnected environment #495

Open chrisaballard opened 9 months ago

chrisaballard commented 9 months ago

When calling SetFitModel.from_pretrained to load a pretrained model from a local directory, setfit attempts to load the model card data through the Hugging Face Hub API in the method model_card.SetFitModelCardData.infer_st_id. In a disconnected environment with no internet access, this call hangs while it tries to reach the Hub REST API, usually for up to around 5 minutes. This is a problem in environments where internet access is unavailable, for example for security reasons.

A workaround is to specify a timeout by passing the timeout argument to model_info in the is_on_huggingface() function.
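A minimal sketch of that workaround, assuming huggingface_hub's model_info, which accepts a timeout in seconds. This is a hypothetical stand-in for setfit's is_on_huggingface(), not its actual code: the early return for local directories is our own addition, and any failure (timeout, offline mode, a missing repo, or huggingface_hub not being installed) is treated as "not on the Hub".

```python
import os
from typing import Optional


def is_on_huggingface(repo_id: str, timeout: Optional[float] = 5.0) -> bool:
    """Return True if `repo_id` can be confirmed as a model repo on the Hub.

    Hypothetical stand-in for setfit's helper of the same name; only the
    `timeout` forwarded to `model_info` reflects the workaround in the issue.
    """
    # Our addition: a directory on disk is a local model, so skip the network.
    if os.path.isdir(repo_id):
        return False
    try:
        from huggingface_hub import model_info
    except ImportError:
        return False
    try:
        # Bound the request so a disconnected machine fails fast instead of hanging.
        model_info(repo_id, timeout=timeout)
        return True
    except Exception:
        # Timeouts, connection errors, offline mode, or an unknown repo all
        # mean "cannot confirm it is on the Hub".
        return False
```

With a small timeout, a call from an air-gapped machine should return False after a few seconds rather than minutes.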

It would be helpful to be able to set a global timeout for all operations that call the Hub API, similar to how datasets allows HF_DATASETS_OFFLINE to be set.
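One way such a global setting could work is a single environment variable that every Hub call reads. The sketch below assumes a hypothetical variable named SETFIT_HUB_TIMEOUT; neither setfit nor huggingface_hub defines it, and the 10-second default is arbitrary.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name: neither setfit nor huggingface_hub defines it.
TIMEOUT_ENV_VAR = "SETFIT_HUB_TIMEOUT"
DEFAULT_TIMEOUT = 10.0  # arbitrary default, in seconds


def hub_timeout(env: Optional[Mapping[str, str]] = None) -> float:
    """Return the global Hub request timeout in seconds.

    Falls back to DEFAULT_TIMEOUT when the variable is unset, empty,
    not a number, or not positive.
    """
    env = os.environ if env is None else env
    raw = env.get(TIMEOUT_ENV_VAR, "").strip()
    try:
        value = float(raw)  # float("") raises ValueError, handled below
    except ValueError:
        return DEFAULT_TIMEOUT
    return value if value > 0 else DEFAULT_TIMEOUT
```

Each call site would then pass the value along, for example model_info(repo_id, timeout=hub_timeout()), so one setting governs every request.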

vlmazlov-plutoflume commented 5 months ago

> It would be helpful to be able to set a global timeout for all operations that call the Hub API, similar to how datasets allows HF_DATASETS_OFFLINE to be set.

I'm not sure that exactly this is possible, but I believe we've found environment variables that should help in your case (or at least the use case you had at the time :wink:). Setting either of these should prevent any requests from being sent to the Hub in this scenario:

HF_HUB_OFFLINE=1

or (at least in newer versions of huggingface_hub):

TRANSFORMERS_OFFLINE=1
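As we understand huggingface_hub's behavior, it reads HF_HUB_OFFLINE (falling back to TRANSFORMERS_OFFLINE) once, at import time, and treats "1", "ON", "YES" and "TRUE" (case-insensitively) as true. The sketch below mirrors that check so you can see what a given environment will do; hub_offline_enabled is a hypothetical helper, not part of either library, and the exact rules may differ between versions.

```python
import os
from typing import Mapping, Optional

# Values treated as "true", per our understanding of huggingface_hub's convention.
TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def hub_offline_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Mirror the offline check: HF_HUB_OFFLINE, else TRANSFORMERS_OFFLINE."""
    env = os.environ if env is None else env
    # An unset or empty HF_HUB_OFFLINE falls through to TRANSFORMERS_OFFLINE;
    # any other HF_HUB_OFFLINE value (even "0") takes precedence.
    value = env.get("HF_HUB_OFFLINE") or env.get("TRANSFORMERS_OFFLINE")
    return value is not None and value.upper() in TRUE_VALUES
```

In practice, set the variable in the shell (for example HF_HUB_OFFLINE=1 python train.py) or assign os.environ["HF_HUB_OFFLINE"] = "1" before importing setfit; setting it after the import may not affect the already-loaded value.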
