Closed by AlpinDale 1 month ago
Thanks @AlpinDale for raising the question. hf_transfer is indeed quite stable (at least we don't make changes to it very often). For the record, it is enabled by default on all Spaces, for example. However, it is not the best solution for everyone, for several reasons:
- hf_transfer is faster only when the bandwidth allows it. This is the case on clusters and machines with good connections, but on normal or slow connections it does not bring any benefit. In some cases it even degrades speed because of the multi-process overhead.
- hf_transfer maxes out the CPU cores. This makes it very unsuitable for parallelism. In huggingface_hub we make sure not to parallelize downloads, but users could launch several processes in parallel, and in that case it would spawn "N_user_processes * N_cores" processes, which would completely overload the CPU. Maxing out the CPU can also lead to a very degraded UX on the user's machine (think "everything is frozen").
- hf_transfer does not handle proxies and does not have a retry mechanism. It is also not possible to resume a stopped download. All of this is doable with the normal implementation based on requests.
- Progress reporting is less precise than with the requests implementation (updates only every 50MB). For slow connections that means a poor user experience.

So all things considered, hf_transfer is stable enough for a lot of use cases, but we are not aiming at making it the default. The best way to enable it is to set HF_HUB_ENABLE_HF_TRANSFER=1 in your .bashrc-like file on machines you manage, and to add huggingface_hub[hf_transfer] to your requirements.txt-like file.

@AlpinDale out of curiosity, is your use case in the context of a CLI command, or from Python code?
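
For reference, here is a minimal sketch of how one might check from Python that the setup described above is in place: the env variable is set and the hf_transfer package is importable. The helper names and the set of "truthy" values are assumptions made for this illustration, not part of the huggingface_hub API.

```python
import importlib.util
import os

# Assumption for this sketch: which strings count as "enabled".
_TRUTHY = {"1", "ON", "YES", "TRUE"}


def hf_transfer_requested(env=None):
    """Return True if HF_HUB_ENABLE_HF_TRANSFER is set to a truthy value."""
    env = os.environ if env is None else env
    value = env.get("HF_HUB_ENABLE_HF_TRANSFER", "")
    return value.strip().upper() in _TRUTHY


def hf_transfer_installed():
    """Return True if the hf_transfer package can be found on this machine."""
    return importlib.util.find_spec("hf_transfer") is not None


def describe_setup(env=None):
    """Summarize the download setup implied by the env variable and the install."""
    requested = hf_transfer_requested(env)
    if requested and hf_transfer_installed():
        return "hf_transfer enabled"
    if requested:
        # Typically fixed with: pip install "huggingface_hub[hf_transfer]"
        return "HF_HUB_ENABLE_HF_TRANSFER is set but hf_transfer is not installed"
    return "default requests-based downloads"


if __name__ == "__main__":
    print(describe_setup())
```

Running this in a shell where HF_HUB_ENABLE_HF_TRANSFER=1 has been exported reports whether the extra is also installed, which is the combination suggested above.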
Sorry, I was away for a bit. Thanks for answering, @julien-c
I use both the CLI and the Python API. For now, I can manage by exporting the hf_transfer env variable in my bashrc.
hf_transfer, to my knowledge, has become very stable recently. I use it daily, and I find it a bit cumbersome that we have to manually install the package, then export a very long env variable to finally have access to faster downloads. I believe it's about time it was made the default behaviour for huggingface_hub. Thoughts? I'm sure many others in the community share the same belief as me.