huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

Feature: Re-enable symlinking models downloaded from Hub by default #2548

Open davidbenton opened 4 days ago

davidbenton commented 4 days ago

As a user, I want to download a multi-GB model one time, at most. :)

We've entered a new era where many casual software projects weigh in at tens or even hundreds of gigabytes. The Hugging Face Hub is the de facto public store for large models. (Congrats and thank you for that! I remember what it was like hunting for models before.) The cache is a huge value-add for projects using the hub, but disabling symlinks circumvents this value for many existing codebases, and the new default means many new projects will naively miss the cache. I personally have unnecessarily downloaded at least 90GB of duplicate model weights on my laptop since this default was changed, by running third-party software that uses the Hub.

I understand the reasoning behind the changes in #2223, and I also think there is an opportunity now to save a massive amount of bandwidth for the millions of users that use the Hub. There isn't currently an easy way to take advantage of the cache when storing the model in multiple places. This is a very common situation when using third-party software that uses transformers or diffusers, and they are creeping into everything.
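For concreteness, a minimal sketch of the two download paths being discussed, assuming the current (post-#2223) behavior of hf_hub_download; the repo id and filename are just examples:

```python
from huggingface_hub import hf_hub_download

# Shared-cache path: the blob is stored once under the hub cache
# (~/.cache/huggingface/hub by default), and any later call, from any
# project on the machine, resolves to that same copy.
path_cache = hf_hub_download(repo_id="gpt2", filename="model.safetensors")

# local_dir path: each project that downloads into its own folder gets a
# full, independent copy of the bytes; the shared cache is not reused.
path_a = hf_hub_download(
    repo_id="gpt2",
    filename="model.safetensors",
    local_dir="./project_a/models",
)
path_b = hf_hub_download(
    repo_id="gpt2",
    filename="model.safetensors",
    local_dir="./project_b/models",
)
```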

Can we re-enable symlinks, and make that the default?

Wauplin commented 4 days ago

Hi @davidbenton, thanks for the kind words! Regarding https://github.com/huggingface/huggingface_hub/pull/2223, this feature was heavily discussed before being released and I don't think we'll ever change it back. There are mostly two types of tools using huggingface_hub: those that load weights through the shared cache, and those that download them to a specific local directory.

When we implemented the feature to download to a local directory, we didn't expect that its usage without symlinks (i.e., explicitly asking not to use symlinks) would be that high. In the end, downloading to a local dir with symlinks was causing all sorts of problems (copies between hard drives, incompatibilities, etc.), and without symlinks it was not using a cache at all. The solution implemented in https://github.com/huggingface/huggingface_hub/pull/2223 tries to circumvent these issues and acknowledges that when users want to download weights to a local folder (and explicitly not use the shared cache), it is their responsibility to optimize the downloads.
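A minimal sketch of that behavior with snapshot_download, assuming huggingface_hub >= 0.23 and using an example repo id:

```python
from huggingface_hub import snapshot_download

# Download a whole repo into a project folder: files are written as regular
# files (no symlinks into the shared cache). A small metadata folder,
# local_dir/.cache/huggingface, lets repeated calls skip files that are
# already present and unchanged, but nothing is shared with the global
# cache or with other local_dir locations.
snapshot_download(
    repo_id="gpt2",                     # example repo
    local_dir="./my_app/models/gpt2",
)

# The same repo downloaded without local_dir lands in the shared cache and
# is deduplicated across every project that uses the default behavior.
snapshot_download(repo_id="gpt2")
```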

davidbenton commented 4 days ago

Thanks for the quick response @Wauplin. Yes, you've identified exactly how I ended up here! I have been using both local LLM and diffusion image generation tools this year. I'm an engineer and those communities aren't following best practices (to put it mildly), but this is what happens when new people get excited to use a technology. That's a good thing, and it's going to continue.

If HF wants the Hub to be the internet repository of models and datasets, there's a significant internet-scale cost to a no-cache default for projects that store models within their own file structure, as most will do naively. In the discussion at #1738, @julien-c seemed to raise this concern. This cost will only increase as these non-HF (i.e., not based on transformers, diffusers, etc.) projects become more common and more popular.

when users want to download weights to a local folder (and explicitly not use the shared cache), it is their responsibility to optimize the downloads.

Users don't want to do this; they're using tools that do it. This week I'm here because I'm trying to tame the ComfyUI ecosystem, but hundreds of other projects will become popular and find their way onto users' computers in the coming years. Third-party software developers aren't aware of the cache (or incentivized to use it to save bandwidth for HF or for end users). This is a systems problem that the Hub is uniquely positioned to solve. That's why the cache was created in the first place, right?
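As a rough illustration of what a cache-aware tool could do instead (try_to_load_from_cache and hf_hub_download are existing huggingface_hub helpers; the repo id and filename below are placeholders):

```python
from huggingface_hub import hf_hub_download, try_to_load_from_cache

def resolve_weights(repo_id: str, filename: str) -> str:
    """Return a path to the weights, preferring the shared Hub cache.

    Returning the cached path directly, instead of copying the file into the
    tool's own folder, means every tool on the machine shares a single copy.
    """
    cached = try_to_load_from_cache(repo_id, filename)
    if isinstance(cached, str):
        return cached  # already on disk, no network call needed
    # Not cached yet: download once into the shared cache and reuse it later.
    return hf_hub_download(repo_id=repo_id, filename=filename)

# Placeholder identifiers for illustration only.
weights = resolve_weights("some-org/some-model", "model.safetensors")
```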

I hope you'll give it some more thought. Otherwise, in a few years we'll all have 8 copies of Llama-5-300B on our computers, lol.