huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

Model Compression #1446

Open jozefchutka opened 1 year ago

jozefchutka commented 1 year ago

I was thinking the hosted files (i.e. models) could use compression such as brotli. Since these are all static files, compression could be done once ahead of time instead of on every request.

For example, decoder_model_merged.onnx is ~50 MB but compresses to ~30 MB with brotli:

brotli decoder_model_merged.onnx -o decoder_model_merged.onnx.br -Z -f

Many sites and online demos fetch large models from the Hugging Face CDN, so compressing these files could substantially reduce traffic and waiting times. As long as the server honors the Accept-Encoding request header and serves the matching Content-Encoding response header, this would be completely transparent (no code changes needed) for end devs / users.
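To make the "follows the request headers" point concrete, here is a minimal sketch (not Hub code, purely illustrative) of server-side content negotiation: the server picks its preferred encoding among those the client advertises in Accept-Encoding, falling back to identity. Real negotiation also honors quality values such as `gzip;q=0.8`.

```python
def choose_encoding(accept_encoding, available=("br", "zstd", "gzip")):
    """Pick the server's preferred encoding among those the client accepts.

    Simplified sketch: quality values (';q=...') are stripped and ignored,
    and 'identity' (no compression) is the fallback.
    """
    requested = {token.split(";")[0].strip() for token in accept_encoding.split(",")}
    for encoding in available:  # server preference order
        if encoding in requested:
            return encoding
    return "identity"
```

A browser sending `Accept-Encoding: gzip, deflate, br` would then receive brotli-compressed content, while an old client sending nothing usable still gets the uncompressed file.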

I originally posted this feature request to the transformers repo ( https://github.com/huggingface/transformers/issues/22579 ), but I now think this is a better place to report it. Please let me know.

gaby commented 1 year ago

@jozefchutka It may be even better to use zstd or lz4.

jozefchutka commented 1 year ago

I proposed brotli especially considering its availability on the web.

The other proposed compressions might well be suitable or helpful on other platforms, and are worth evaluating.

I would like to extend my feature request with an additional response header, x-content-length, which would provide the size of the original (uncompressed) file. Unfortunately, the browser fetch API is designed in such a way that, without this information, one cannot report accurate download progress.
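The reason a progress bar needs this: when Content-Encoding is set, Content-Length (if present at all) describes the compressed payload, while the stream the client reads is the decompressed one. A small illustrative sketch of the logic (the `x-content-length` header name is the proposal above, not an existing header):

```python
def total_for_progress(headers):
    """Return the byte total a download progress bar should use, or None.

    For compressed responses, Content-Length refers to the compressed
    payload, so only a hypothetical 'x-content-length' header carrying
    the uncompressed size allows accurate progress reporting.
    """
    headers = {k.lower(): v for k, v in headers.items()}
    if headers.get("content-encoding", "identity") != "identity":
        raw = headers.get("x-content-length")  # proposed header
    else:
        raw = headers.get("content-length")
    return int(raw) if raw is not None else None
```

Without the proposed header, a brotli-compressed response leaves the total unknown and the progress bar indeterminate.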

Wauplin commented 1 year ago

Hi @jozefchutka @gaby, thanks for your interest in this matter and for opening an issue. We agree with you that compression would be great. Unfortunately, it would have side effects on some tools that we should not ignore.

For example, the huggingface_hub cache system is based on the "ETag" header returned by the server. With compressed content, this ETag changes slightly from a strong validator ('"abcdef..."') to a weak one ('W/"abcdef..."'). The code that parses the ETag has been fixed in https://github.com/huggingface/huggingface_hub/issues/1428 to handle both strong and weak validators seamlessly. But this means that if we enabled compression right now, anyone using an old version of huggingface_hub would no longer be able to use the Hub, which we want to avoid.
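The weak/strong distinction comes down to a small parsing difference. A minimal sketch of ETag normalization that tolerates both forms (illustrative only, not the actual huggingface_hub implementation):

```python
def normalize_etag(etag):
    """Strip the weak-validator prefix and surrounding quotes from an ETag.

    Handles both strong validators ('"abcdef..."') and weak ones
    ('W/"abcdef..."'), returning the bare hash in either case.
    """
    if etag is None:
        return None
    return etag.removeprefix("W/").strip('"')
```

A client that only expects the strong form would keep the `W/` prefix in its cache key and fail to match previously cached files, which is why enabling compression would break older clients.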

TL;DR: The mid-term goal is indeed to add compression, but it will not happen anytime soon due to backward-compatibility concerns (cc @XciD @julien-c).

EDIT: I don't know whether it's possible on our side to enable compression by default and disable it for certain user agents.
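If user-agent gating were possible, the server-side check could look roughly like this. This is a hypothetical sketch: the user-agent format is based on what the Python client typically sends (e.g. 'huggingface_hub/0.14.1; python/3.10'), and the 0.14 cutoff version is an assumption for illustration, not the actual release containing the ETag fix.

```python
import re

def supports_compression(user_agent):
    """Guess whether a client can handle compressed responses.

    Hypothetical gating rule: huggingface_hub clients older than an
    assumed cutoff (0.14 here, for illustration) cannot parse weak
    ETags; anything else (browsers, curl, ...) is served compressed.
    """
    match = re.search(r"huggingface_hub/(\d+)\.(\d+)", user_agent)
    if match is None:
        return True  # not the Python client: compress freely
    version = (int(match.group(1)), int(match.group(2)))
    return version >= (0, 14)
```

The drawback is that user agents can be missing or spoofed, so this would reduce breakage rather than eliminate it.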