huggingface / safetensors

Simple, safe way to store and distribute tensors
https://huggingface.co/docs/safetensors
Apache License 2.0

[Question] Comparison with the zarr format? #527

Open julioasotodv opened 1 month ago

julioasotodv commented 1 month ago

Hi,

I know that safetensors are widely used nowadays in HF, and the comparisons made in this repo's README file make a lot of sense.

However, I am surprised to see that there is no comparison with zarr, which is probably the most widely used format for storing tensors in a universal, compressed and scalable way.

Is there any particular reason why safetensors was created instead of just using zarr, which has been around for longer (and has nice benefits such as good performance in object storage reads and writes)?

Thank you!

User21T commented 7 hours ago

Hello.

I don't represent Hugging Face or its position on this issue. However, I think the main reason for creating safetensors rather than using zarr is that zarr is a universal format for storing any kind of tensor, whereas safetensors was designed specifically to store machine learning models and to work within the HF ecosystem. It aims to provide better performance, security, and support for ML-specific dtypes (bfloat16, fp8).
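To make the security and dtype points a bit more concrete, here is a minimal sketch of the safetensors file layout as I understand it: an 8-byte little-endian header length, a JSON header describing each tensor (`dtype`, `shape`, `data_offsets`), then the raw bytes. The helper names (`pack_safetensors`, `read_header`, `tensor_bytes`) are my own and not part of the library, and the sketch skips details such as header padding and validation. It uses only the standard library, so reading a file involves parsing JSON and slicing bytes, with no code execution.

```python
import json
import struct


def pack_safetensors(tensors):
    """Build a safetensors-style blob (illustrative helper, not the library API).

    tensors: dict mapping name -> (dtype string, shape tuple, raw bytes).
    """
    header = {}
    chunks = []
    offset = 0
    for name, (dtype, shape, raw) in tensors.items():
        # Offsets are relative to the start of the data section.
        header[name] = {
            "dtype": dtype,
            "shape": list(shape),
            "data_offsets": [offset, offset + len(raw)],
        }
        offset += len(raw)
        chunks.append(raw)
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # 8-byte little-endian unsigned header length, then header, then data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def read_header(blob):
    """Return (header dict, byte position where the data section starts)."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return header, 8 + header_len


def tensor_bytes(blob, name):
    """Slice out the raw bytes of one tensor without touching the others."""
    header, data_start = read_header(blob)
    begin, end = header[name]["data_offsets"]
    return blob[data_start + begin:data_start + end]


# Two bfloat16 values, 1.0 (0x3F80) and 2.0 (0x4000), stored little-endian.
bf16_raw = b"\x80\x3f\x00\x40"
blob = pack_safetensors({"w": ("BF16", (2,), bf16_raw)})
header, _ = read_header(blob)
print(header["w"]["dtype"], header["w"]["shape"])
```

Because the dtype is just a string in the header, types like `BF16` need no special container support. In practice you would use the `safetensors` package itself instead of hand-rolling this.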

If I'm wrong, please correct me.