huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
132.32k stars 26.35k forks

Request to Add Option to Disable mmap in transformers | Loading models takes too long due to mmap in storage-over-network setups. #33366

Open mrrfr opened 2 weeks ago

mrrfr commented 2 weeks ago

System Info

Who can help?

I would say mainly @ArthurZucker, but it's a more general issue, as it involves the base class of transformers pretrained models.

Here's an explanation of the issue:

I am currently using the transformers library to load CLIPTextModel in a Kubernetes environment where I mount an S3 bucket via the S3 CSI driver as a persistent volume to access models. While accessing large files (around 30 GB), I am experiencing severe performance issues, and after investigating, I believe the root cause is related to the forced usage of mmap when loading model weights.

It seems that the current implementation in this section of the code forces the use of mmap without providing an option to disable it. This behavior is highly problematic in storage-over-network use cases, as each mmap call introduces significant latency and performance bottlenecks due to the overhead of network access.
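To illustrate the access pattern difference, here is a small stdlib-only sketch (this is not the transformers code path, just the general contrast between a lazily faulted mapping and one eager sequential read; the file and its contents are made up):

```python
import mmap
import os
import tempfile

# Create a dummy "weights" file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of dummy data

# mmap-style access: pages are faulted in lazily on first touch. On local
# disk this is cheap; on a network-backed mount (e.g. an S3 CSI volume),
# each page fault can turn into a remote request.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    lazy_bytes = bytes(mm)  # touching the mapping pulls the data in
    mm.close()

# Eager access: one sequential read pulls the whole file into RAM, which
# network filesystems typically serve far more efficiently.
with open(path, "rb") as f:
    eager_bytes = f.read()

assert lazy_bytes == eager_bytes
print(len(eager_bytes))
```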

I think the feature was introduced here => https://github.com/huggingface/transformers/pull/28331

It would be extremely useful if there were a flag or option to disable mmap when loading models, allowing users to load the files directly into memory instead. This would let users like me avoid the network-bound performance issues.

I've already tried to find a workaround by playing with environment variables to disable mmap, but I lose too much performance that way.
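For illustration, the behavior I'm asking for is roughly the following kind of non-mmap load. This is a hedged sketch, not an existing transformers option; the file name is made up and a tiny dummy checkpoint stands in for a real 30 GB one:

```python
import io
import os
import tempfile

import torch

# Stand-in for an existing .bin checkpoint on the (network-mounted) volume.
ckpt = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
torch.save({"weight": torch.ones(4)}, ckpt)

# One sequential read pulls the whole file into RAM up front, which network
# storage serves well; deserializing from memory then causes no mmap page
# faults against the remote mount.
with open(ckpt, "rb") as f:
    buffer = io.BytesIO(f.read())

state_dict = torch.load(buffer, map_location="cpu", weights_only=True)
print(state_dict["weight"].sum().item())
```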

Information

Tasks

Reproduction

It's quite hard to reproduce, as you need an AWS account and the CSI driver. But I believe this issue can be reproduced with any storage-over-network setup.

Anyway, here is the documentation for the driver I used; with it, the driver can be deployed quite quickly on a k8s cluster: https://github.com/awslabs/mountpoint-s3-csi-driver?tab=readme-ov-file

Here you can find a deployment manifest: https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml

To reproduce, just put the models in the S3 bucket and try to load them through CLIPTextModel.from_pretrained.

Expected behavior

Loading should be fast.

LysandreJik commented 1 week ago

Hey @mrrfr, this is the case only for files that are saved in the .bin format, which are unsafe. Would it be possible for you to use .safetensors files, which are safer and don't use mmap to load?