huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
132.32k stars 26.35k forks

Request to Add Option to Disable mmap in transformers | Loading models takes too long due to mmap in storage-over-network setups. #33366

Open mrrfr opened 2 weeks ago

mrrfr commented 2 weeks ago

System Info

Who can help?

I would say mainly @ArthurZucker, but it's a more general issue, as it involves the base class of transformers pretrained models.

Here's an explanation of the issue:

I am currently using the transformers library to load CLIPTextModel in a Kubernetes environment where I mount an S3 bucket via the S3 CSI driver as a persistent volume to access models. While accessing large files (around 30 GB), I am experiencing severe performance issues, and after investigating, I believe the root cause is related to the forced usage of mmap when loading model weights.

It seems that the current implementation in this section of the code forces the use of mmap without providing an option to disable it. This behavior is highly problematic in storage-over-network use cases, as each mmap call introduces significant latency and performance bottlenecks due to the overhead of network access.
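To illustrate the access pattern difference, here is a small stdlib-only sketch (this is not the transformers code path, just the general contrast between a lazily faulted mapping and one eager sequential read; the file and its contents are made up):

```python
import mmap
import os
import tempfile

# Create a dummy "weights" file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of dummy data

# mmap-style access: pages are faulted in lazily on first touch. On local
# disk this is cheap; on a network-backed mount (e.g. an S3 CSI volume),
# each page fault can turn into a remote request.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    lazy_bytes = bytes(mm)  # touching the mapping pulls the data in
    mm.close()

# Eager access: one sequential read pulls the whole file into RAM, which
# network filesystems typically serve far more efficiently.
with open(path, "rb") as f:
    eager_bytes = f.read()

assert lazy_bytes == eager_bytes
print(len(eager_bytes))
```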

I think the feature was introduced here => https://github.com/huggingface/transformers/pull/28331

It would be extremely useful if there were a flag or option to disable mmap when loading models, allowing users to load the files directly into memory instead. This would let users like me avoid the network-bound performance issues.

I've already tried to find a workaround by playing with environment variables to disable mmap, but I lose too much performance that way.
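For illustration, the behavior I'm asking for is roughly the following kind of non-mmap load. This is a hedged sketch, not an existing transformers option; the file name is made up and a tiny dummy checkpoint stands in for a real 30 GB one:

```python
import io
import os
import tempfile

import torch

# Stand-in for an existing .bin checkpoint on the (network-mounted) volume.
ckpt = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
torch.save({"weight": torch.ones(4)}, ckpt)

# One sequential read pulls the whole file into RAM up front, which network
# storage serves well; deserializing from memory then causes no mmap page
# faults against the remote mount.
with open(ckpt, "rb") as f:
    buffer = io.BytesIO(f.read())

state_dict = torch.load(buffer, map_location="cpu", weights_only=True)
print(state_dict["weight"].sum().item())
```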

Information

Tasks

Reproduction

It's quite hard to reproduce, as you need an AWS account and the CSI driver. But I believe this issue can be reproduced with any storage-over-network setup.

Anyway, here is the documentation for the driver I used; with it, the driver can be deployed quite quickly on a k8s cluster: https://github.com/awslabs/mountpoint-s3-csi-driver?tab=readme-ov-file

Here you can find a deployment manifest: https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml

To reproduce, just put the models in the S3 bucket and try to load them through CLIPTextModel.from_pretrained.

Expected behavior

Loading should be fast.

LysandreJik commented 1 week ago

Hey @mrrfr, this is the case only for files that are saved in the .bin format, which are unsafe. Would it be possible for you to use .safetensors files, which are safer and don't use mmap to load?