huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Translation model M2M100 uses 2 models in cache (from version 4.46.0) #34731

Open harelfar2 opened 1 week ago

harelfar2 commented 1 week ago

System Info

I'm using the facebook/m2m100_418M translation model. Since version 4.46.0 it downloads another model file weighing ~2 GB. I'm using Python 3.11 on Ubuntu.

Who can help?

@ArthurZucker

Information

Tasks

Reproduction

import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Load the 418M-parameter M2M100 checkpoint in float16 on CPU
model = M2M100ForConditionalGeneration.from_pretrained(
    "facebook/m2m100_418M",
    torch_dtype=torch.float16,
).to("cpu").eval()

token = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Tokenize the source sentence and keep the tensors on CPU
encoded_text = token("my name is earl", return_tensors="pt")
encoded_text = encoded_text.to("cpu")

# Force generation to start with the Hebrew language token
target_lang_id = token.get_lang_id("he")
generated_tokens = model.generate(**encoded_text, forced_bos_token_id=target_lang_id)
print(generated_tokens)

Expected behavior

The models are cached under /home/ubuntu/.cache/huggingface/hub/models--facebook--m2m100_418M/. Up to version 4.46.0 there was a single snapshot directory, snapshots/55c2e61bbf05dfb8d7abccdc3fae6fc8512fd636, which contained 7 files (one of them the model itself, pytorch_model.bin, ~2 GB). From version 4.46.0 there is a new directory, snapshots/791dc1c6d300846c9a747d4bd11fcc7f369b750e, containing a single file, model.safetensors, which is a symlink to another ~2 GB file in the blobs dir.
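The duplication described above can be confirmed by walking the cache directory by hand. The sketch below is illustrative only (the `snapshot_report` helper is not a transformers or huggingface_hub API); it assumes the standard hub cache layout, where each file under `snapshots/<revision>/` is a symlink into the repo's `blobs/` directory.

```python
from pathlib import Path

def snapshot_report(repo_dir: str) -> dict:
    """Map each snapshot revision to the blob files its entries resolve to.

    Hypothetical helper, assuming the layout <repo_dir>/snapshots/<revision>/<file>
    where each <file> is a symlink into <repo_dir>/blobs/.
    """
    report = {}
    for snap in Path(repo_dir, "snapshots").iterdir():
        report[snap.name] = {
            entry.name: entry.resolve().stat().st_size
            for entry in snap.iterdir()
            if entry.resolve().is_file()
        }
    return report

if __name__ == "__main__":
    repo = Path.home() / ".cache/huggingface/hub/models--facebook--m2m100_418M"
    if repo.exists():
        for rev, files in snapshot_report(str(repo)).items():
            total = sum(files.values()) / 1e9
            print(f"{rev[:12]}: {len(files)} file(s), {total:.2f} GB")
```

On a cache in the state described above, this would show two revisions whose entries resolve to different ~2 GB blobs (pytorch_model.bin vs. model.safetensors), accounting for the doubled disk usage.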

Can you please resolve this so that only one model file is downloaded and used? This duplication is very wasteful.

Thanks!

LysandreJik commented 1 week ago

Hey @harelfar2, we're slowly moving towards deprecating support for pytorch_model.bin files, as several security issues have been found with these files.

As such, when an equivalent model.safetensors file is available, we now default to downloading and using that file. As a result, you can safely remove the pytorch_model.bin file from your cache and the model will still work as expected.
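For a scripted version of that manual cleanup, here is a stdlib-only sketch (the `remove_bin_blobs` helper is illustrative, not a transformers API). It assumes the standard cache layout where snapshot entries are symlinks into `blobs/`, and removes both the `pytorch_model.bin` symlink and the blob it points at. The official `huggingface-cli delete-cache` tool is a safer interactive alternative.

```python
from pathlib import Path

def remove_bin_blobs(hub_cache: str, dry_run: bool = True) -> int:
    """Delete the blobs behind pytorch_model.bin snapshot entries.

    Illustrative sketch of the manual cleanup described above.
    Returns the number of bytes that were (or would be) reclaimed.
    """
    reclaimed = 0
    for link in Path(hub_cache).glob("models--*/snapshots/*/pytorch_model.bin"):
        blob = link.resolve()
        if blob.is_file():
            reclaimed += blob.stat().st_size
            if not dry_run:
                link.unlink()   # drop the snapshot symlink
                blob.unlink()   # drop the ~2 GB blob itself
    return reclaimed

if __name__ == "__main__":
    cache = Path.home() / ".cache/huggingface/hub"
    print(f"would reclaim {remove_bin_blobs(str(cache)) / 1e9:.2f} GB")
```

To avoid re-downloading the .bin file in the future, `from_pretrained` also accepts a `use_safetensors=True` argument to load only the safetensors weights.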

Please let me know if you run into any issues