FileNotFoundError: Can not find model.safetensors or pytorch_model.bin in /tmp/tmpa9lp6hci

System Info

accelerate==0.31.0
aiohttp==3.9.5
aiosignal==1.3.1
anyio==4.3.0
async-timeout==4.0.3
attrs==23.2.0
backcall==0.2.0
bitsandbytes==0.43.1
boto3==1.34.124
botocore==1.34.124
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cmake==3.29.2
dataclasses-json==0.6.5
datasets==2.19.2
decorator==5.1.1
dill==0.3.8
dnspython==2.6.1
docstring_parser==0.16
einops==0.8.0
email_validator==2.1.1
eval_type_backport==0.2.0
exceptiongroup==1.2.1
faiss-cpu==1.8.0
fastapi==0.111.0
fastapi-cli==0.0.2
filelock==3.14.0
frozenlist==1.4.1
fsspec==2024.3.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.0
idna==3.7
ipykernel==4.8.2
ipython==7.34.0
ipython-genutils==0.2.0
jedi==0.19.1
Jinja2==3.1.4
jmespath==1.0.1
joblib==1.4.2
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
jupyter-client==6.1.12
jupyter_core==4.12.0
langchain==0.2.3
langchain-community==0.2.4
langchain-core==0.2.5
langchain-text-splitters==0.2.1
langsmith==0.1.77
lit==18.1.4
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
nltk==3.8.1
numpy==1.23.5
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
orjson==3.10.3
packaging==23.2
pandas==1.1.5
parso==0.8.4
peft @ git+https://github.com/huggingface/peft.git@ad8f7cb59ee7ca4b9ca1c9048711038ac36b31b8
pexpect==4.8.0
pickleshare==0.7.5
pillow==10.3.0
prompt-toolkit==3.0.43
protobuf==5.26.1
psutil==5.9.8
ptyprocess==0.7.0
pyarrow==16.0.0
pyarrow-hotfix==0.6
pydantic==1.10.16
Pygments==2.18.0
python-dateutil==2.8.2
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2022.7.1
PyYAML==6.0.1
pyzmq==23.2.1
ray==2.20.0
referencing==0.35.1
regex==2024.4.28
requests==2.32.3
rich==13.7.1
rpds-py==0.18.1
s3transfer==0.10.1
safetensors==0.4.3
scikit-learn==1.4.2
scipy==1.13.1
sentence-transformers==2.2.2
sentencepiece==0.2.0
shellingham==1.5.4
shtab==1.7.1
simplegeneric==0.8.1
six==1.16.0
sniffio==1.3.1
SQLAlchemy==2.0.30
starlette==0.37.2
sympy==1.12
tenacity==8.3.0
threadpoolctl==3.5.0
tiktoken==0.7.0
tokenizers==0.19.1
torch==2.0.1
torchvision==0.15.2
tornado==5.1.1
tqdm==4.66.4
traitlets==5.1.1
transformers @ git+https://github.com/huggingface/transformers.git@bd5091df8db7cea1a9f94f797fc11487f840ade1
triton==2.0.0
trl @ git+https://github.com/huggingface/trl.git@f5168fdbaf9cbf6a3f1bdc64dc44b9db3a9ae333
typer==0.12.3
typing-inspect==0.9.0
typing_extensions==4.11.0
tyro==0.8.3
ujson==5.9.0
urllib3==1.26.18
uvicorn==0.29.0
uvloop==0.19.0
vllm==0.2.1.post1
watchfiles==0.21.0
wcwidth==0.2.13
websockets==12.0
xformers==0.0.22
xxhash==3.4.1
yarl==1.9.4

Who can help?

@JingyaHuang

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Hi, I have a model fine-tuned from Llama 3 8B stored in a local directory.

local directory: tmpqnzxt7ni

all_results.json
config.json
events.out.tfevents.1715156608.ip-172-31-35-8.1058706.0
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json
trainer_state.json
training_args.bin
train_results.json
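
For reference, the listing above is a sharded safetensors checkpoint: four shard files plus model.safetensors.index.json, with no single model.safetensors or pytorch_model.bin, which are the two names the error message mentions. A minimal sketch for classifying a directory this way, using only the standard library. The helper name checkpoint_layout is hypothetical, and the file names are the usual Hugging Face conventions, not something read from optimum-neuron or transformers_neuronx:

```python
from pathlib import Path

# Usual Hugging Face checkpoint file names (an assumption of this sketch,
# not taken from the optimum-neuron or transformers_neuronx sources).
SINGLE_FILES = ("model.safetensors", "pytorch_model.bin")
INDEX_FILES = ("model.safetensors.index.json", "pytorch_model.bin.index.json")


def checkpoint_layout(directory):
    """Return 'single', 'sharded', or 'missing' for a checkpoint directory."""
    root = Path(directory)
    # A single-file checkpoint takes precedence if both layouts are present.
    if any((root / name).is_file() for name in SINGLE_FILES):
        return "single"
    # A sharded checkpoint is identified by its index file.
    if any((root / name).is_file() for name in INDEX_FILES):
        return "sharded"
    return "missing"
```

Running checkpoint_layout on the input directory and, if it still exists, on the temp directory from the error would show whether either one contains a single-file checkpoint.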

I'm trying to export the model so it can be used on Inferentia2, and I get an error saying it cannot find model.safetensors or pytorch_model.bin in a temp folder. That folder is not the directory I'm passing to NeuronModelForCausalLM.from_pretrained.

code:

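from optimum.neuron import NeuronModelForCausalLM

# temp_dir: path to the local model directory listed above (tmpqnzxt7ni)
compiler_args = {"num_cores": 2, "auto_cast_type": "fp16"}
input_shapes = {"batch_size": 1, "sequence_length": 2048}

# export=True compiles the checkpoint for Neuron with the arguments above
model = NeuronModelForCausalLM.from_pretrained(
    temp_dir,
    export=True,
    **compiler_args,
    **input_shapes)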

error:

Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  1.51it/s]
Generation config file not found, using a generation config created from the model config.
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-8-6a82917494e1> in <cell line: 2>()
      1 # Load the Model
----> 2 model = NeuronModelForCausalLM.from_pretrained(
      3     temp_dir,
      4     export=True,
      5     **compiler_args,

.../python3.9/site-packages/optimum/modeling_base.py in from_pretrained(cls, model_id, export, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, trust_remote_code, revision, **kwargs)
    400         from_pretrained_method = cls._from_transformers if export else cls._from_pretrained
    401 
--> 402         return from_pretrained_method(
    403             model_id=model_id,
    404             config=config,

.../python3.9/site-packages/optimum/neuron/utils/require_utils.py in wrapper(*args, **kwargs)
     48                     f"install {package_name}"
     49                 )
---> 50             return func(*args, **kwargs)
     51 
     52         return wrapper

.../python3.9/site-packages/optimum/neuron/modeling_decoder.py in _from_transformers(cls, *args, **kwargs)
    322     def _from_transformers(cls, *args, **kwargs):
    323         # Deprecate it when optimum uses `_export` as from_pretrained_method in a stable release.
--> 324         return cls._export(*args, **kwargs)
    325 
    326     @classmethod

.../python3.9/site-packages/optimum/neuron/utils/require_utils.py in wrapper(*args, **kwargs)
     48                     f"install {package_name}"
     49                 )
---> 50             return func(*args, **kwargs)
     51 
     52         return wrapper

.../python3.9/site-packages/optimum/neuron/modeling_decoder.py in _export(cls, model_id, config, use_auth_token, revision, task, batch_size, sequence_length, num_cores, auto_cast_type, **kwargs)
    370             pass
    371 
--> 372         return cls(new_config, checkpoint_dir, generation_config=generation_config)
    373 
    374     @classmethod

.../python3.9/site-packages/optimum/neuron/modeling.py in __init__(self, config, checkpoint_dir, compiled_dir, generation_config)
    669         generation_config: Optional["GenerationConfig"] = None,
    670     ):
--> 671         super().__init__(config, checkpoint_dir, compiled_dir=compiled_dir, generation_config=generation_config)
    672         self.batch_size = self.config.neuron["batch_size"]
    673         self.max_length = self.config.neuron["sequence_length"]

.../python3.9/site-packages/optimum/neuron/utils/require_utils.py in wrapper(*args, **kwargs)
     48                     f"install {package_name}"
     49                 )
---> 50             return func(*args, **kwargs)
     51 
     52         return wrapper

.../python3.9/site-packages/optimum/neuron/modeling_decoder.py in __init__(self, config, checkpoint_dir, compiled_dir, generation_config)
    191         # Instantiate neuronx model
    192         checkpoint_path = checkpoint_dir.name if isinstance(checkpoint_dir, TemporaryDirectory) else checkpoint_dir
--> 193         neuronx_model = exporter.neuronx_class.from_pretrained(checkpoint_path, **tnx_kwargs)
    194 
    195         if compiled_dir is not None:

.../python3.9/site-packages/transformers_neuronx/module.py in from_pretrained(cls, pretrained_model_path, *model_args, **kwargs)
    179             model.load_state_dict_low_memory(state_dict)
    180         else:
--> 181             raise FileNotFoundError(f"Can not find model.safetensors or pytorch_model.bin in {pretrained_model_path}")
    182 
    183         return model

FileNotFoundError: Can not find model.safetensors or pytorch_model.bin in /tmp/tmpa9lp6hci

This is running in a container on an inf2.8xlarge instance.

Could you help me understand what is happening? Where does /tmp/tmpa9lp6hci come from?
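
One hint from the traceback: line 192 of optimum/neuron/modeling_decoder.py reads checkpoint_dir.name when checkpoint_dir is a TemporaryDirectory, and /tmp/tmpa9lp6hci has the shape of a name produced by Python's tempfile module. The sketch below only demonstrates that naming behaviour with the standard library. It is an assumption, not confirmed by the traceback, that _export creates such a directory and writes a copy of the checkpoint into it:

```python
import os
import tempfile
from tempfile import TemporaryDirectory

# Create a temporary directory with the standard library defaults.
checkpoint_dir = TemporaryDirectory()

# Same expression as line 192 of modeling_decoder.py in the traceback.
checkpoint_path = (
    checkpoint_dir.name
    if isinstance(checkpoint_dir, TemporaryDirectory)
    else checkpoint_dir
)

# The directory sits under the system temp dir and its name starts with "tmp".
parent = os.path.dirname(checkpoint_path)
name = os.path.basename(checkpoint_path)
print(parent, name)

# Once cleaned up, the directory no longer exists on disk.
checkpoint_dir.cleanup()
exists_after_cleanup = os.path.isdir(checkpoint_path)
```

If that assumption holds, the path in the error would be a scratch copy made during export rather than the directory passed to from_pretrained.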

Thank you for your help, Eliano

Expected behavior

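NeuronModelForCausalLM.from_pretrained with export=True should export the sharded safetensors checkpoint from the local directory without raising FileNotFoundError.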

huggingface / optimum-neuron