Closed: mlinke-ai closed this issue 1 day ago
Since you have already downloaded the model to disk and are providing an input path to a directory, you don't need the `-m phi-3-mini-128k-instruct-onnx-cpu` part of your command. Can you omit that part and try again?
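For reference, a sketch of what the trimmed-down invocation might look like, assuming the onnxruntime-genai model builder is being used (the local paths here are placeholders, not taken from the original report):

```shell
# With -i pointing at the already-downloaded checkpoint directory,
# the -m model-name flag is redundant and can be dropped.
# Paths below are hypothetical examples.
python -m onnxruntime_genai.models.builder \
    -i ./phi-3-mini-128k-instruct \
    -o ./phi-3-mini-128k-instruct-onnx-cpu \
    -p int4 \
    -e cpu
```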
Omitting `-m phi-3-mini-128k-instruct-onnx-cpu` did not work; the directory is still empty. I do see some warnings about a missing `flash_attn` module, but I don't think that is the problem.
Hi @mlinke-ai, can you share the full output of the command here? And what are the specs of the machine you are running on?
The complete output of the command is as follows (slightly shortened to remove some clutter):
Valid precision + execution provider combinations are: FP32 CPU, FP32 CUDA, FP16 CUDA, FP16 DML, INT4 CPU, INT4 CUDA, INT4 DML
Extra options: {'int4_accuracy_level': '1', 'filename': 'phi3-mini-128k-instruct-onnx-cpu.onnx'}
C:\Users\mlinke\AppData\Roaming\Python\Python310\site-packages\transformers\models\auto\configuration_auto.py:950: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
warnings.warn(
GroupQueryAttention (GQA) is used in this model.
C:\Users\mlinke\AppData\Roaming\Python\Python310\site-packages\transformers\models\auto\auto_factory.py:469: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
warnings.warn(
2024-07-08 13:36:31,595 transformers_modules.phi-3-mini-128k-instruct.modeling_phi3 [WARNING] - `flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
2024-07-08 13:36:31,595 transformers_modules.phi-3-mini-128k-instruct.modeling_phi3 [WARNING] - Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Loading checkpoint shards: 100%|##########| 2/2 [00:22<00:00, 11.11s/it]
Reading embedding layer
Reading decoder layer 0
Reading decoder layer 1
Reading decoder layer 2
...
Reading decoder layer 29
Reading decoder layer 30
Reading decoder layer 31
Reading final norm
Reading LM head
Saving ONNX model in \\?\C:\Users\mlinke\Documents\ML\repos\phi-3-mini-128k-instruct-onnx-cpu
My machine is a Lenovo ThinkBook 15 G3 ACL with the following specs:
It looks like you are running out of memory. Do you have a larger machine you can use?
I am working on an improved method that loads these large models with mmap to avoid out-of-memory errors like the ones you've faced. With mmap, the model builder can adapt to the machine's memory constraints.
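The mmap idea can be sketched in a few lines of Python. This is a toy illustration, not the model builder's actual code: the tiny temp file and hand-picked offsets stand in for a multi-gigabyte checkpoint whose pages the OS would fault in lazily instead of loading the whole file into RAM.

```python
import mmap
import os
import struct
import tempfile

# Write a small sample "weights" file; in practice this would be a
# checkpoint far larger than available RAM.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))
    path = f.name

# mmap maps the file into virtual memory: pages are read on demand and
# can be evicted under memory pressure, so only the slices actually
# touched occupy physical RAM at any moment.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    # Read individual floats at known offsets without ever materializing
    # the whole file in memory.
    first = struct.unpack_from("<f", mm, 0)[0]
    last = struct.unpack_from("<f", mm, 12)[0]

os.remove(path)
print(first, last)  # 1.0 4.0
```

The same pattern applies to real checkpoint formats: the builder would map the file once and slice tensors out of the mapping instead of reading everything up front.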
Sorry for the long delay; company processes are slow sometimes.
Sadly, I don't have access to a machine with more RAM. Looking forward to your mmap implementation.
I'll close this issue for now. We will announce it when the memory improvements are ready.
I have downloaded the microsoft/phi-3-mini-128k-instruct model from Hugging Face using the `huggingface-cli` tool. When I try to convert the model to ONNX format, the directory specified with the `-o` flag stays empty. I use the following command: