huggingface / exporters

Export Hugging Face models to Core ML and TensorFlow Lite
Apache License 2.0

Export of Llama2 fails #76

Open rradjabi opened 4 months ago

rradjabi commented 4 months ago

I'm unable to use exporters for the meta-llama/Llama-2-7b-chat-hf model.

Here is my command:

python -m exporters.coreml --model=meta-llama/Llama-2-7b-chat-hf models/llama2.mlpackage

And here is the output:

 % python -m exporters.coreml --model=meta-llama/Llama-2-7b-chat-hf models/llama2.mlpackage
Torch version 2.3.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.2.0 is the most recent version that has been tested.
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:30<00:00, 15.44s/it]
Using framework PyTorch: 2.3.0
Overriding 1 configuration item(s)
    - use_cache -> False
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/transformers/modeling_utils.py:4371: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:1094: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if sequence_length != 1:
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                                                             | 0/3690 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!

ERROR - converting 'full' op (located at: 'model'):

Converting PyTorch Frontend ==> MIL Ops:   1%|▉                                                                                                                                 | 28/3690 [00:00<00:00, 5249.21 ops/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 166, in main
    convert_model(
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
    mlmodel = export(
              ^^^^^^^
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/convert.py", line 660, in export
    return export_pytorch(preprocessor, model, config, quantize, compute_units)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/convert.py", line 553, in export_pytorch
    mlmodel = ct.convert(
              ^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 581, in convert
    mlmodel = mil_convert(
              ^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 288, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
    return load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 82, in load
    return _perform_torch_convert(converter, debug)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 116, in _perform_torch_convert
    prog = converter.convert()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 581, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 86, in convert_nodes
    raise e     # re-raise exception
    ^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 81, in convert_nodes
    convert_single_node(context, node)
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 134, in convert_single_node
    add_op(context, node)
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4211, in full
    else NUM_TO_NUMPY_DTYPE[TORCH_DTYPE_TO_NUM[inputs[2].val]]
                            ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: 6

I was able to generate an mlpackage for distilbert-base-uncased-finetuned-sst-2-english with this command:

python -m exporters.coreml --model=distilbert-base-uncased-finetuned-sst-2-english --feature=sequence-classification models/defaults.mlpackage

so I have some confidence that the environment itself is set up correctly and working.
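
For context, the failing lookup in the traceback is TORCH_DTYPE_TO_NUM[inputs[2].val] with inputs[2].val == 6, and 6 is PyTorch's ScalarType code for torch.float32. The coremltools table appears to be keyed by torch.dtype objects rather than by the raw integer code, which would explain the KeyError. A minimal sketch of that mismatch (the dictionaries below are simplified stand-ins, not coremltools' actual tables):

    import numpy as np
    import torch

    # Simplified stand-ins for the lookup tables used by the 'full' op handler
    # in coremltools/converters/mil/frontend/torch/ops.py.
    TORCH_DTYPE_TO_NUM = {torch.int64: 4, torch.float32: 6}  # keyed by torch.dtype objects
    NUM_TO_NUMPY_DTYPE = {4: np.int64, 6: np.float32}

    dtype_arg = 6  # the traced aten::full op already carries the integer ScalarType code
    NUM_TO_NUMPY_DTYPE[TORCH_DTYPE_TO_NUM[dtype_arg]]  # raises KeyError: 6, as in the traceback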

kinghchan commented 4 months ago

I have the exact same issue

Proryanator commented 2 months ago

I get the same error when trying to convert: https://huggingface.co/HuggingFaceTB/SmolLM-1.7B

I'm wondering if this is one of those situations where the torch op -> Core ML op mapping does not work automatically (i.e. it requires us to write our own operator: https://apple.github.io/coremltools/docs-guides/source/custom-operators.html).
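
For reference, here is an untested sketch of what such an override could look like, following the composite-operator pattern from that guide. The dtype handling is an assumption: it simply ignores the integer dtype code (the value 6 that raises the KeyError) and always fills with float32, and it assumes the fill value is a compile-time constant.

    from coremltools.converters.mil import Builder as mb
    from coremltools.converters.mil.frontend.torch.ops import _get_inputs
    from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

    @register_torch_op(override=True)
    def full(context, node):
        # aten::full(size, fill_value, dtype, layout, device, pin_memory)
        inputs = _get_inputs(context, node)
        size, fill_value = inputs[0], inputs[1]
        # Skip inputs[2] (the integer dtype code) and emit a float32 fill;
        # assumes fill_value is a compile-time constant.
        result = mb.fill(shape=size, value=float(fill_value.val), name=node.name)
        context.add(result)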

Proryanator commented 2 months ago

@rradjabi try installing coremltools 8 and a newer version of transformers! I was able to run this conversion just fine 👏 (with my own memory-fixing patch, of course).
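
Roughly, that amounts to something like (adjust versions as needed; 8.0b1 is the beta I mention below):

    pip install coremltools==8.0b1
    pip install --upgrade transformers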

mattusi commented 2 months ago

@Proryanator Could you please provide more details about your fix? Thanks

Proryanator commented 2 months ago

> @Proryanator Could you please provide more details about your fix? Thanks

Yeah sure! Let me collect the specific details (it was a bit complicated in the end).

In a nutshell though:

Out of Memory Issue: For some models (not even particularly large ones), including llama2, I would get an out-of-memory error on my M3 Max with 36 GB of RAM. It happened when coremltools tried to load the converted model toward the end. I figured out that a one-line change to exporters fixes this for me; here is that change: https://github.com/huggingface/exporters/pull/83

Unsupported 'full' op: Either upgrading to coremltools 8.0b1 made this op issue go away, or using an older version of transformers fixed it (I did both, so I can't say which at the moment). Let me double-check and get back to you with specifics (pretty sure it was the transformers version, though).