neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0

RecursionError when converting LLaMA model to ONNX #2016

Closed · luuyin closed this 4 months ago

luuyin commented 7 months ago

Describe the bug

RecursionError: maximum recursion depth exceeded while getting the str of an object

Expected behavior

I want to convert a LLaMA model to ONNX and then benchmark it in DeepSparse.

Environment

conda create -n sparseml_main python=3.9
conda activate sparseml_main
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
pip install deepsparse
pip install sentencepiece

To Reproduce

First download the model:

huggingface-cli download baffo32/decapoda-research-llama-7B-hf --local-dir llama-7B

Then convert:

sparseml.export --task text-generation llama-7B

Errors

2024-01-26 16:49:06 sparseml.export.export INFO     Starting export for transformers model...
2024-01-26 16:49:06 sparseml.export.export INFO     Creating model for the export...
2024-01-26 16:49:06 sparseml.transformers.integration_helper_functions WARNING  trust_remote_code is set to False. It is possible, that the model will not be loaded correctly.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Traceback (most recent call last):
  File "/home/sliu01/miniconda3/envs/sparseml_main/bin/sparseml.export", line 8, in <module>
    sys.exit(main())
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/gpfs/work2/0/prjs0761/OWL-new/sparseml/src/sparseml/export/export.py", line 432, in main
    export(
  File "/gpfs/work2/0/prjs0761/OWL-new/sparseml/src/sparseml/export/export.py", line 212, in export
    model, loaded_model_kwargs = helper_functions.create_model(
  File "/gpfs/work2/0/prjs0761/OWL-new/sparseml/src/sparseml/transformers/integration_helper_functions.py", line 101, in create_model
    tokenizer = initialize_tokenizer(source_path, sequence_length, task)
  File "/gpfs/work2/0/prjs0761/OWL-new/sparseml/src/sparseml/transformers/utils/initializers.py", line 76, in initialize_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 751, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
    return cls._from_pretrained(
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 134, in __init__
    self.update_post_processor()
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 147, in update_post_processor
    bos_token_id = self.bos_token_id
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1172, in bos_token_id
    return self.convert_tokens_to_ids(self.bos_token)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
    return self.unk_token_id

  ''''' repeat '''''

  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1057, in unk_token
    return str(self._unk_token)
RecursionError: maximum recursion depth exceeded while getting the str of an object

Additional context

I also tried installing sparseml with pip install sparseml and converting with sparseml.transformers.export_onnx, but the same error occurs.
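(For reference, a hypothetical invocation of that legacy entry point; the flag names here are assumptions and may vary between sparseml versions, so check sparseml.transformers.export_onnx --help:)

sparseml.transformers.export_onnx --model_path llama-7B --task text-generation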


mgoin commented 7 months ago

Hey @luuyin, it seems this model has issues with its tokenizer setup, which might have been masked by installing sentencepiece. I would recommend trying a Llama model that is configured to work well with native transformers.
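As a quick sanity check (a minimal sketch, assuming the checkpoint sits in llama-7B), loading the tokenizer alone in native transformers should reproduce the problem without any sparseml involvement:

from transformers import AutoTokenizer

# If the tokenizer config is broken, this call alone hits the RecursionError
tokenizer = AutoTokenizer.from_pretrained("llama-7B")
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.unk_token)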

If I simply try to use the model as-is in native transformers, it fails in the same way, so this isn't a SparseML-specific issue:

from transformers import pipeline

# Load the local checkpoint directly with native transformers
pipe = pipeline("text-generation", model="llama-7B")
prompt = "How many helicopters can a human eat in one sitting?"
outputs = pipe(prompt)
print(outputs[0]["generated_text"])

output:

  File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1057, in unk_token
    return str(self._unk_token)
(the same frames keep repeating until the RecursionError)

If that sentencepiece lib is not installed, I get this error:

ValueError: Couldn't instantiate the backend tokenizer from one of: 
(1) a `tokenizers` library serialization file, 
(2) a slow tokenizer instance to convert or 
(3) an equivalent slow tokenizer class to instantiate and convert. 
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
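A possible probe (untested against this particular checkpoint) is to force the slow sentencepiece tokenizer with the standard use_fast=False option, which skips the fast-tokenizer construction that recurses here; note that sparseml's export may still build the fast tokenizer internally:

from transformers import AutoTokenizer

# Load the slow (sentencepiece-based) tokenizer directly,
# bypassing the slow->fast conversion; sentencepiece must be installed
tokenizer = AutoTokenizer.from_pretrained("llama-7B", use_fast=False)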
mgoin commented 7 months ago

@luuyin I have validated that the flow works as expected with a properly formed Llama 2 model. I used NousResearch/Llama-2-7b-hf for this.

initial setup

python3 -m venv ~/venvs/test-fail
source ~/venvs/test-fail/bin/activate
pip install sentencepiece -e "sparseml[transformers]"
huggingface-cli download baffo32/decapoda-research-llama-7B-hf --local-dir llama-7B
sparseml.export --task text-generation llama-7B

This fails because of the decapoda-research model.

If I replace that model with this version, https://huggingface.co/NousResearch/Llama-2-7b-hf, it works as expected:

huggingface-cli download NousResearch/Llama-2-7b-hf --local-dir llama-7B
sparseml.export --task text-generation llama-7B
luuyin commented 7 months ago

(quoting @mgoin's previous comment in full)

Thank you for your prompt response, Michael Goin. Indeed, with the model you suggested the conversion to ONNX succeeds, with only a minor warning:

Attempting to validate an in-memory ONNX model with size > 2000000000 bytes. `validate_onnx` skipped, as large ONNX models cannot be validated in-memory. To validate this model, save it to disk and call `validate_onnx` on the file path.
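As the warning says, models over 2 GB can only be checked from disk. A minimal sketch using the onnx checker (the deployment path is taken from the benchmark command below):

import onnx

# Passing a file path rather than a loaded proto lets the checker
# handle models larger than the 2 GB protobuf limit
onnx.checker.check_model("llama-7B/deployment/model.onnx")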

However, in the following step, when I tried to benchmark the converted model using:

deepsparse.benchmark llama-7B/deployment/model.onnx --sequence_length 2048

I got the following error:


2024-01-26 19:20:34 deepsparse.benchmark.helpers INFO     Thread pinning to cores enabled
2024-01-26 19:20:34 deepsparse.benchmark.benchmark_model INFO     Found model with KV cache support. Benchmarking the autoregressive model with input_ids_length: 1 and sequence length: 2048.
2024-01-26 19:20:34 deepsparse.benchmark.benchmark_model INFO     Benchmarking Engine: deepsparse with internal KV cache management
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.1 COMMUNITY | (eff4f95d) (release) (optimized) (system=avx512_vnni, binary=avx512)
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.1 (eff4f95d) (release) (optimized) (system=avx512_vnni, binary=avx512)
Date: 01-26-2024 @ 19:20:35 CET
OS: Linux gcn17.local.snellius.surf.nl 4.18.0-372.80.1.el8_6.x86_64 #1 SMP Fri Nov 3 14:30:16 EDT 2023
Arch: x86_64
CPU: GenuineIntel
Vendor: Intel
Cores/sockets/threads: [72, 2, 72]
Available cores/sockets/threads: [36, 1, 36]
L1 cache size data/instruction: 48k/32k
L2 cache size: 1.25Mb
L3 cache size: 54Mb
Total memory: 503.518G
Free memory: 387.075G
Thread: 0x14734509b440

Assertion at ./src/include/wand/engine/execution/planner.hpp:121

Backtrace:
 0# 0x000014723b0313c3 (unnamed frame, raw instruction bytes omitted)
 1# 0x000014723b033c18 (unnamed frame, raw instruction bytes omitted)
 2# wand::engine::compiler::compiler::execution_graph_to_linear_order(wand::engine::execution::graph&&) const in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.15.1
 3# wand::engine::compiler::compiler::compile(wand::engine::execution::graph&&) const in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.15.1
 4# wand::engine::compiler::compiler::compile(wand::engine::compute::compute_graph&&) const in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.15.1
 5# wand::engine::compiler::compiler::compile(wand::engine::intake::graph&&) const in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.15.1
 6# ... 18# unnamed frames from 0x000014723a1f3c97 through 0x000014723a0c0284 (raw instruction bytes omitted)
19# deepsparse::ort_engine::init(wand::arch_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::shared_ptr<wand::parallel::scheduler_factory_t>, std::optional<NMExecutionProviderEngineParams> const&, std::optional<NMExecutionProviderBenchmarkParams> const&) in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libdeepsparse.so
20# ... 23# unnamed frames from 0x000014723d3a9932 through 0x000014723d3b5aa7 (raw instruction bytes omitted)
24# cfunction_call at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:543
25# _PyObject_MakeTpCall at /usr/local/src/conda/python-3.9.18/Objects/call.c:191
26# method_vectorcall at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:116
27# method_vectorcall at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:103
28# method_vectorcall at /usr/local/src/conda/python-3.9.18/Objects/classobject.c:83
29# slot_tp_init at /usr/local/src/conda/python-3.9.18/Objects/typeobject.c:6974
30# type_call at /usr/local/src/conda/python-3.9.18/Objects/typeobject.c:1028
31# pybind11_meta_call in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/onnx/onnx_cpp2py_export.cpython-39-x86_64-linux-gnu.so
32# _PyObject_MakeTpCall at /usr/local/src/conda/python-3.9.18/Objects/call.c:191

Please email a copy of this stack trace and any additional information to: support@neuralmagic.com

Could you please help me look at this? Thanks!

mgoin commented 7 months ago

This is definitely an unexpected error; looking into this now.

mgoin commented 7 months ago

I was able to recreate this using your exact setup, specifically with deepsparse==1.6.1. We've made a lot of changes for 1.7, which is releasing soon, and I was able to run fine on our nightly build (or from source).

To set that up, make sure to uninstall sparseml, sparsezoo, and deepsparse, then do pip install sparseml-nightly[transformers] deepsparse-nightly, or install both from source (not just sparseml); see the commands below. With that environment, I was able to export and run fine.
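A minimal sketch of that environment reset (assuming pip; the quotes just keep the shell from expanding the brackets):

pip uninstall -y sparseml sparsezoo deepsparse
pip install "sparseml-nightly[transformers]" deepsparse-nightly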

jeanniefinks commented 5 months ago

Hi @luuyin, a heads up that 1.7 recently went out. We hope this addresses the issue you faced. Thank you! Jeannie / Neural Magic