neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Segmentation fault (core dumped) #1341

Closed Mahran-xo closed 8 months ago

Mahran-xo commented 10 months ago

I am trying to run

```python
from deepsparse import TextGeneration
pipeline = TextGeneration(model="/mnt/d/mpt-7b-dolly_mpt_pretrain-pruned50_quantized/deployment")

prompt = """
Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: what is sparsity? ### Response:
"""
print(pipeline(prompt, max_new_tokens=75).generations[0].text)

# Sparsity is the property of a matrix or other data structure in which a large number of elements are zero and a smaller number of elements are non-zero. In the context of machine learning, sparsity can be used to improve the efficiency of training and prediction.
```

but it gives an error saying:

```
2023-10-23 01:51:05 deepsparse.transformers.pipelines.text_generation INFO Compiling an auxiliary engine to process a prompt with a larger processing length. This improves performance, but may result in additional memory consumption.
2023-10-23 01:51:05 deepsparse.utils.onnx INFO Overwriting in-place the input shapes of the transformer model at /mnt/d/DMS_NLP/LangChain/LLAMA/mpt-7b-dolly_mpt_pretrain-pruned50_quantized/deployment/model.onnx
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20231020 COMMUNITY | (9eb1e5d9) (release) (optimized) (system=avx2, binary=avx2)
Segmentation fault (core dumped)
```

Environment

  1. OS: Ubuntu 22.04.2 LTS (GNU/Linux 4.4.0-19041-Microsoft x86_64)
  2. Python version: 3.9
  3. DeepSparse version: installed through `pip install -U deepsparse-nightly[llm]`
  4. CPU info: 'L1_data_cache_size': 16384, 'L1_instruction_cache_size': 16384, 'L2_cache_size': 262144, 'L3_cache_size': 0, 'architecture': 'x86_64', 'available_cores_per_socket': 4, 'available_num_cores': 4, 'available_num_hw_threads': 8, 'available_num_numa': 1, 'available_num_sockets': 1, 'available_sockets': 1, 'available_threads_per_core': 2, 'bf16': False, 'cores_per_socket': 4, 'dotprod': False, 'i8mm': False, 'isa': 'avx2', 'num_cores': 4, 'num_hw_threads': 8, 'num_numa': 1, 'num_sockets': 1, 'threads_per_core': 2, 'vbmi': False, 'vbmi2': False, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz', 'vnni': False, 'zen1': False
tlrmchlsmth commented 10 months ago

Hi @Mahran-xo, thanks for the bug report. I've been trying to reproduce this on a similar machine, but no luck so far. Is there any more output that gets printed after the segfault? I'm looking for the hex values in registers and some backtrace information that we print out in the case of a segmentation fault. How much RAM do you have available on this machine?
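One way to gather that information before re-running the script (a minimal sketch, assuming a WSL/Ubuntu shell):

```shell
# Check available RAM and swap; a 7B model, even pruned and quantized,
# needs several GB free to compile and run
free -h

# Allow a core file to be written so a backtrace can be recovered
# if the segfault happens again
ulimit -c unlimited
```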

Mahran-xo commented 10 months ago

Hello, sorry for the late reply. I tried another model (`zoo:mpt-7b-mpt_chat_mpt_pretrain-base_quantized`) and it downloaded, but this time there's a different error. It says the following:

```
2023-10-29 13:22:42 deepsparse.utils.onnx INFO     Overwriting in-place the input shapes of the transformer model at /mnt/d/DMS_NLP/LangChain/LLAMA/local-model/deployment/model.onnx
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20231020 COMMUNITY | (9eb1e5d9) (release) (optimized) (system=avx2, binary=avx2)
2023-10-29 13:22:42.443931000 [E:onnxruntime:, inference_session.cc:1693 operator()] Exception during initialization: /home/centos/build/nyann/external/onnx-runtime/onnxruntime/core/optimizer/initializer.cc:43 onnxruntime::Initializer::Initializer(const onnx::TensorProto&, const onnxruntime::Path&) [ONNXRuntimeError] : 1 : FAIL : GetFileLength for /mnt/d/DMS_NLP/LangChain/LLAMA/local-model/deployment/model.data failed:Invalid fd was supplied: -1

[nm_ort 7f90fb961440 >ERROR< init src/libdeepsparse/ort_engine/ort_engine.cpp:538] std exception  Exception during initialization: /home/centos/build/nyann/external/onnx-runtime/onnxruntime/core/optimizer/initializer.cc:43 onnxruntime::Initializer::Initializer(const onnx::TensorProto&, const onnxruntime::Path&) [ONNXRuntimeError] : 1 : FAIL : GetFileLength for /mnt/d/DMS_NLP/LangChain/LLAMA/local-model/deployment/model.data failed:Invalid fd was supplied: -1

Traceback (most recent call last):
  File "/mnt/d/DMS_NLP/LangChain/LLAMA/sparse.py", line 5, in <module>
    pipeline = TextGeneration(model=model_path)
  File "/home/mahran/anaconda3/envs/linx/lib/python3.9/site-packages/deepsparse/pipeline.py", line 814, in text_generation_pipeline
    return Pipeline.create("text_generation", *args, **kwargs)
  File "/home/mahran/anaconda3/envs/linx/lib/python3.9/site-packages/deepsparse/base_pipeline.py", line 210, in create
    return pipeline_constructor(**kwargs)
  File "/home/mahran/anaconda3/envs/linx/lib/python3.9/site-packages/deepsparse/transformers/pipelines/text_generation.py", line 273, in __init__
    self.engine, self.multitoken_engine = self.initialize_engines()
  File "/home/mahran/anaconda3/envs/linx/lib/python3.9/site-packages/deepsparse/transformers/pipelines/text_generation.py", line 353, in initialize_engines
    multitoken_engine = NLDecoderEngine(
  File "/home/mahran/anaconda3/envs/linx/lib/python3.9/site-packages/deepsparse/transformers/engines/nl_decoder_engine.py", line 82, in __init__
    self.engine = create_engine(
  File "/home/mahran/anaconda3/envs/linx/lib/python3.9/site-packages/deepsparse/pipeline.py", line 759, in create_engine
    return Engine(onnx_file_path, **engine_args)
  File "/home/mahran/anaconda3/envs/linx/lib/python3.9/site-packages/deepsparse/engine.py", line 327, in __init__
    self._eng_net = LIB.deepsparse_engine(
RuntimeError: NM: error: Exception during initialization: /home/centos/build/nyann/external/onnx-runtime/onnxruntime/core/optimizer/initializer.cc:43 onnxruntime::Initializer::Initializer(const onnx::TensorProto&, const onnxruntime::Path&) [ONNXRuntimeError] : 1 : FAIL : GetFileLength for /mnt/d/DMS_NLP/LangChain/LLAMA/local-model/deployment/model.data failed:Invalid fd was supplied: -1
```

The code I used to load this model:

```python
from deepsparse import TextGeneration

# construct a pipeline
model_path = "./local-model/deployment"
pipeline = TextGeneration(model=model_path)

# generate text
prompt = "Below is an instruction that describes a task? ### Response:"
output = pipeline(prompt=prompt)
print(output.generations[0].text)
```
tlrmchlsmth commented 10 months ago

@Mahran-xo, regarding the segfault you ran into, are you on WSL1? If so I think that should be resolved in the latest nightly, 1.6.0.20231031

The second error looks like a missing `model.data` -- that file needs to be in the deployment directory alongside `model.onnx`.
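A quick way to check for this before constructing the pipeline (a sketch; `missing_deployment_files` is a hypothetical helper, not part of DeepSparse):

```python
from pathlib import Path

def missing_deployment_files(deployment_dir):
    """Return the required files absent from the deployment directory.

    model.onnx holds the graph, while large external weights live in
    model.data; if model.data is missing, engine initialization fails
    with the "Invalid fd was supplied: -1" error seen above.
    """
    deployment = Path(deployment_dir)
    return [name for name in ("model.onnx", "model.data")
            if not (deployment / name).is_file()]
```

For example, `missing_deployment_files("./local-model/deployment")` returning `["model.data"]` would confirm the weights file was not downloaded into the directory.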

Mahran-xo commented 10 months ago

Thanks for the reply! I followed your instructions and the error disappeared, but this time I am getting this error:

```
(linx) mahran@ali-tar:/mnt/d/DMS_NLP/LangChain/LLAMA$ /home/mahran/anaconda3/envs/linx/bin/python /mnt/d/DMS_NLP/LangChain/LLAMA/sparse.py
2023-10-31 23:59:23 deepsparse.transformers.pipelines.text_generation WARNING  This ONNX graph does not support processing the promptwith processing length > 1
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20231031 COMMUNITY | (74098695) (release) (optimized) (system=avx2, binary=avx2)
[7f16d8570640 >WARN<  operator() ./src/include/wand/utility/warnings.hpp:14] Generating emulated code for quantized (INT8) operations since no VNNI instructions were detected. Set NM_FAST_VNNI_EMULATION=1 to increase performance at the expense of accuracy.
Traceback (most recent call last):
  File "/mnt/d/DMS_NLP/LangChain/LLAMA/sparse.py", line 9, in <module>
    output = pipeline(prompt=prompt)
  File "/home/mahran/anaconda3/envs/linx/lib/python3.9/site-packages/deepsparse/pipeline.py", line 238, in __call__
    engine_inputs = self.process_inputs(pipeline_inputs)
  File "/home/mahran/anaconda3/envs/linx/lib/python3.9/site-packages/deepsparse/transformers/pipelines/text_generation.py", line 472, in process_inputs
    if not self.cache_support_enabled and generation_config.max_length > 1:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
```
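The failing line compares `generation_config.max_length`, which is `None` here, against an `int`. A minimal sketch of the comparison and a guard that avoids it (at the API level, passing an explicit length such as `max_new_tokens=75` when calling the pipeline, as in the first snippet in this thread, populates the config and sidesteps the crash):

```python
# Minimal reproduction of the failing check in process_inputs
max_length = None  # generation_config.max_length was never populated

try:
    max_length > 1
except TypeError as exc:
    print(exc)  # '>' not supported between instances of 'NoneType' and 'int'

# Treating None as "no explicit limit" avoids the crash
safe = max_length is not None and max_length > 1
print(safe)  # False
```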
jeanniefinks commented 9 months ago

Hello @Mahran-xo Could you try our latest nightly to see if you can still reproduce the new error you are having? Thank you for sharing! Jeannie / Neural Magic

jeanniefinks commented 8 months ago

Hello @Mahran-xo Happy New Year! As it's been some time without a response, we are going to go ahead and close out this issue. Please let us know if you have further details on this specific topic and we can re-open the thread; we're happy to help! Thank you!

Jeannie / Neural Magic