neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Assertion at src/lib/core/topology.cpp:627 #1527

Closed Zorgosto closed 2 months ago

Zorgosto commented 6 months ago

Describe the bug: When I try to run the example LLM TextGeneration code, I get an assertion error. (Sorry for any formatting errors; if you have tips to make it more readable, please tell me.)

Expected behavior: Run the LLM and get the output.

Environment: Include all relevant environment information:

  1. OS: Ubuntu Server 22.04.3 LTS
  2. Python version: 3.11.7
  3. DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: deepsparse-nightly 1.7.0.20240103
  4. ML framework version(s): torch 2.1.2
  5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: ONNX 1.14.1
  6. CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows: The same assertion occurs, so I can't provide this output, but the CPU is an AMD EPYC 7282 and the chipset is x86-64-v3, which has AVX2 support. The VM should have access to 32 cores.
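Since `arch.bin` and `cpu_architecture()` both hit the assertion, one way to confirm what the guest actually sees is to read the feature flags straight from `/proc/cpuinfo`. The following is only a minimal sketch, assuming a Linux guest; the helper names `cpu_flags` and `report_vector_support` are made up for illustration and are not part of DeepSparse.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report_vector_support(cpuinfo_path: str = "/proc/cpuinfo") -> dict[str, bool]:
    """Report whether the guest exposes AVX2 / AVX-512F (Linux only)."""
    path = Path(cpuinfo_path)
    # A missing file (non-Linux system) is treated as "no flags found".
    text = path.read_text() if path.exists() else ""
    flags = cpu_flags(text)
    return {"avx2": "avx2" in flags, "avx512f": "avx512f" in flags}


if __name__ == "__main__":
    print(report_vector_support())
```

This only shows which instruction sets the hypervisor passes through; it says nothing about whether cache or core topology is visible to the guest.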

To Reproduce: Exact steps to reproduce the behavior:

  1. Install Ubuntu Server 22.04.3
  2. Install pyenv and create virtualenv with python version 3.11.3
  3. Activate virtual environment
  4. Install deepsparse via pip install -U deepsparse-nightly[llm]
  5. Use the following example code and run it with python main.py

    from deepsparse import TextGeneration

    # construct a pipeline
    model_path = "zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized"
    pipeline = TextGeneration(model=model_path)

    # generate text
    prompt = "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: What is Kubernetes? ### Response:"
    output = pipeline(prompt=prompt)
    print(output.generations[0].text)

Errors

    DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240103 (b4c5ec70) (release) (optimized) (system=avx2, binary=avx2)
    Date: 01-14-2024 @ 13:40:59 UTC
    OS: Linux ubuntu-test-ai 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023
    Arch: x86_64
    CPU:
    Vendor:
    Cores/sockets/threads: [0, 0, 0]
    Available cores/sockets/threads: [0, 0, 0]
    L1 cache size data/instruction: 0k/0k
    L2 cache size: 0Mb
    L3 cache size: 0Mb
    Total memory: 39.3345G
    Free memory: 29.7219G
    Thread: 0x7fe7aca2cb80
    Assertion at src/lib/core/topology.cpp:627
    Backtrace:
    0# 0x00007fe6ad018b5d: [41b90100000031f66a0041b801000000b973020000488d15e19004fee882b190] [01488b3dbb5a9801585ae884b1900148833ddc5598010074084c89e7e852b090]
    1# 0x00007fe6ad01908b: [0f1f4400004883c3184839dd741c8b0385c074f1488b7c24284889dee8f4f0ff] [ff4883c3184839dd75e44881c4880000004c89ef5b5d415c415d415e415fe9f2]
    2# (deepsparse)
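The telling part of this banner is that every topology field is zero. For anyone collecting logs from several machines, a rough sketch like the one below could flag that symptom automatically. It assumes the field labels appear exactly as in the banner above; the function names are hypothetical and not part of DeepSparse.

```python
import re

# Matches both "Cores/sockets/threads: [a, b, c]" and
# "Available cores/sockets/threads: [a, b, c]" as printed in the banner.
TOPOLOGY_RE = re.compile(r"[Cc]ores/sockets/threads:\s*\[(\d+),\s*(\d+),\s*(\d+)\]")


def topology_triples(banner: str) -> list[tuple[int, ...]]:
    """Extract every [cores, sockets, threads] triple found in the banner text."""
    return [tuple(int(n) for n in match) for match in TOPOLOGY_RE.findall(banner)]


def topology_undetected(banner: str) -> bool:
    """True if the banner reports topology and every reported triple is all zeros."""
    triples = topology_triples(banner)
    return bool(triples) and all(t == (0, 0, 0) for t in triples)


if __name__ == "__main__":
    sample = "Cores/sockets/threads: [0, 0, 0] Available cores/sockets/threads: [0, 0, 0]"
    print(topology_undetected(sample))
```

The pattern is not anchored to line starts, so it works whether the banner is pasted as separate lines or as one run of text.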

Additional context: The system runs as a container on a Proxmox server. I also tried a Debian 12 system earlier and it had the same problem, so the problem may be related to Proxmox or to the CPU.

mgoin commented 2 months ago

Hey @Zorgosto that error is due to our hardware topology detection being unable to detect the cache size from CPUID. We are unable to diagnose this ourselves since we do not have access to a virtual environment like that. Please re-open this issue if you have an idea of a publicly available instance with this software, thanks!
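For anyone hitting the same assertion, it may help to check whether the guest exposes any cache information at all. The sketch below reads the Linux kernel's view from sysfs under `/sys/devices/system/cpu/cpu0/cache`. This is an assumption-laden proxy: DeepSparse queries CPUID itself, so sysfs may not match what the runtime sees, and the helper names here are hypothetical.

```python
from pathlib import Path

_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_cache_size(text: str) -> int:
    """Convert a sysfs size string such as '32K' or '16M' to bytes."""
    text = text.strip().upper()
    if not text:
        return 0
    if text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def read_cache_levels(cpu_dir: str = "/sys/devices/system/cpu/cpu0/cache"):
    """Return (level, type, size_in_bytes) tuples for CPU 0, or [] if none are exposed."""
    root = Path(cpu_dir)
    levels = []
    if not root.is_dir():
        return levels
    for index in sorted(root.glob("index*")):
        try:
            level = int((index / "level").read_text())
            kind = (index / "type").read_text().strip()
            size = parse_cache_size((index / "size").read_text())
        except (OSError, ValueError):
            # Skip entries the hypervisor leaves missing or unreadable.
            continue
        levels.append((level, kind, size))
    return levels


if __name__ == "__main__":
    caches = read_cache_levels()
    print(caches if caches else "No cache information exposed to this guest")
```

An empty result on the affected Proxmox guest would be consistent with the zeroed cache sizes in the banner, though it would not by itself prove that CPUID is the cause.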