Assertion at src/lib/core/topology.cpp:627

Describe the bug When I try to run the example LLM TextGeneration code, I get an assertion error. (Sorry for any formatting errors; if you have tips to make it more readable, please tell me.)

Expected behavior Run the LLM and get the output.

Environment Include all relevant environment information:

OS: Ubuntu Server 22.04.3 LTS
Python version: 3.11.7
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: deepsparse-nightly 1.7.0.20240103
ML framework version(s): torch 2.1.2
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: ONNX 1.14.1
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows: Both fail with the same assertion, so I can't provide the output. The CPU is an AMD EPYC 7282, and the CPU type exposed to the guest is x86-64-v3, which includes AVX2 support. The VM should have access to 32 cores.
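Since arch.bin and cpu_architecture() both abort, the AVX2 claim above can be checked independently of DeepSparse. The following is a minimal sketch, assuming an x86 Linux guest where /proc/cpuinfo has a "flags" line; the helper name cpu_flags is made up for illustration and is not part of DeepSparse.

```python
from pathlib import Path


def cpu_flags(text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list features on a line whose key is "flags"
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        flags = cpu_flags(path.read_text())
        # Report a few vector extensions relevant to CPU inference
        for name in ("avx2", "fma", "avx512f"):
            print(name, name in flags)
```

If avx2 shows up here, the instruction set itself is probably not what trips the assertion, although this check says nothing about how the engine detects the CPU.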

To Reproduce Exact steps to reproduce the behavior:

Install Ubuntu Server 22.04.3
Install pyenv and create virtualenv with python version 3.11.3
Activate virtual environment
Install deepsparse via pip install -U deepsparse-nightly[llm]
Use the following example code and run it with python main.py

from deepsparse import TextGeneration

# construct a pipeline
model_path = "zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized"
pipeline = TextGeneration(model=model_path)

# generate text
prompt = "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: What is Kubernetes? ### Response:"
output = pipeline(prompt=prompt)
print(output.generations[0].text)

Errors

DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240103 (b4c5ec70) (release) (optimized) (system=avx2, binary=avx2)
Date: 01-14-2024 @ 13:40:59 UTC
OS: Linux ubuntu-test-ai 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023
Arch: x86_64
CPU:
Vendor:
Cores/sockets/threads: [0, 0, 0]
Available cores/sockets/threads: [0, 0, 0]
L1 cache size data/instruction: 0k/0k
L2 cache size: 0Mb
L3 cache size: 0Mb
Total memory: 39.3345G
Free memory: 29.7219G
Thread: 0x7fe7aca2cb80
Assertion at src/lib/core/topology.cpp:627
Backtrace:
0# 0x00007fe6ad018b5d: [41b90100000031f66a0041b801000000b973020000488d15e19004fee882b190] [01488b3dbb5a9801585ae884b1900148833ddc5598010074084c89e7e852b090]
1# 0x00007fe6ad01908b: [0f1f4400004883c3184839dd741c8b0385c074f1488b7c24284889dee8f4f0ff] [ff4883c3184839dd75e44881c4880000004c89ef5b5d415c415d415e415fe9f2]
2# (deepsparse)

Additional context The system runs as a container on a Proxmox server. I also tried a Debian 12 system earlier and hit the same problem, so it may be related to Proxmox or to the CPU.
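The log above reports Cores/sockets/threads as [0, 0, 0], which suggests (an inference, not something confirmed) that the engine could not read the CPU topology inside the guest. The following is a minimal sketch for checking what topology Linux itself exposes there. It assumes a Linux guest with /proc/cpuinfo; summarize_cpuinfo is a hypothetical helper, not a DeepSparse API, and DeepSparse may well get its topology from a different source.

```python
import os
import re
from pathlib import Path


def summarize_cpuinfo(text):
    """Count logical CPUs, sockets and physical cores in /proc/cpuinfo-style text."""
    logical = 0
    sockets = set()
    cores = set()
    # Each logical CPU is described by one blank-line-separated block
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        socket = fields.get("physical id")
        core = fields.get("core id")
        # Some virtualized guests omit these fields entirely
        if socket is not None:
            sockets.add(socket)
            if core is not None:
                cores.add((socket, core))
    return {"logical_cpus": logical, "sockets": len(sockets), "cores": len(cores)}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(summarize_cpuinfo(cpuinfo.read_text()))
    if hasattr(os, "sched_getaffinity"):
        # CPUs this process may run on, which container limits can restrict
        print("CPUs usable by this process:", len(os.sched_getaffinity(0)))
```

If the socket or core counts come back as 0 while logical CPUs are present, the guest is hiding "physical id" / "core id", which would be worth mentioning alongside the Proxmox container settings.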

neuralmagic / deepsparse

Assertion at src/lib/core/topology.cpp:627 #1527
