quic / cloud-ai-sdk

The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
https://quic.github.io/cloud-ai-sdk-pages/latest/

Issue compiling llama model #2

Closed MarcelWilnicki closed 10 months ago

MarcelWilnicki commented 10 months ago

Hi, I am following these instructions to compile a Llama model (LLM360/Amber and meta-llama/Llama-2-7b-chat-hf) on an A100 machine:

https://github.com/quic/cloud-ai-sdk/tree/1.12/models/language_processing/decoder/LlamaForCausalLM

I encountered the same issue with both models after converting them to ONNX and trying to compile them for QAIC:

QAIC_ERROR:
Error message:  [Operator-'/model/layers.0/self_attn/ScatterND', opset_version-13, ir_version-7] : Indices and updates must have same lengths!
QAICException:Unable to AddNodesToGraphFromModel
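For context on what the compiler is checking here: per the ONNX ScatterND spec (opset 13), the `updates` tensor must have shape `indices.shape[:-1] + data.shape[indices.shape[-1]:]`, so the leading dimensions of `indices` and `updates` have to agree. A minimal NumPy reference sketch of that constraint (the function and names are illustrative, not the compiler's actual code):

```python
import numpy as np

def scatter_nd(data, indices, updates):
    """Reference ScatterND (ONNX opset 13 semantics, no reduction).

    The spec requires:
        updates.shape == indices.shape[:-1] + data.shape[indices.shape[-1]:]
    which appears to be the constraint behind the
    "Indices and updates must have same lengths!" error.
    """
    k = indices.shape[-1]
    expected = indices.shape[:-1] + data.shape[k:]
    if updates.shape != expected:
        raise ValueError(
            f"updates shape {updates.shape} != expected {expected}")
    out = data.copy()
    # Each row of `indices` selects a slice of `data` to overwrite
    # with the corresponding slice of `updates`.
    for idx in np.ndindex(indices.shape[:-1]):
        out[tuple(indices[idx])] = updates[idx]
    return out

# Valid: scatter two rows of a (4, 3) tensor.
data = np.zeros((4, 3))
indices = np.array([[0], [2]])          # shape (2, 1)
updates = np.array([[1., 1., 1.],
                    [2., 2., 2.]])      # shape (2, 3) == (2,) + (3,)
print(scatter_nd(data, indices, updates))

# Invalid: 3 update rows for 2 index rows -> the mismatch being reported.
try:
    scatter_nd(data, indices, np.zeros((3, 3)))
except ValueError as e:
    print("mismatch:", e)
```

If the exported graph violates this, the mismatch was most likely introduced during the ONNX export rather than by the compiler itself.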

When I run compileModel.sh, I get the following error:

bash compileModel.sh Amber-kv mx6 14
Invalid option -retained-state. Use -h, -help, or --help for list of options.

Because of that, I compile without the -retained-state option. This is my compile command:

/opt/qti-aic/exec/qaic-exec -m=/home/ec2-user/cloud-ai-sdk/models/language_processing/decoder/LlamaForCausalLM/Amber-kv/generatedModels/Amber-kv_fp16.onnx -aic-hw -aic-hw-version=2.0 -network-specialization-config=specializations.json -convert-to-fp16 -aic-num-cores=14 -custom-IO-list-file=Amber-kv/custom_io.yaml -compile-only -aic-binary-dir=qpc/Amber-kv-256pl-2048cl-14c

This is my environment:

Package                  Version
------------------------ ----------
certifi                  2023.11.17
charset-normalizer       3.3.2
cmake                    3.28.1
coloredlogs              15.0.1
filelock                 3.13.1
flatbuffers              23.5.26
fsspec                   2023.12.2
huggingface-hub          0.20.3
humanfriendly            10.0
idna                     3.6
Jinja2                   3.1.3
lit                      17.0.6
markdown-it-py           3.0.0
MarkupSafe               2.1.4
mdurl                    0.1.2
mpmath                   1.3.0
networkx                 3.1
numpy                    1.24.4
nvidia-cublas-cu11       11.10.3.66
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu11        8.5.0.96
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu11        10.9.0.58
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu11       10.2.10.91
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu11     11.7.4.91
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu11         2.14.3
nvidia-nccl-cu12         2.18.1
nvidia-nvjitlink-cu12    12.3.101
nvidia-nvtx-cu11         11.7.91
nvidia-nvtx-cu12         12.1.105
onnx                     1.14.0
onnxruntime              1.15.1
onnxsim                  0.4.31
packaging                23.2
pip                      22.0.4
protobuf                 3.20.2
Pygments                 2.17.2
PyYAML                   6.0.1
regex                    2023.12.25
requests                 2.31.0
rich                     13.7.0
safetensors              0.4.2
setuptools               56.0.0
sympy                    1.12
tokenizers               0.13.3
torch                    2.0.1
tqdm                     4.66.1
transformers             4.32.0
triton                   2.0.0
typing_extensions        4.9.0
urllib3                  1.26.6
wheel                    0.42.0

What could be causing this error?