Issue compiling llama model

Hi, I am following the instructions below to compile two Llama models (LLM360/Amber and meta-llama/Llama-2-7b-chat-hf) on an A100 machine:

https://github.com/quic/cloud-ai-sdk/tree/1.12/models/language_processing/decoder/LlamaForCausalLM

I hit the same error with both models after converting them to ONNX and trying to compile them for QAIC:

QAIC_ERROR:
Error message:  [Operator-'/model/layers.0/self_attn/ScatterND', opset_version-13, ir_version-7] : Indices and updates must have same lengths!
QAICException:Unable to AddNodesToGraphFromModel
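
For context, the ONNX ScatterND operator requires that updates.shape equal indices.shape[:-1] + data.shape[indices.shape[-1]:]. The sketch below only restates that rule in plain Python. The function names and the example shapes are hypothetical and are not taken from the exported Amber graph, and it is an assumption that the qaic-exec message refers to this same constraint.

```python
def expected_updates_shape(data_shape, indices_shape):
    """Return the shape ONNX ScatterND requires for `updates`."""
    # Each index tuple addresses the first k dimensions of `data`.
    k = indices_shape[-1]
    if k > len(data_shape):
        raise ValueError("indices.shape[-1] exceeds the rank of data")
    # Leading dims come from indices; trailing dims are the slice of data
    # that each index tuple selects.
    return tuple(indices_shape[:-1]) + tuple(data_shape[k:])


def scatternd_shapes_ok(data_shape, indices_shape, updates_shape):
    """True if the three shapes satisfy the ScatterND rule."""
    return tuple(updates_shape) == expected_updates_shape(data_shape, indices_shape)


# Hypothetical KV-cache-like shapes, chosen only for illustration.
cache_shape = (1, 32, 2048, 128)
indices_shape = (1, 32, 1, 3)
print(expected_updates_shape(cache_shape, indices_shape))  # (1, 32, 1, 128)
```

Comparing the shapes that the exported model feeds into /model/layers.0/self_attn/ScatterND against this rule may help narrow down whether the mismatch comes from the export or from the compiler.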

When I run compileModel.sh, I get the following error:

bash compileModel.sh Amber-kv mx6 14
Invalid option -retained-state. Use -h, -help, or --help for list of options.

Because of that, I compiled the model without the -retained-state option. This is my compile command:

/opt/qti-aic/exec/qaic-exec -m=/home/ec2-user/cloud-ai-sdk/models/language_processing/decoder/LlamaForCausalLM/Amber-kv/generatedModels/Amber-kv_fp16.onnx -aic-hw -aic-hw-version=2.0 -network-specialization-config=specializations.json -convert-to-fp16 -aic-num-cores=14 -custom-IO-list-file=Amber-kv/custom_io.yaml -compile-only -aic-binary-dir=qpc/Amber-kv-256pl-2048cl-14c

This is my environment:

Package                  Version
------------------------ ----------
certifi                  2023.11.17
charset-normalizer       3.3.2
cmake                    3.28.1
coloredlogs              15.0.1
filelock                 3.13.1
flatbuffers              23.5.26
fsspec                   2023.12.2
huggingface-hub          0.20.3
humanfriendly            10.0
idna                     3.6
Jinja2                   3.1.3
lit                      17.0.6
markdown-it-py           3.0.0
MarkupSafe               2.1.4
mdurl                    0.1.2
mpmath                   1.3.0
networkx                 3.1
numpy                    1.24.4
nvidia-cublas-cu11       11.10.3.66
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu11        8.5.0.96
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu11        10.9.0.58
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu11       10.2.10.91
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu11     11.7.4.91
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu11         2.14.3
nvidia-nccl-cu12         2.18.1
nvidia-nvjitlink-cu12    12.3.101
nvidia-nvtx-cu11         11.7.91
nvidia-nvtx-cu12         12.1.105
onnx                     1.14.0
onnxruntime              1.15.1
onnxsim                  0.4.31
packaging                23.2
pip                      22.0.4
protobuf                 3.20.2
Pygments                 2.17.2
PyYAML                   6.0.1
regex                    2023.12.25
requests                 2.31.0
rich                     13.7.0
safetensors              0.4.2
setuptools               56.0.0
sympy                    1.12
tokenizers               0.13.3
torch                    2.0.1
tqdm                     4.66.1
transformers             4.32.0
triton                   2.0.0
typing_extensions        4.9.0
urllib3                  1.26.6
wheel                    0.42.0

What can be causing this error?
