quic / cloud-ai-sdk

Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
https://quic.github.io/cloud-ai-sdk-pages/latest/

Memory resource exhausted on AWS dl2q #5

Closed PengchengWang closed 7 months ago

PengchengWang commented 7 months ago

I followed cloud-ai-sdk/models/language_processing/decoder/LlamaForCausalLM/README.md on a single SoC. After compiling with bash compileModel_single_soc.sh Amber-kv fp16 14, I ran the model with python runModel.py --model-name LLM360/Amber --qpc qpc/Amber-kv-256pl-2048cl-14c-1BS --device_id 0 --prompt "Tell me a joke" and got the following errors:

ERROR : [09:28:07.820][error][QKmdDevice][activate:#1172] Device 0 activate failed: Memory resource exhausted

ERROR : [09:28:07.820][error][QRuntime][activateNetwork:#333] Dev 0 activate failed status 300

ERROR : [09:28:07.820][error][Aic-0][ProgDev:Id:  0,L: 763] Failed to Activate program

ERROR : [09:28:07.826][error][QKmdDevice][activate:#1172] Device 0 activate failed: Memory resource exhausted

ERROR : [09:28:07.826][error][QRuntime][activateNetwork:#333] Dev 0 activate failed status 300

ERROR : [09:28:07.826][error][Aic-0][ProgDev:Id:  0,L: 763] Failed to Activate program

ERROR : [09:28:07.826][error][Aic-0][Program:Id: 10,L: 698] Failed to get Program Device Activation Handle

ERROR : [09:28:07.826][error][Aic-0][ExecObj:Id:  0,L1828]  Error activating program

QAIC_ERROR_EXECOBJ_RUNTIME [ExecObj:Id:  0,L1828]  Error activating programContextId:0
ERROR : [09:28:07.826][error][LogCommon][qaicCreateExecObj:#1550] Failed to create ExecObj

Traceback (most recent call last):
  File "runModel.py", line 198, in <module>
    main(**vars(args))
  File "runModel.py", line 63, in main
    session = QAICInferenceSession(qpc, device_id, enable_debug_logs=enable_debug_logs)
  File "/disk1/Projects/cloud-ai-sdk/models/language_processing/decoder/LlamaForCausalLM/qaic_infer.py", line 64, in __init__
    self.activate()
  File "/disk1/Projects/cloud-ai-sdk/models/language_processing/decoder/LlamaForCausalLM/qaic_infer.py", line 85, in activate
    self.execObj = qaicrt.ExecObj(self.context, self.program)
RuntimeError: Initialization Exception: in object: ExecObjFailed to create ExecObj

The device status reported by qaic-util -t 1 -d 0 is:

Date & time: year 2024 month 03 day 27 - 09:29:48  ---  LRT Image Version: LRT.AIC.13.0.1.12.0.88_LRT.AIC.REL
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
╭Device Selection╮Scroll-Up-Down[                                                                                      ]
│◉ All-Devices   ├──────────────────────────────────────────────────────────────────────────────────────────────────────
│○ DeviceID-0    │╭DeviceID-0 --- Status: Ready --- FW Version: 1.12.0.88──────────────────────────────────────────────╮
╰────────────────╯│╭NSP Used / Total──╮╭NW Loaded / Active╮╭DRAM Used / Total─╮╭DRAM Bandwidth────╮╭NSP Frequency─────╮│
                  ││            0 / 14││             0 / 0││251164 KB / 157286││    158112.00 KBps││        825.60 Mhz││
                  │╰──────────────────╯╰──────────────────╯╰──────────────────╯╰──────────────────╯╰──────────────────╯│
                  │╭DDR Frequency─────╮╭Temperature───────╮╭Power / TDP Cap───╮                                        │
                  ││       2133.00 Mhz││           20.00 C││ 15.00 W / 75.00 W│                                        │
                  │╰──────────────────╯╰──────────────────╯╰──────────────────╯                                        │
                  ╰────────────────────────────────────────────────────────────────────────────────────────────────────╯

DRAM Used / Total shows 251164 KB / 157286. Is this the key point? What is the unit of 157286 here?

The AMI used is "Deep Learning Base Qualcomm AMI (Amazon Linux 2) 20240314"
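On the unit question above: the DRAM Used / Total cell is exactly as wide as its 18-character column, so the total looks clipped by the box rather than printed in a different unit. A minimal sketch of the arithmetic, assuming (this is not confirmed by the tool output) that the full total is 15728640 KB and that "KB" here means 1024 bytes:

```python
KB_PER_GIB = 1024 * 1024  # ASSUMPTION: the tool's "KB" is 1024 bytes


def kb_to_gib(kb: int) -> float:
    """Convert a KB reading from the device table to GiB."""
    return kb / KB_PER_GIB


used_kb = 251164             # "DRAM Used" exactly as printed by qaic-util
assumed_total_kb = 15728640  # ASSUMPTION: untruncated form of the clipped "157286"

print(f"used : {kb_to_gib(used_kb):.2f} GiB")
print(f"total: {kb_to_gib(assumed_total_kb):.2f} GiB")
```

If that reading is right, the idle device holds only a fraction of a GiB, so the exhaustion would come from what the fp16 program needs at activation, not from memory already in use.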

PengchengWang commented 7 months ago

After compiling with mx6 instead of fp16, the problem was fixed. The device status is now:

Date & time: year 2024 month 03 day 27 - 09:52:05  ---  LRT Image Version: LRT.AIC.13.0.1.12.0.88_LRT.AIC.REL
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
╭Device Selection╮Scroll-Up-Down[                                                                                      ]
│◉ All-Devices   ├──────────────────────────────────────────────────────────────────────────────────────────────────────
│○ DeviceID-0    │╭DeviceID-0 --- Status: Ready --- FW Version: 1.12.0.88──────────────────────────────────────────────╮
╰────────────────╯│╭NSP Used / Total──╮╭NW Loaded / Active╮╭DRAM Used / Total─╮╭DRAM Bandwidth────╮╭NSP Frequency─────╮│
                  ││           14 / 14││             1 / 1││8457851 KB / 15728││  83195216.00 KBps││       1450.00 Mhz││
                  │╰──────────────────╯╰──────────────────╯╰──────────────────╯╰──────────────────╯╰──────────────────╯│
                  │╭DDR Frequency─────╮╭Temperature───────╮╭Power / TDP Cap───╮                                        │
                  ││       2133.00 Mhz││           55.00 C││ 49.00 W / 75.00 W│                                        │
                  │╰──────────────────╯╰──────────────────╯╰──────────────────╯                                        │
                  ╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
quic-aashwins commented 7 months ago

Hi, the DL2q instance uses the standard SKU, which has 16 GB of onboard DRAM per card. Could you please send your 12-digit AWS account ID to qualcomm_dl2q_ami@qti.qualcomm.com? We can share a newer AMI that can run the configs you are having issues with.
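A rough budget suggests why fp16 is tight on a 16 GB card while MX6 is not. The sketch below relies on figures not stated in this thread, all of them assumptions: about 6.7e9 parameters and a LLaMA-7B-style shape (32 layers, hidden size 4096) for LLM360/Amber, roughly 6 bits per weight for MX6, an fp16 KV cache, and batch size 1 with "2048cl" in the QPC name read as a 2048-token context length. It ignores activations and runtime overhead, so real usage will be higher.

```python
GIB = 1024 ** 3


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(layers: int, hidden: int, ctx_len: int,
                 batch: int = 1, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size: keys and values for every layer."""
    return 2 * layers * batch * ctx_len * hidden * bytes_per_elem / GIB


N_PARAMS = 6.7e9           # ASSUMPTION: approximate parameter count of Amber
LAYERS, HIDDEN = 32, 4096  # ASSUMPTION: LLaMA-7B-style shape
CTX_LEN = 2048             # ASSUMPTION: "2048cl" means context length 2048

kv = kv_cache_gib(LAYERS, HIDDEN, CTX_LEN)
for name, bits in (("fp16", 16), ("mx6", 6)):  # ASSUMPTION: MX6 ~ 6 bits/weight
    w = weights_gib(N_PARAMS, bits)
    print(f"{name}: weights {w:.1f} GiB + KV cache {kv:.1f} GiB = {w + kv:.1f} GiB")
```

Under these assumptions the fp16 estimate comes to about 13.5 GiB before any overhead, which leaves little headroom on the card, while the MX6 estimate is about 5.7 GiB, in line with the direction of the 8457851 KB reported after the MX6 run.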

PengchengWang commented 7 months ago

> Hi, the DL2q instance uses the standard SKU, which has 16 GB of onboard DRAM per card. Could you please send your 12-digit AWS account ID to qualcomm_dl2q_ami@qti.qualcomm.com? We can share a newer AMI that can run the configs you are having issues with.

Do you mean the Deep Learning Base Qualcomm AMI (Amazon Linux 2) 20240320 version?

I want to reproduce the throughput results in https://www.qualcomm.com/developer/blog/2024/01/qualcomm-cloud-ai-100-accelerates-large-language-model-inference-2x-using-microscaling-mx. Have you run these models yourselves?

P.S. It's a bit expensive on AWS, $9/h /(ㄒoㄒ)/~~.

quic-aashwins commented 7 months ago

The AMI I am referring to is "Deep Learning Base Qualcomm AMI (Amazon Linux 2) 1.12.0.88P 1.14.0.24A SDK" (ami-057a7f5a69e18e465). It is not a public AMI. If you send your 12-digit AWS account ID to the email address in my earlier post, we can share it with you.

Yes, on the AWS instance you should be able to get performance close to what has been published. Note that the SKU at AWS is the standard SKU.

Noted on the cost concern. It is primarily driven by the fact that the instance has 8 AI 100 cards.