[Closed] PengchengWang closed this issue 7 months ago.
After using mx6 instead of fp16, it was fixed. Device message:
```
Date & time: 2024-03-27 09:52:05 --- LRT Image Version: LRT.AIC.13.0.1.12.0.88_LRT.AIC.REL
DeviceID-0 --- Status: Ready --- FW Version: 1.12.0.88
  NSP Used / Total:    14 / 14
  NW Loaded / Active:  1 / 1
  DRAM Used / Total:   8457851 KB / 15728 (total truncated by the fixed-width display)
  DRAM Bandwidth:      83195216.00 KBps
  NSP Frequency:       1450.00 Mhz
  DDR Frequency:       2133.00 Mhz
  Temperature:         55.00 C
  Power / TDP Cap:     49.00 W / 75.00 W
```
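For anyone landing here with the same fp16 failure: the only change was the precision argument to the README's compile script (a sketch based on the command quoted later in this thread; Amber-kv and the 14-core count are just the values used there):

```bash
# Recompile with MX6 microscaling instead of fp16.
# Same README script as in the original report; only the precision argument changes.
bash compileModel_single_soc.sh Amber-kv mx6 14
```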
Hi, the DL2q instance has the standard SKU, which has 16 GB of onboard DRAM per card. Can you please send your 12-digit AWS account ID to this email: qualcomm_dl2q_ami@qti.qualcomm.com? We can share a newer AMI that can run the configs you are seeing issues with.
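If it helps, the 12-digit account ID can be read straight from the AWS CLI (standard STS call, assuming credentials are already configured on the instance):

```bash
# Print the 12-digit AWS account ID for the current credentials.
aws sts get-caller-identity --query Account --output text
```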
Do you mean the Deep Learning Base Qualcomm AMI (Amazon Linux 2) 20240320 version?
I want to reproduce the throughput results in https://www.qualcomm.com/developer/blog/2024/01/qualcomm-cloud-ai-100-accelerates-large-language-model-inference-2x-using-microscaling-mx. Have you ever run these models?
PS: it's a bit expensive on AWS, $9/h /(ㄒoㄒ)/~~.
“Deep Learning Base Qualcomm AMI (Amazon Linux 2) 1.12.0.88P 1.14.0.24A SDK”, ami-057a7f5a69e18e465, is the AMI I am referring to. This is not a public AMI. If you share your 12-digit AWS account ID with the email address in the earlier post, we can share it with you. Yes, on the AWS instance you should be able to get performance close to what has been published. Note that the SKU at AWS is the standard SKU. Noted on the cost concern; it's primarily driven by the fact that the instance has 8 AI 100 cards.
I followed cloud-ai-sdk/models/language_processing/decoder/LlamaForCausalLM/README.md with a single SoC. After compiling with

```bash
bash compileModel_single_soc.sh Amber-kv fp16 14
```

I ran the model:

```bash
python runModel.py --model-name LLM360/Amber --qpc qpc/Amber-kv-256pl-2048cl-14c-1BS --device_id 0 --prompt "Tell me a joke"
```

and got an error. The device message output by

```bash
qaic-util -t 1 -d 0
```

is:

```
DRAM Used/Total: 251164 KB / 157286
```

Is this the key point? What is the unit of 157286 here? The AMI used is "Deep Learning Base Qualcomm AMI (Amazon Linux 2) 20240314".
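One plausible reading of the truncated totals above (an assumption on my part, not confirmed anywhere in this thread): the monitor prints the DRAM total in KB in a fixed-width field, and 15 GiB expressed in KB matches both truncated prefixes:

```bash
# Assumption: the "15728" and "157286" totals are fixed-width truncations
# of the same KB value. 15 GiB expressed in KB:
echo $((15 * 1024 * 1024))   # 15728640, whose prefixes match both readouts
```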