HSANGLEE closed this issue 1 week ago
@HSANGLEE Hi, have you tested the LLaMA 3.2 1B Instruct model with the QNN backend? I got wrong results with the QNN backend.
@enduringstack Actually, I have only tried the LLaMA 3.2 1B Instruct model with XNNPACK.
You should remove the "--use_sdpa_with_kv_cache" option.
This option may have been removed in the commit below: https://github.com/pytorch/executorch/pull/4188/commits/a72afe2179b161fe08e108285e515e9c97f8a97c
@justin-Kor Thanks for your help. Actually, I am confused about the "--use_sdpa_with_kv_cache" option.
If you know the answers to the questions below, please let me know.
Q1) The mention of the "--use_sdpa_with_kv_cache" option was updated only today (12 hours ago; it was merged in https://github.com/pytorch/executorch/commit/2726bdb87efc956298e683aca4f6ddd0039f6030 ). Has the option really been removed?
Q2) The instructions for SpinQuant with the LLaMA 3.2 1B Instruct model also use this option. Can I run the command below as you suggested, with "--use_sdpa_with_kv_cache" removed?
python -m examples.models.llama2.export_llama \
  --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
  --params "${LLAMA_PARAMS:?}" \
  --use_sdpa_with_kv_cache \
  -X \
  --preq_mode 8da4w_output_8da8w \
  --preq_group_size 32 \
  --max_seq_length 2048 \
  --output_name "llama3_2.pte" \
  -kv \
  -d fp32 \
  --preq_embedding_quantize 8,0 \
  --use_spin_quant native \
  --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}'
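To compare exports with and without the flag, it may help to assemble the argument list in one place. The following is only a minimal sketch: `build_export_args` is a hypothetical helper (not part of ExecuTorch), and every flag and value in it is copied from the command above rather than checked against the current `export_llama` CLI.

```python
import json

# Metadata values copied verbatim from the command in this thread.
METADATA = {
    "append_eos_to_prompt": 0,
    "get_bos_id": 128000,
    "get_eos_ids": [128009, 128001],
    "get_n_bos": 0,
    "get_n_eos": 0,
}


def build_export_args(checkpoint, params, use_sdpa_with_kv_cache=True):
    """Hypothetical helper: build the argv list for the export command.

    The flags mirror the command quoted in the thread; the only switch is
    whether --use_sdpa_with_kv_cache is included.
    """
    args = [
        "python", "-m", "examples.models.llama2.export_llama",
        "--checkpoint", checkpoint,
        "--params", params,
    ]
    if use_sdpa_with_kv_cache:
        args.append("--use_sdpa_with_kv_cache")
    args += [
        "-X",
        "--preq_mode", "8da4w_output_8da8w",
        "--preq_group_size", "32",
        "--max_seq_length", "2048",
        "--output_name", "llama3_2.pte",
        "-kv",
        "-d", "fp32",
        "--preq_embedding_quantize", "8,0",
        "--use_spin_quant", "native",
        # json.dumps guarantees the metadata string is valid JSON.
        "--metadata", json.dumps(METADATA),
    ]
    return args
```

The returned list could be passed to `subprocess.run(..., check=True)` from the repository root. Passing a list instead of a shell string avoids quoting problems with the JSON metadata.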
@helunwencser can you please help with this?
I just tried this, but unfortunately I cannot reproduce it. It looks like your ExecuTorch install is corrupted somehow. After installing ExecuTorch and exporting the LLaMA 3.2 1B Instruct model, I can find libcustom_ops_aot_lib.so on my machine:
~/executorch (main)]$ find . | grep libcustom_ops_aot_lib
./pip-out/temp.linux-x86_64-cpython-310/cmake-out/extension/llm/custom_ops/libcustom_ops_aot_lib.so
./pip-out/lib.linux-x86_64-cpython-310/executorch/extension/llm/custom_ops/libcustom_ops_aot_lib.so
Could you please try the following?
./install_requirements.sh --pybind xnnpack
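The find/grep check earlier in this comment can also be scripted, which is handy for verifying a fresh install before exporting. This is a minimal sketch: `find_custom_ops_lib` is a hypothetical helper name, and it matches on the file name only, whereas `find . | grep` matches anywhere in the path.

```python
from pathlib import Path


def find_custom_ops_lib(root="."):
    """Return sorted paths of files under `root` whose name contains
    'libcustom_ops_aot_lib' (hypothetical helper, not an ExecuTorch API)."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*libcustom_ops_aot_lib*")
        if path.is_file()
    )
```

Calling `find_custom_ops_lib(".")` from the executorch checkout should list the same `.so` files as the shell command. An empty list would suggest the custom ops library was not built.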
Dear @helunwencser, as you recommended, I deleted the entire executorch repo and its related environment.
After that, the "--use_sdpa_with_kv_cache" option works well.
Thanks for your guidance.
Glad that it worked!
🐛 Describe the bug
Currently I'm trying to test the LLaMA 3.2 1B Instruct model by following your guide. I have already tested LLaMA 2 7B and LLaMA 3 8B with XNNPACK on device.
I ran into an issue during .pte generation for the LLaMA 3.2 1B Instruct model.
Could you please give me some guidance or a hint for solving this issue? (For your information, if I run the command without '--use_sdpa_with_kv_cache', it generates a .pte file normally.)
I ran exactly the command from your guide.
The error logs are below.
Versions
Compile Environment is
/home/hansanglee/.conda/envs/executorch/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.6.0.dev20241007+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 17.0.6 (++20231209124227+6009708b4367-1~exp1~20231209124336.77)
CMake version: version 3.30.4
Libc version: glibc-2.35
Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4080
Nvidia driver version: 561.09
cuDNN version: Probably one of the following:
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
CPU family: 6
Model: 158
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 1
Stepping: 12
BogoMIPS: 7392.02
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves md_clear flush_l1d arch_capabilities
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 1.5 MiB (6 instances)
L3 cache: 9 MiB (1 instance)
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Not affected
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Unknown: Dependent on hypervisor status
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown
Versions of relevant libraries:
[pip3] executorch==0.5.0a0+986d001
[pip3] numpy==1.26.4
[pip3] onnxruntime-gpu==1.19.2
[pip3] torch==2.6.0.dev20241007+cpu
[pip3] torchao==0.5.0+git0916b5b
[pip3] torchaudio==2.5.0.dev20241007+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20241007+cpu
[conda] executorch 0.5.0a0+986d001 pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.6.0.dev20241007+cpu pypi_0 pypi
[conda] torchaudio 2.5.0.dev20241007+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20241007+cpu pypi_0 pypi