pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Error from LLaMA 3.2 1B Instruct Model generation (.pte) #5967

Closed. HSANGLEE closed this issue 1 week ago.

HSANGLEE commented 2 weeks ago

🐛 Describe the bug

I'm currently trying to test the LLaMA 3.2 1B Instruct model following your guide. I have already tested LLaMA 2 7B and LLaMA 3 8B with XNNPACK on device.

I ran into an issue during .pte generation for the LLaMA 3.2 1B Instruct model.

Could you please provide guidance or a hint to solve this issue? (For your information, if I run the command without '--use_sdpa_with_kv_cache', it generates a .pte file normally.)

I ran exactly this command, as guided:

python -m examples.models.llama2.export_llama --checkpoint "${LLAMA_CHECKPOINT:?}" --params "${LLAMA_PARAMS:?}" -kv --use_sdpa_with_kv_cache -X -d bf16 --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2.pte"

The error logs are below.

/home/hansanglee/executorch/examples/models/llama2/model.py:102: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
Traceback (most recent call last):
  File "/home/hansanglee/executorch/extension/llm/custom_ops/sdpa_with_kv_cache.py", line 22, in <module>
    op = torch.ops.llama.sdpa_with_kv_cache.default
  File "/home/hansanglee/.conda/envs/executorch/lib/python3.10/site-packages/torch/_ops.py", line 1232, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'llama' object has no attribute 'sdpa_with_kv_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hansanglee/.conda/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/hansanglee/.conda/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/hansanglee/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
  File "/home/hansanglee/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/home/hansanglee/executorch/examples/models/llama2/export_llama_lib.py", line 476, in export_llama
    builder = _export_llama(modelname, args)
  File "/home/hansanglee/executorch/examples/models/llama2/export_llama_lib.py", line 575, in _export_llama
    _prepare_for_llama_export(modelname, args)
  File "/home/hansanglee/executorch/examples/models/llama2/export_llama_lib.py", line 531, in _prepare_for_llama_export
    .source_transform(_get_source_transforms(modelname, dtype_override, args))
  File "/home/hansanglee/executorch/extension/llm/export/builder.py", line 144, in source_transform
    self.model = transform(self.model)
  File "/home/hansanglee/executorch/examples/models/llama2/source_transformation/sdpa.py", line 102, in replace_sdpa_with_custom_op
    from executorch.extension.llm.custom_ops import sdpa_with_kv_cache  # noqa
  File "/home/hansanglee/executorch/extension/llm/custom_ops/sdpa_with_kv_cache.py", line 28, in <module>
    assert len(libs) == 1, f"Expected 1 library but got {len(libs)}"
AssertionError: Expected 1 library but got 0
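The final assertion comes from sdpa_with_kv_cache.py failing to locate the compiled custom-op shared library. A minimal sketch (not from the thread; it only assumes the standard-library site module points at the active environment's site-packages) to check whether libcustom_ops_aot_lib.so is present in the installed executorch package:

# Sketch: look for the compiled custom-op library inside the installed
# executorch package. Zero matches is consistent with the
# "Expected 1 library but got 0" assertion above.
import glob
import os
import site

hits = []
for root in site.getsitepackages():
    pattern = os.path.join(root, "executorch", "**", "libcustom_ops_aot_lib.*")
    hits.extend(glob.glob(pattern, recursive=True))

print(f"Found {len(hits)} matching libraries:")
for hit in hits:
    print(" ", hit)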

Compile environment:

1) OS: WSL2, Ubuntu 22.04.5 LTS
2) Python 3.10.0
3) Latest ExecuTorch from git (version 0.5.0a0)
4) I tried ./install_requirements.sh, ./install_requirements.sh --pybind xnnpack, and examples/models/llama2/install_requirements.sh

Versions

Compile environment (collect_env output):

/home/hansanglee/.conda/envs/executorch/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.6.0.dev20241007+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 17.0.6 (++20231209124227+6009708b4367-1~exp1~20231209124336.77)
CMake version: version 3.30.4
Libc version: glibc-2.35

Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4080
Nvidia driver version: 561.09
cuDNN version: Probably one of the following:
  /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8
  /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
  /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
  /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
  /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
  /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
  /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
CPU family: 6
Model: 158
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 1
Stepping: 12
BogoMIPS: 7392.02
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves md_clear flush_l1d arch_capabilities
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 1.5 MiB (6 instances)
L3 cache: 9 MiB (1 instance)
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Not affected
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Unknown: Dependent on hypervisor status
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+986d001
[pip3] numpy==1.26.4
[pip3] onnxruntime-gpu==1.19.2
[pip3] torch==2.6.0.dev20241007+cpu
[pip3] torchao==0.5.0+git0916b5b
[pip3] torchaudio==2.5.0.dev20241007+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20241007+cpu
[conda] executorch 0.5.0a0+986d001 pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.6.0.dev20241007+cpu pypi_0 pypi
[conda] torchaudio 2.5.0.dev20241007+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20241007+cpu pypi_0 pypi

enduringstack commented 2 weeks ago

@HSANGLEE Hi, have you tested the LLaMA 3.2 1B Instruct model with the QNN backend? I got wrong results with the QNN backend.

HSANGLEE commented 2 weeks ago

@enduringstack Actually, I have only tried the LLaMA 3.2 1B Instruct model with XNNPACK.

justin-Kor commented 2 weeks ago

You should remove the "--use_sdpa_with_kv_cache" option.

This option may have been removed in the commit below: https://github.com/pytorch/executorch/pull/4188/commits/a72afe2179b161fe08e108285e515e9c97f8a97c

HSANGLEE commented 2 weeks ago

@justin-Kor Thanks for your help. Actually, I am just confused about the "--use_sdpa_with_kv_cache" option.

If you know, please let me know:

Q1) The "--use_sdpa_with_kv_cache" option was newly updated today (updated 12 hours ago; it is mentioned in this merged commit: https://github.com/pytorch/executorch/commit/2726bdb87efc956298e683aca4f6ddd0039f6030).

Q2) For SpinQuant on the LLaMA 3.2 1B Instruct model, the following command is also given in the guide. Can I run it as guided (after removing "--use_sdpa_with_kv_cache")?

python -m examples.models.llama2.export_llama \
  --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
  --params "${LLAMA_PARAMS:?}" \
  --use_sdpa_with_kv_cache \
  -X \
  --preq_mode 8da4w_output_8da8w \
  --preq_group_size 32 \
  --max_seq_length 2048 \
  --output_name "llama3_2.pte" \
  -kv \
  -d fp32 \
  --preq_embedding_quantize 8,0 \
  --use_spin_quant native \
  --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}'

larryliu0820 commented 1 week ago

@helunwencser can you please help with this?

helunwencser commented 1 week ago

I just tried this, but unfortunately I cannot reproduce it. It looks like your ExecuTorch install is somehow corrupted. On my side, after installing ExecuTorch and exporting the LLaMA 3.2 1B Instruct model, I can see libcustom_ops_aot_lib.so on my machine:

~/executorch (main)]$ find . | grep libcustom_ops_aot_lib
./pip-out/temp.linux-x86_64-cpython-310/cmake-out/extension/llm/custom_ops/libcustom_ops_aot_lib.so
./pip-out/lib.linux-x86_64-cpython-310/executorch/extension/llm/custom_ops/libcustom_ops_aot_lib.so

Could you please try the following? Delete your ExecuTorch repo and environment, reinstall from scratch, and check that libcustom_ops_aot_lib.so is present.
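As a quick sanity check after reinstalling, a small sketch that repeats the import and op lookup the export script performs internally (both taken from the traceback above); if the library is still missing, it raises the same AssertionError:

# Sketch: this import is what replace_sdpa_with_custom_op does internally; it
# loads the custom-op library and registers the llama.sdpa_with_kv_cache op.
import torch
from executorch.extension.llm.custom_ops import sdpa_with_kv_cache  # noqa: F401

# If the import succeeded, the op should now resolve in the llama namespace.
print(torch.ops.llama.sdpa_with_kv_cache.default)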

HSANGLEE commented 1 week ago

Dear @helunwencser, as you recommended, I deleted the entire executorch repo and its related environment.

After that, the "--use_sdpa_with_kv_cache" option works well.

Thanks for your guide.

helunwencser commented 1 week ago

Glad that it worked!