pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Unable to export to .pte format #7099

Open sixersri opened 3 days ago

sixersri commented 3 days ago

🐛 Describe the bug

I have a fine-tuned TinyLlama/TinyLlama-1.1B-Chat-v1.0 model.

I created the checkpoint file using the following:

torch.save(model.state_dict(), "/opt/ml/model/model.pth")

This produces a 4.1 GB model.pth file.
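For reference, a minimal sketch of the full save step (the model-loading code is an assumption; only the torch.save call is taken verbatim from above):

import torch
from transformers import AutoModelForCausalLM

# Illustrative load of the fine-tuned model; the actual fine-tuning
# code is omitted and this loading API is an assumption.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# The save call as used to produce model.pth.
torch.save(model.state_dict(), "/opt/ml/model/model.pth")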

I then try to create a .pte file as follows:

python -m examples.models.llama.export_llama \
    --checkpoint /home/elxr/projecta/model.pth \
    --params /home/elxr/projecta/params.json \
    -X --xnnpack-extended-ops -qmode 8da4w \
    -d fp16 \
    --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001, 128006, 128007]}' \
    --embedding-quantize 4,32 \
    --output_name="tinyllamachat.pte"

Here is the content of the params.json file:

{
    "dim": 2048,
    "multiple_of": 64,
    "n_heads": 32,
    "n_kv_heads": 4,
    "n_layers": 22,
    "norm_eps": 1e-05,
    "rope_theta": 10000.0,
    "use_scaled_rope": false,
    "vocab_size": 32000
}
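As a sanity check that the checkpoint matches these params, the state dict's key names and shapes can be inspected directly. A hypothetical diagnostic snippet, not part of the export flow:

import torch

# Load the saved state dict on CPU and print each parameter's
# name and shape to compare against the architecture in params.json.
sd = torch.load("/home/elxr/projecta/model.pth", map_location="cpu")
for name, tensor in sd.items():
    print(name, tuple(tensor.shape))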

I get the following error: NotImplementedError: Cannot copy out of meta tensor; no data!

Here is the full stack trace:

INFO:root:Applying quantizers: []
INFO:root:Loading model with checkpoint=/home/elxr/projecta/model.pth, params=/home/elxr/projecta/params.json, use_kv_cache=False, weight_type=WeightType.LLAMA
INFO:root:model.to torch.float16
INFO:root:linear: layers.0.attention.wq, in=2048, out=2048
Traceback (most recent call last):
  File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/elxr/executorch/examples/models/llama/export_llama.py", line 32, in <module>
    main()  # pragma: no cover
  File "/home/elxr/executorch/examples/models/llama/export_llama.py", line 28, in main
    export_llama(args)
  File "/home/elxr/executorch/examples/models/llama/export_llama_lib.py", line 508, in export_llama
    builder = _export_llama(args)
  File "/home/elxr/executorch/examples/models/llama/export_llama_lib.py", line 643, in _export_llama
    builder_exported = _prepare_for_llama_export(args).export()
  File "/home/elxr/executorch/examples/models/llama/export_llama_lib.py", line 564, in _prepare_for_llama_export
    .source_transform(_get_source_transforms(args.model, dtype_override, args))
  File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/site-packages/executorch/extension/llm/export/builder.py", line 148, in source_transform
    self.model = transform(self.model)
  File "/home/elxr/executorch/examples/models/llama/source_transformation/quantize.py", line 103, in quantize
    ).quantize(model)
  File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/GPTQ.py", line 1100, in quantize
    state_dict = self._create_quantized_state_dict(model)
  File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/GPTQ.py", line 1079, in _create_quantized_state_dict
    cur_state_dict[f"{fqn}.weight"] = weight_int8.to(self.device)
NotImplementedError: Cannot copy out of meta tensor; no data!
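For context, the error class is reproducible in isolation: a tensor on PyTorch's "meta" device carries shape and dtype metadata but no backing storage, so there is nothing to copy when it is moved to a real device. A minimal standalone sketch (not the exporter's actual code path):

import torch

# A parameter left on the "meta" device has metadata but no data.
w = torch.empty(2048, 2048, device="meta", dtype=torch.float16)

# Materializing it on a real device raises the same exception:
w.to("cpu")  # NotImplementedError: Cannot copy out of meta tensor; no data!

One way this can surface during export is if the model is instantiated on the meta device and the checkpoint does not supply data for every parameter; any weight missing from model.pth would stay on meta until the quantizer's .to(self.device) call fails on it.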

Versions

PyTorch version: 2.6.0.dev20241112+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: eLxr Linux 12 (aria) (x86_64)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: Could not collect
CMake version: version 3.31.1
Libc version: glibc-2.36

Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-6.1.0-22-amd64-x86_64-with-glibc2.36
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 7
BogoMIPS: 5000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 64 KiB (2 instances)
L1i cache: 64 KiB (2 instances)
L2 cache: 2 MiB (2 instances)
L3 cache: 35.8 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Vulnerable
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+1c7d94e
[pip3] numpy==1.26.4
[pip3] torch==2.6.0.dev20241112+cpu
[pip3] torchao==0.7.0+git75d06933
[pip3] torchaudio==2.5.0.dev20241112+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20241112+cpu
[conda] executorch 0.5.0a0+1c7d94e pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.6.0.dev20241112+cpu pypi_0 pypi
[conda] torchao 0.7.0+git75d06933 pypi_0 pypi
[conda] torchaudio 2.5.0.dev20241112+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20241112+cpu pypi_0 pypi

wangqiang58 commented 23 hours ago

Me too.