🐛 Describe the bug

I have a fine-tuned TinyLlama/TinyLlama-1.1B-Chat-v1.0 model. I created the checkpoint file using the following:

```python
torch.save(model.state_dict(), "/opt/ml/model/model.pth")
```

A 4.1 GB model.pth file gets created. I then try to create a .pte file as follows:

```bash
python -m examples.models.llama.export_llama \
  --checkpoint /home/elxr/projecta/model.pth \
  --params /home/elxr/projecta/params.json \
  -X --xnnpack-extended-ops -qmode 8da4w \
  -d fp16 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001, 128006, 128007]}' \
  --embedding-quantize 4,32 \
  --output_name="tinyllamachat.pte"
```

Here is the content of the params.json file:

```json
{
"dim": 2048,
"multiple_of": 64,
"n_heads": 32,
"n_kv_heads": 4,
"n_layers": 22,
"norm_eps": 1e-05,
"rope_theta": 10000.0,
"use_scaled_rope": false,
"vocab_size": 32000
}
```
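As a sanity check on the checkpoint itself, the saved state_dict can be inspected like this (a diagnostic sketch of my own, not part of the ExecuTorch tooling; the path is the one from the command above):

```python
import torch

# Diagnostic only: load the checkpoint on CPU and peek at a few entries.
# Every tensor should report a concrete shape/dtype and a real device;
# anything on device "meta" would have no data behind it.
sd = torch.load("/home/elxr/projecta/model.pth", map_location="cpu")
for name, tensor in list(sd.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype, tensor.device)
```

For what it's worth, if the fine-tuned model is the Hugging Face LlamaForCausalLM class, its state_dict uses key names such as model.layers.0.self_attn.q_proj.weight, whereas the exporter log below refers to layers.0.attention.wq.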
I get the following error:

```
NotImplementedError: Cannot copy out of meta tensor; no data!
```

Here is the full stack trace:

```
INFO:root:Applying quantizers: []
INFO:root:Loading model with checkpoint=/home/elxr/projecta/model.pth, params=/home/elxr/projecta/params.json, use_kv_cache=False, weight_type=WeightType.LLAMA
INFO:root:model.to torch.float16
INFO:root:linear: layers.0.attention.wq, in=2048, out=2048
Traceback (most recent call last):
File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/elxr/executorch/examples/models/llama/export_llama.py", line 32, in <module>
main() # pragma: no cover
File "/home/elxr/executorch/examples/models/llama/export_llama.py", line 28, in main
export_llama(args)
File "/home/elxr/executorch/examples/models/llama/export_llama_lib.py", line 508, in export_llama
builder = _export_llama(args)
File "/home/elxr/executorch/examples/models/llama/export_llama_lib.py", line 643, in _export_llama
builder_exported = _prepare_for_llama_export(args).export()
File "/home/elxr/executorch/examples/models/llama/export_llama_lib.py", line 564, in _prepare_for_llama_export
.source_transform(_get_source_transforms(args.model, dtype_override, args))
File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/site-packages/executorch/extension/llm/export/builder.py", line 148, in source_transform
self.model = transform(self.model)
File "/home/elxr/executorch/examples/models/llama/source_transformation/quantize.py", line 103, in quantize
).quantize(model)
File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/GPTQ.py", line 1100, in quantize
state_dict = self._create_quantized_state_dict(model)
File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/elxr/miniconda3/envs/executorch/lib/python3.10/site-packages/torchao/quantization/GPTQ.py", line 1079, in _create_quantized_state_dict
cur_state_dict[f"{fqn}.weight"] = weight_int8.to(self.device)
NotImplementedError: Cannot copy out of meta tensor; no data!
```
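For context, the failing line (`weight_int8.to(self.device)`) trips over a tensor that lives on the meta device, i.e. one that has a shape and dtype but no storage. The exception is easy to reproduce in isolation; a minimal sketch, independent of the export pipeline:

```python
import torch

# A "meta" tensor carries only metadata (shape, dtype); it has no
# storage, so there is nothing to copy when moving it to a real device.
w = torch.empty(4, 4, device="meta")

# Raises: NotImplementedError: Cannot copy out of meta tensor; no data!
w.to("cpu")
```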
Versions

```
PyTorch version: 2.6.0.dev20241112+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: eLxr Linux 12 (aria) (x86_64)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: Could not collect
CMake version: version 3.31.1
Libc version: glibc-2.36
Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-6.1.0-22-amd64-x86_64-with-glibc2.36
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 7
BogoMIPS: 5000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 64 KiB (2 instances)
L1i cache: 64 KiB (2 instances)
L2 cache: 2 MiB (2 instances)
L3 cache: 35.8 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Vulnerable
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] executorch==0.5.0a0+1c7d94e
[pip3] numpy==1.26.4
[pip3] torch==2.6.0.dev20241112+cpu
[pip3] torchao==0.7.0+git75d06933
[pip3] torchaudio==2.5.0.dev20241112+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20241112+cpu
[conda] executorch 0.5.0a0+1c7d94e pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.6.0.dev20241112+cpu pypi_0 pypi
[conda] torchao 0.7.0+git75d06933 pypi_0 pypi
[conda] torchaudio 2.5.0.dev20241112+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20241112+cpu pypi_0 pypi
```