pytorch / torchchat


AOTI filesize regression: *.pt2 filesize is bigger than *.so #1365

Open · metascroy opened this issue 1 day ago

metascroy commented 1 day ago

🐛 Describe the bug

I exported the model to both .pt2 and .so. The .pt2 file is about 2x larger:

llama31_1bit.pt2 filesize: 3.09 GB
llama31_1bit.so filesize: 1.55 GB

pt2 command:

OMP_NUM_THREADS=6 python torchchat.py export llama3.1 --device cpu --dtype float32 --quantize '{"embedding:wx": {"bitwidth": 1, "groupsize": 32}, "linear:a8wxdq": {"bitwidth": 1, "groupsize": 256, "has_weight_zeros": false}}' --output-aoti-package-path llama31_1bit.pt2

so command:

OMP_NUM_THREADS=6 python torchchat.py export llama3.1 --device cpu --dtype float32 --quantize '{"embedding:wx": {"bitwidth": 1, "groupsize": 32}, "linear:a8wxdq": {"bitwidth": 1, "groupsize": 256, "has_weight_zeros": false}}' --output-dso llama31_1bit.so

Versions

Collecting environment information...
PyTorch version: 2.6.0.dev20241007
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.7 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.3)
CMake version: version 3.30.5
Libc version: N/A

Python version: 3.10.0 (default, Mar 3 2022, 03:54:28) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-14.7-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+72b3bb3
[pip3] numpy==1.26.4
[pip3] torch==2.6.0.dev20241007
[pip3] torchao==0.5.0
[pip3] torchaudio==2.5.0.dev20241007
[pip3] torchsr==1.0.4
[pip3] torchtune==0.4.0.dev20241010+cpu
[pip3] torchvision==0.20.0.dev20241007
[conda] executorch 0.5.0a0+72b3bb3 pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.6.0.dev20241007 pypi_0 pypi
[conda] torchao 0.5.0 pypi_0 pypi
[conda] torchaudio 2.5.0.dev20241007 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchtune 0.4.0.dev20241010+cpu pypi_0 pypi
[conda] torchvision 0.20.0.dev20241007 pypi_0 pypi

Jack-Khuu commented 1 day ago

Yup yup, this is a known bug in pytorch/pytorch.

It'll be solved when this lands: https://github.com/pytorch/pytorch/pull/140022