sheetalarkadam opened 2 days ago
Can you try adding `import executorch.kernels.quantized` to export_llama.py, like this: https://github.com/pytorch/executorch/blob/84222a9ef1b00663531d68457529b2fcae35df22/kernels/quantized/test/test_out_variants.py#L10
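For reference, a minimal sketch of what that would look like near the top of export_llama.py. The import is needed purely for its side effect, following the pattern in the linked test: it registers ExecuTorch's quantized out-variant kernels with the PyTorch dispatcher so the lowering step can find them.

```python
# export_llama.py (sketch): imported only for its side effect of
# registering the quantized .out kernels (the ones declared in
# kernels/quantized/quantized.yaml) with the PyTorch dispatcher.
import executorch.kernels.quantized  # noqa: F401
```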
I don't think we have a quantized linear kernel in ExecuTorch outside of XNNPACK or torchao, so using those ops probably dequantizes the weights and does the linear computation in float32, which might not make for a fair comparison.
cc @larryliu0820 for missing ops and @digantdesai for XNNPACK
Hmm... We should have `quantize_per_token_out`, for example in executorch/kernels/quantized/cpu/op_quantize.cpp, and we should link against the quantized_ops_lib. And we should have tests for running quantized Llama with portable ops only; not sure about Llama 3.2, though.
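As a quick sanity check (a hedged sketch, not from the thread): after the import, the out overloads implemented in op_quantize.cpp should be visible on the `quantized_decomposed` op packets. This assumes the functional ops are registered first, e.g. via torch.ao's decomposed-ops module:

```python
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  functional quantized_decomposed ops
import executorch.kernels.quantized  # noqa: F401  should add the .out variants

for name in ("quantize_per_token", "dequantize_per_token"):
    packet = getattr(torch.ops.quantized_decomposed, name)
    # Expect 'out' to appear in the overload list; if it does not, the
    # quantized kernel library was not built/installed in this environment.
    print(name, packet.overloads())
```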
I am following the instructions in the Llama2 README to test a Llama model with ExecuTorch. I want to compare the model's performance with and without XNNPACK. From the code, it seems that DQLinear operations are delegated to XNNPACK by default. However, I would like to understand how to use the quantized ops defined in ExecuTorch, as listed in quantized.yaml. Could you provide guidance on configuring the model to use ExecuTorch's quantized ops instead of XNNPACK?
I encounter the following error when the -X (--xnnpack) flag is removed from the Python export command:

```
raise RuntimeError(f"Missing out variants: {missing_out_vars}")
RuntimeError: Missing out variants: {'quantized_decomposed::choose_qparams_per_token_asymmetric', 'quantized_decomposed::dequantize_per_channel', 'quantized_decomposed::dequantize_per_channel_group', 'quantized_decomposed::dequantize_per_token', 'quantized_decomposed::quantize_per_token'}
```
What adjustments are required to resolve the "missing out variants" error when the -X flag is omitted? Thank you for your assistance!
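Once the export succeeds, one way to get the with/without-XNNPACK comparison is to time both .pte files from Python via the pybindings. This is a rough sketch: the file names and the example input are placeholders, and it assumes your pybindings build links the kernels and backends each file needs.

```python
import time
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

def bench(pte_path, inputs, iters=20):
    # Load the exported program and time repeated forward calls.
    module = _load_for_executorch(pte_path)
    module.forward(inputs)  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        module.forward(inputs)
    return (time.perf_counter() - start) / iters

# Placeholder prompt tokens; shape/dtype must match your export.
tokens = (torch.tensor([[1, 2, 3, 4]], dtype=torch.long),)
print("portable:", bench("llama2_portable.pte", tokens))
print("xnnpack: ", bench("llama2_xnnpack.pte", tokens))
```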
Versions
```
Collecting environment information...
PyTorch version: 2.6.0.dev20240927+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.31.0
Libc version: glibc-2.35

Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.167.1-1.cm2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 12.6.77

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+20a157f
[pip3] numpy==1.26.4
[pip3] torch==2.6.0.dev20240927+cpu
[pip3] torchao==0.5.0+git0916b5b2
[pip3] torchaudio==2.5.0.dev20240927+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240927+cpu
[conda] executorch 0.5.0a0+20a157f pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.6.0.dev20240927+cpu pypi_0 pypi
[conda] torchaudio 2.5.0.dev20240927+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20240927+cpu pypi_0 pypi
```