def export_model(
    model: nn.Module,
    example_inputs: Tuple[Any, ...],
    file_name: str = "CadenceDemoModel",
):
    # Quantizer
    quantizer = CadenceQuantizer()
    # Export
    model_exp = capture_pre_autograd_graph(model, example_inputs)
    # Prepare
    prepared_model = prepare_pt2e(model_exp, quantizer)
    prepared_model(*example_inputs)
    # Convert
    converted_model = convert_pt2e(prepared_model)
    # pyre-fixme[16]: Pyre doesn't get that CadenceQuantizer has a patterns attribute
    patterns = [q.pattern for q in quantizer.quantizers]
    QuantFusion(patterns)(converted_model)
After quantization with QuantFusion, is there a way to perform inference? Currently, inference on the fused model fails with the following error:
RuntimeError: Could not run 'cadence::quantized_conv' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'cadence::quantized_conv' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
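The backend list in the error includes Meta but not CPU, which suggests the fused operator has only a shape-inference (Meta) kernel registered and no eager CPU implementation. The sketch below reproduces that situation with a made-up operator and shows one possible workaround: registering a Python reference kernel for the CPU dispatch key through `torch.library`. The namespace `demo_ns`, the op `scaled_add`, and both kernel functions are hypothetical and exist only for illustration; this is not the real `cadence::quantized_conv` schema, and whether ExecuTorch provides its own reference kernels for these ops is not something this sketch assumes.

```python
import torch
from torch.library import Library

# Hypothetical namespace and operator, for illustration only.
lib = Library("demo_ns", "DEF")
lib.define("scaled_add(Tensor x, Tensor y, float scale) -> Tensor")


def scaled_add_meta(x, y, scale):
    # Meta kernel: propagates shape and dtype, performs no computation.
    return torch.empty_like(x)


lib.impl("scaled_add", scaled_add_meta, "Meta")

x, y = torch.ones(4), torch.ones(4)

# With only a Meta kernel, an eager call on CPU tensors fails in the same way
# as the error above ("Could not run ... from the 'CPU' backend").
# NotImplementedError is a subclass of RuntimeError, so this catches both.
try:
    torch.ops.demo_ns.scaled_add(x, y, 0.5)
    failed = False
except RuntimeError as err:
    failed = True
    message = str(err)


def scaled_add_cpu(x, y, scale):
    # Reference CPU kernel written with ordinary tensor ops.
    return (x + y) * scale


# After registering a CPU kernel, the same call dispatches successfully.
lib.impl("scaled_add", scaled_add_cpu, "CPU")
result = torch.ops.demo_ns.scaled_add(x, y, 0.5)
```

For the fused Cadence ops, the analogous step would be a reference implementation with the same schema as the registered operator. That implementation would have to be written or located separately, and its numerics checked against the target hardware kernels.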
Versions
PyTorch version: 2.5.0.dev20240901
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.5 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.29.6
Libc version: N/A
Python version: 3.10.0 (default, Mar 3 2022, 03:54:28) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-14.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
🐛 Describe the bug
https://github.com/pytorch/executorch/blob/main/examples/cadence/operators/quantized_conv1d_op.py
Versions
CPU: Apple M3
Versions of relevant libraries:
[pip3] executorch==0.4.0a0+99fbca3
[pip3] numpy==1.21.3
[pip3] torch==2.5.0.dev20240901
[pip3] torchaudio==2.5.0.dev20240901
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240901
[conda] executorch 0.4.0a0+99fbca3 pypi_0 pypi
[conda] numpy 1.21.3 pypi_0 pypi
[conda] torch 2.5.0.dev20240901 pypi_0 pypi
[conda] torchaudio 2.5.0.dev20240901 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20240901 pypi_0 pypi