This is a fairly substantial body of work that will be refined as we bring the model fully online. As of now, it runs and exports properly with the SmoothQuant int8 strategy and configuration we chose, but the results do not look correct. We will need to go through the numerics with a fine-tooth comb and validate them against the quantization simulator that produced the parameters.
Key things added:
- Quantizer tensors for expressing the parameters needed to quantize from a higher precision to a lower one (tests cover both integer and fp8); see the per-tensor sketch after this list.
- First pass at teaching the Linear and Conv layers how to work with quantized tensors and quantizers.
- Teach the Linear and Conv layers how to handle a high-precision input pre-multiplier, needed for SmoothQuant-like strategies (see the SmoothQuant sketch below).
- Add a simple torch.export option to the interactive punet runner (a minimal torch.export call is sketched below).
- Make custom op auto-unboxing and auto-dequantization a regular feature that op implementations can enable with kwargs (see the decorator sketch below).
- Implement quantizers and QuantizedTensor types for the usual per-tensor and per-axis quant schemes (a per-axis sketch follows the list).
- Add Theta.optional_tensor().
- Add an import_brevitas_dataset.py script that imports from a cooked (SmoothQuant-folded) safetensors file and a JSON file containing calibration parameters, producing a quantized punet dataset (see the import-flow sketch below).
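
To make the quantizer tensors concrete, here is a minimal per-tensor sketch covering both the int8 and fp8 target dtypes. The class and method names are illustrative assumptions, not the actual API introduced in this change:

```python
import torch
from dataclasses import dataclass

@dataclass
class PerTensorQuantizer:
    """Illustrative only; not the real quantizer API from this change."""
    scale: torch.Tensor              # scalar scale in the high-precision dtype
    dtype: torch.dtype = torch.int8  # target: torch.int8 or torch.float8_e4m3fn

    def quantize(self, t: torch.Tensor) -> torch.Tensor:
        scaled = t / self.scale
        if self.dtype.is_floating_point:
            # fp8 path: clamp to the representable range before casting
            fp_max = torch.finfo(self.dtype).max
            return torch.clamp(scaled, -fp_max, fp_max).to(self.dtype)
        info = torch.iinfo(self.dtype)
        return torch.clamp(torch.round(scaled), info.min, info.max).to(self.dtype)

    def dequantize(self, q: torch.Tensor) -> torch.Tensor:
        return q.to(self.scale.dtype) * self.scale
```

The quantize/dequantize round-trip here is exactly the property the simulator validation mentioned above needs to check against the calibration parameters.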
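The high-precision pre-multiplier exists because SmoothQuant folds a per-channel smoothing factor into the weights offline; the matching factor has to be applied to the activation while it is still high precision, before input quantization. A rough sketch reusing the PerTensorQuantizer above (the signature is invented, not the layer's real one):

```python
import torch

def smoothquant_linear(x: torch.Tensor,
                       premul_scale: torch.Tensor,   # SmoothQuant activation factor
                       weight_q: torch.Tensor,       # int8 weights, smoothing folded in
                       weight_scale: torch.Tensor,
                       input_quantizer: "PerTensorQuantizer") -> torch.Tensor:
    """Hypothetical forward; not the layer's real signature."""
    # 1. Apply the pre-multiplier while the input is still high precision.
    x = x * premul_scale
    # 2. Only then quantize the input to int8.
    x_q = input_quantizer.quantize(x)
    # 3. Integer matmul with int32 accumulation, then rescale to float.
    acc = torch.matmul(x_q.to(torch.int32), weight_q.to(torch.int32).T)
    return acc.to(torch.float32) * input_quantizer.scale * weight_scale
```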
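For the runner option, the core of a torch.export path is just producing an ExportedProgram from example inputs; a toy module stands in for punet here:

```python
import torch

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.silu(x)

# torch.export traces the module into an ExportedProgram that can be
# inspected or lowered further.
ep = torch.export.export(Toy(), (torch.randn(2, 4),))
print(ep.graph_module.graph)
```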
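The auto-unboxing/auto-dequant kwarg can be pictured as an opt-in flag on the op registration decorator: when set, quantized arguments are converted back to plain torch.Tensors before the implementation body runs. This is a simplified stand-in for the real registration machinery:

```python
import functools
import torch

def custom_op_impl(*, auto_dequant: bool = False):
    """Simplified stand-in for the real op-registration decorator."""
    def wrap(fn):
        @functools.wraps(fn)
        def impl(*args, **kwargs):
            if auto_dequant:
                # Unbox anything that is not a plain torch.Tensor but
                # knows how to dequantize itself.
                args = tuple(
                    a.dequantize()
                    if not isinstance(a, torch.Tensor) and hasattr(a, "dequantize")
                    else a
                    for a in args
                )
            return fn(*args, **kwargs)
        return impl
    return wrap

@custom_op_impl(auto_dequant=True)
def bias_relu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    # The implementation can assume high-precision tensors.
    return torch.relu(x + bias)
```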
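Per-axis quantization, as opposed to per-tensor, carries one scale per slice along a chosen axis (typically the output-channel axis of a weight). A minimal sketch:

```python
import torch

def quantize_per_axis(t: torch.Tensor, scales: torch.Tensor, axis: int = 0) -> torch.Tensor:
    """One int8 scale per slice along `axis`; illustrative only."""
    shape = [1] * t.ndim
    shape[axis] = -1
    s = scales.reshape(shape)  # broadcastable per-channel scales
    return torch.clamp(torch.round(t / s), -128, 127).to(torch.int8)
```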
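The import flow in import_brevitas_dataset.py can be pictured roughly as below. The file names and the per-parameter JSON schema ("scale" keyed by tensor name) are assumptions for illustration, not the script's real format:

```python
import json
import torch
from safetensors.torch import load_file

# Illustrative flow only; names and schema are assumed, not real.
weights = load_file("punet_smoothquant_folded.safetensors")
with open("quant_params.json") as f:
    calib = json.load(f)

dataset: dict[str, torch.Tensor] = {}
for name, w in weights.items():
    spec = calib.get(name)
    if spec is None:
        dataset[name] = w  # tensors without calibration stay high precision
        continue
    scale = torch.tensor(spec["scale"], dtype=torch.float32)
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    dataset[name] = q
    dataset[name + ".scale"] = scale
```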