Closed angeloskath closed 2 weeks ago
@awni and @jagrit06 I made two changes that are not directly related to `block_sparse_qmm` but to quantization in general, so I'd appreciate a review. Also, if you think they (or maybe just the 2nd change) should be in a different PR, let me know and I can merge this as it was before these commits and open another PR for them.
The changes are the following:
1. `quantize` and `dequantize` now support arbitrary shapes instead of only 2D matrices.
2. Added a `to_quantized` method to `Linear` and `Embedding` and changed `nn.quantize` to check for modules that implement `to_quantized`. This allows arbitrary modules to support quantization using `nn.quantize` without MLX knowing anything about them.
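
To illustrate the `to_quantized` dispatch idea, here is a minimal pure-Python sketch. It is not the code in this PR: `Module`, `LayerNorm`, `QuantizedLinear`, and `quantize_tree` are hypothetical stand-ins, and the `group_size`/`bits` defaults are assumptions chosen for the example. The only point is that the walker swaps any child exposing a callable `to_quantized` and never checks concrete types.

```python
class Module:
    """Hypothetical stand-in for a module holding named children."""

    def __init__(self, **children):
        self.children = dict(children)


class QuantizedLinear(Module):
    """Hypothetical quantized replacement; stores only its configuration."""

    def __init__(self, in_dims, out_dims, group_size, bits):
        super().__init__()
        self.in_dims, self.out_dims = in_dims, out_dims
        self.group_size, self.bits = group_size, bits


class Linear(Module):
    def __init__(self, in_dims, out_dims):
        super().__init__()
        self.in_dims, self.out_dims = in_dims, out_dims

    def to_quantized(self, group_size=64, bits=4):
        # The module itself decides what its quantized form looks like.
        return QuantizedLinear(self.in_dims, self.out_dims, group_size, bits)


class LayerNorm(Module):
    """Has no to_quantized method, so the walker leaves it untouched."""


def quantize_tree(model, group_size=64, bits=4):
    """Replace every child that implements to_quantized, recursing otherwise."""
    for name, child in list(model.children.items()):
        if callable(getattr(child, "to_quantized", None)):
            model.children[name] = child.to_quantized(group_size, bits)
        else:
            quantize_tree(child, group_size, bits)
    return model


# Usage: a nested model with one layer that opts out by not implementing the method.
model = Module(
    proj=Linear(8, 16),
    norm=LayerNorm(),
    block=Module(inner=Linear(16, 8)),
)
quantize_tree(model, group_size=32, bits=8)
```

With this design a third-party module only needs to define its own `to_quantized` to take part in quantization; the walker itself needs no changes.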
The `block_sparse_mm` equivalent for quantized matmuls. A big diff, but mostly boilerplate code and refactoring.