Block sparse qmm

@awni and @jagrit06, I made two changes that are not directly related to block_sparse_qmm but to quantization in general, so I'd appreciate a review. Also, if you think they (or maybe just the 2nd change) should be in a different PR, let me know; I can merge this as it was before these commits and open another PR for them.

The changes are the following:

1. quantize and dequantize now support arbitrary shapes instead of only 2D matrices.
2. I added a to_quantized method to Linear and Embedding and changed nn.quantize to check for modules that implement to_quantized. This allows arbitrary modules to support quantization through nn.quantize without MLX knowing anything about them.
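
To illustrate the dispatch described in the second change, here is a minimal pure-Python sketch. It is not the MLX implementation: the `Module`, `Linear`, `QuantizedLinear`, `ReLU`, and `MLP` classes and the `quantize` function below are hypothetical stand-ins, and the `group_size` and `bits` defaults are assumptions chosen for the example. It only shows the idea that the walker looks for a `to_quantized` method rather than for specific module types.

```python
class Module:
    """Hypothetical stand-in for a module base class (not mlx.nn.Module)."""

    def children(self):
        # Return a snapshot of the attributes that are themselves modules.
        return {k: v for k, v in vars(self).items() if isinstance(v, Module)}


class QuantizedLinear(Module):
    """Placeholder for a quantized layer; it only records its settings."""

    def __init__(self, in_dims, out_dims, group_size, bits):
        self.in_dims = in_dims
        self.out_dims = out_dims
        self.group_size = group_size
        self.bits = bits


class Linear(Module):
    def __init__(self, in_dims, out_dims):
        self.in_dims = in_dims
        self.out_dims = out_dims

    def to_quantized(self, group_size=64, bits=4):
        # The module itself decides what its quantized counterpart is.
        return QuantizedLinear(self.in_dims, self.out_dims, group_size, bits)


class ReLU(Module):
    """A module with no to_quantized method; it is left untouched."""


class MLP(Module):
    def __init__(self):
        self.fc1 = Linear(8, 16)
        self.act = ReLU()
        self.fc2 = Linear(16, 4)


def quantize(model, group_size=64, bits=4):
    """Replace every submodule that implements to_quantized (assumed helper)."""
    for name, child in model.children().items():
        if hasattr(child, "to_quantized"):
            # Duck typing: no isinstance check against known layer types.
            setattr(model, name, child.to_quantized(group_size, bits))
        else:
            # Recurse into containers that cannot be quantized directly.
            quantize(child, group_size, bits)
    return model


model = quantize(MLP())
```

With this shape, a user-defined module only needs to provide its own `to_quantized` method to be picked up by the walker, which is the property the change is after.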

ml-explore / mlx

Block sparse qmm #1124