intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
https://intel.github.io/neural-compressor/
Apache License 2.0

How to set the pruned weight blocks to the same learnable value? #1361

Open hobbitlzy opened 1 year ago

hobbitlzy commented 1 year ago

I want to use the sparsity feature of neural-compressor to prune model weights at block-wise granularity. Unlike traditional pruning approaches that zero out pruned weights, I want to set all values within a pruned block to the same learnable value. Does neural-compressor support this functionality? If not, is there a convenient workaround to achieve this block pruning with reparameterization?
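To make the idea concrete, here is a minimal plain-PyTorch sketch of what I mean, outside of the neural-compressor API. The module name `BlockReparamLinear`, the block size, and the sparsity ratio are just illustrative assumptions: pruned blocks are selected by their mean weight magnitude and then tied to one shared learnable scalar instead of being zeroed.

```python
# Illustrative sketch only (not part of the neural-compressor API):
# block-wise pruning where every pruned block shares one learnable value.
import torch
import torch.nn as nn


class BlockReparamLinear(nn.Module):
    def __init__(self, in_features, out_features, block_size=16, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.weight)
        # One shared, learnable value used by every pruned block.
        self.shared_value = nn.Parameter(torch.zeros(1))
        # Build a block-wise mask from initial weight magnitudes:
        # the blocks with the smallest mean |w| are marked as pruned.
        with torch.no_grad():
            bs = block_size
            h, w = out_features // bs, in_features // bs
            scores = self.weight.abs().reshape(h, bs, w, bs).mean(dim=(1, 3))
            k = max(1, int(sparsity * scores.numel()))
            thresh = scores.flatten().kthvalue(k).values
            block_mask = (scores > thresh).float()  # 1 = kept block, 0 = pruned
            mask = block_mask.repeat_interleave(bs, dim=0)
            mask = mask.repeat_interleave(bs, dim=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Kept blocks use the dense weight; pruned blocks all take the
        # shared learnable scalar, broadcast over their entries.
        effective = self.weight * self.mask + self.shared_value * (1 - self.mask)
        return nn.functional.linear(x, effective, self.bias)


# Example: a 64x64 layer with 16x16 blocks at 50% block sparsity.
layer = BlockReparamLinear(64, 64, block_size=16, sparsity=0.5)
out = layer(torch.randn(2, 64))
print(out.shape)  # torch.Size([2, 64])
```

During fine-tuning, `shared_value` receives gradients through the masked-out positions, so the pruned blocks converge to a single learned value rather than staying at zero. I am asking whether something equivalent can be expressed through the pruning configuration, or whether this kind of module replacement is the expected route.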

YIYANGCAI commented 1 year ago

Hi! Sorry for the late response. The method you propose is not supported by our current API. However, I find the idea interesting and will look for a chance to discuss it within my team. If you have any references supporting the effectiveness of this approach, please feel free to share them with us. Thank you.