neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0
2.01k stars, 140 forks

Fix for Sparsity Persist #2323

Closed (Satrat closed 3 weeks ago)

Satrat commented 3 weeks ago

When debugging the Marlin24 kernels, I found that the sparsity structure was not being correctly maintained: the vLLM check for sparsity structure was failing. After this GPTQ fix, the problem went away. I pulled this code from the nm-AutoGPTQ codebase.
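
For context, here is a minimal sketch of the kind of structure check that can fail when sparsity is not persisted. It assumes the structure in question is 2:4 semi-structured sparsity (at most two nonzeros in each aligned group of four weights), which the Marlin24 name suggests but the thread does not state. The function names and the choice to group along each row are hypothetical; this is not the vLLM check or the code from this PR.

```python
from typing import Sequence


def row_is_2_4_sparse(row: Sequence[float]) -> bool:
    """Return True if every aligned group of four values has at most two nonzeros."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        nonzeros = sum(1 for value in group if value != 0)
        if nonzeros > 2:
            return False
    return True


def matrix_is_2_4_sparse(weights: Sequence[Sequence[float]]) -> bool:
    """Check every row of a 2D weight matrix (grouping along rows is an assumption)."""
    return all(row_is_2_4_sparse(row) for row in weights)


# A row pruned to 2:4: each group of four holds exactly two nonzeros.
pruned = [[0.5, 0.0, 0.0, -1.2, 0.0, 0.3, 0.7, 0.0]]

# The same row after a hypothetical weight update writes a small value
# back into a pruned position, so the first group now has three nonzeros.
broken = [[0.5, 0.1, 0.0, -1.2, 0.0, 0.3, 0.7, 0.0]]

print(matrix_is_2_4_sparse(pruned))
print(matrix_is_2_4_sparse(broken))
```

A check like this passes right after pruning and fails as soon as a later step, such as a quantization weight update, fills in a previously zeroed position. That is one way the symptom described above could arise.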

dbogunowicz commented 3 weeks ago

@Satrat nice find! Do you think this may also be what has been confusing me over the last few days?