neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0

Fix for Sparsity Persist #2323

Closed Satrat closed 5 months ago

Satrat commented 5 months ago

While debugging the Marlin24 kernels, I found that the sparsity structure was not being maintained correctly, and the vLLM check for sparsity structure was failing. With this GPTQ fix the problem went away. I pulled the code from the nm-AutoGPTQ codebase.
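
For readers following along, here is a minimal sketch of what a sparsity-structure check might look like, assuming the structure in question is the 2:4 pattern (at least 2 zeros in every contiguous group of 4 weights along a row). The function name `is_n_m_sparse` and the example values are hypothetical. This is not the vLLM check or the code from this PR, only an illustration of how a single non-zero value written into a pruned position breaks the pattern.

```python
def is_n_m_sparse(rows, n=2, m=4):
    """Return True if every contiguous group of m values in each row
    contains at least n zeros (hypothetical helper, not a vLLM API)."""
    for row in rows:
        # A row that cannot be split into whole groups fails the check.
        if len(row) % m != 0:
            return False
        for start in range(0, len(row), m):
            group = row[start:start + m]
            zeros = sum(1 for value in group if value == 0)
            if zeros < n:
                return False
    return True


# Illustrative weights: pruned to 2:4, then one pruned position is
# overwritten with a small non-zero value (e.g. by a later weight update).
pruned = [[0.5, 0.0, 0.0, -1.2, 0.0, 0.3, 0.0, 0.7]]
drifted = [[0.5, 0.01, 0.0, -1.2, 0.0, 0.3, 0.0, 0.7]]

print(is_n_m_sparse(pruned))
print(is_n_m_sparse(drifted))
```

The check is exact (`value == 0`) on purpose: a structured-sparsity kernel would need true zeros in the pruned positions, so even a tiny non-zero value counts as a violation in this sketch.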

dbogunowicz commented 5 months ago

@Satrat nice find! Do you think this might also be what has been confusing me over the last few days?