While debugging the Marlin24 kernels, I found that the sparsity structure was not being maintained correctly, which caused the vLLM sparsity-structure check to fail. With this GPTQ fix the problem goes away. The code is pulled from the nm-AutoGPTQ codebase.
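For context, here is a minimal sketch of the kind of check that was failing, assuming "sparsity structure" refers to the 2:4 pattern (at most two non-zero values in every aligned group of four). The helper names `is_2_4_sparse` and `check_matrix` are hypothetical and are not the vLLM implementation. Grouping along each row is also an assumption made for illustration.

```python
def is_2_4_sparse(row):
    """Return True if every aligned group of 4 values holds at least 2 zeros.

    Hypothetical helper for illustration only; not the vLLM check.
    """
    # A row whose length is not a multiple of 4 cannot be split into full groups.
    if len(row) % 4 != 0:
        return False
    return all(
        sum(1 for v in row[i:i + 4] if v == 0) >= 2
        for i in range(0, len(row), 4)
    )


def check_matrix(weights):
    """Apply the 2:4 check to every row of a weight matrix (list of lists)."""
    return all(is_2_4_sparse(row) for row in weights)


if __name__ == "__main__":
    pruned = [[0, 1, 0, 2], [3, 0, 0, 4]]
    # Simulates an update that writes a value into a pruned (zero) position.
    broken = [[0, 1, 5, 2], [3, 0, 0, 4]]
    print(check_matrix(pruned))
    print(check_matrix(broken))
```

If a quantization step writes non-zero values back into pruned positions, a check of this shape fails, which matches the symptom described above.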