Open casper-hansen opened 8 months ago
Hi @ilur98, thanks for your great work on this repository. I am attempting to modify your work to support W8A8, as I found that static W4A8 introduces too large a quantization error.

I am running into some trouble with the modification. Quantization works and saves the model. However, when I try to load it, the QuantLinear shapes are wrong. Do you have any idea how to fix this so that it can run W8A8 in inference mode?

@ilur98 replied:

Could you give me the shape of the qweight for your model? That would help me judge where the problem is. For INT4 weights, I pack two INT4 values together into one INT8. This packing may not be suitable for the W8 case.
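To illustrate why the shapes differ, here is a minimal sketch of INT4 pair-packing along the last axis, assuming a generic layout (not this repository's actual packing code): two 4-bit values share one int8 byte, so the packed qweight's last dimension is half the logical one. A W8 weight needs the full dimension, so loading it through the INT4 shape logic would mismatch.

```python
import numpy as np

def pack_int4_pairs(w):
    """Pack pairs of INT4 values (range [-8, 7]) into one int8 per pair.

    w: integer array whose last dimension is even.
    Returns an int8 array with the last dimension halved.
    """
    assert w.shape[-1] % 2 == 0, "last dimension must be even to pack pairs"
    lo = w[..., 0::2] & 0x0F          # low nibble from even columns
    hi = w[..., 1::2] & 0x0F          # high nibble from odd columns
    return ((hi << 4) | lo).astype(np.int8)

# A (1, 4) logical INT4 weight packs into a (1, 2) int8 qweight,
# whereas an INT8 weight would keep its full (1, 4) shape.
packed = pack_int4_pairs(np.array([[1, 2, 3, 4]]))
print(packed.shape)  # (1, 2)
```

Under this assumption, a W8A8 QuantLinear would skip the packing step entirely and store the weight at its logical shape, which is likely where the shape check fails when loading.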