samedii opened 5 days ago
It did not work, but maybe the output is already correct and I just need to convert it?
I tried reproducing the flux.1-dev results and I get very similar error sizes:
24-11-21 20:32:31 | D | + x = [min=0.0598, max=1.6406]
24-11-21 20:32:31 | D | + w - AbsMax
24-11-21 20:32:31 | D | + w = [min=0.0981, max=0.4082]
24-11-21 20:32:31 | D | + finished reseting calibrator, ram usage: 14.7
24-11-21 20:32:33 | D | + finished calculating the original outputs, ram usage: 14.7
24-11-21 20:34:44 | D | - x / w range = AbsMax / AbsMax
24-11-21 20:34:44 | D | - alpha = [ 0.0000, 0.0500, 0.1000, 0.1500, 0.2000]
24-11-21 20:34:44 | D | - beta = [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
24-11-21 20:34:44 | D | - sum error = [ 1828.6648, 1724.5615, 1632.4606, 1553.8120, 1493.8046]
24-11-21 20:34:44 | D | - best error = [ 1828.6648, 1724.5615, 1632.4606, 1553.8120, 1493.8046]
24-11-21 20:34:44 | D | - alpha = [ 0.2500, 0.3000, 0.3500, 0.4000, 0.4500]
24-11-21 20:34:44 | D | - beta = [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
24-11-21 20:34:44 | D | - sum error = [ 1439.3716, 1404.2956, 1382.7623, 1370.8294, 1373.6909]
24-11-21 20:34:44 | D | - best error = [ 1439.3716, 1404.2956, 1382.7623, 1370.8294, 1370.8294]
24-11-21 20:34:44 | D | - alpha = [ 0.5000, 0.5500, 0.6000, 0.6500, 0.7000]
24-11-21 20:34:44 | D | - beta = [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
24-11-21 20:34:44 | D | - sum error = [ 1380.6241, 1400.4539, 1425.3653, 1461.7289, 1518.9166]
24-11-21 20:34:44 | D | - best error = [ 1370.8294, 1370.8294, 1370.8294, 1370.8294, 1370.8294]
24-11-21 20:34:44 | D | - alpha = [ 0.7500, 0.8000, 0.8500, 0.9000, 0.9500]
24-11-21 20:34:44 | D | - beta = [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
24-11-21 20:34:44 | D | - sum error = [ 1575.1668, 1651.5349, 1726.3799, 1817.4276, 1918.2578]
24-11-21 20:34:44 | D | - best error = [ 1370.8294, 1370.8294, 1370.8294, 1370.8294, 1370.8294]
24-11-21 20:34:44 | D | - alpha = [ 0.0500, 0.1000, 0.1500, 0.2000, 0.2500]
24-11-21 20:34:44 | D | - beta = [ 0.9500, 0.9000, 0.8500, 0.8000, 0.7500]
24-11-21 20:34:44 | D | - sum error = [ 2283.5662, 2121.8141, 1968.3675, 1835.7584, 1728.6590]
24-11-21 20:34:44 | D | - best error = [ 1370.8294, 1370.8294, 1370.8294, 1370.8294, 1370.8294]
24-11-21 20:34:44 | D | - alpha = [ 0.3000, 0.3500, 0.4000, 0.4500, 0.5000]
24-11-21 20:34:44 | D | - beta = [ 0.7000, 0.6500, 0.6000, 0.5500, 0.5000]
24-11-21 20:34:44 | D | - sum error = [ 1634.0025, 1553.8838, 1489.0616, 1451.9831, 1437.2101]
24-11-21 20:34:44 | D | - best error = [ 1370.8294, 1370.8294, 1370.8294, 1370.8294, 1370.8294]
24-11-21 20:34:44 | D | - alpha = [ 0.5500, 0.6000, 0.6500, 0.7000, 0.7500]
24-11-21 20:34:44 | D | - beta = [ 0.4500, 0.4000, 0.3500, 0.3000, 0.2500]
24-11-21 20:34:44 | D | - sum error = [ 1425.6216, 1439.6716, 1465.1432, 1497.7786, 1554.3564]
24-11-21 20:34:44 | D | - best error = [ 1370.8294, 1370.8294, 1370.8294, 1370.8294, 1370.8294]
24-11-21 20:34:44 | D | - alpha = [ 0.8000, 0.8500, 0.9000, 0.9500]
24-11-21 20:34:44 | D | - beta = [ 0.2000, 0.1500, 0.1000, 0.0500]
24-11-21 20:34:44 | D | - sum error = [ 1625.8374, 1704.7249, 1792.3400, 1906.8528]
24-11-21 20:34:44 | D | - best error = [ 1370.8294, 1370.8294, 1370.8294, 1370.8294]
24-11-21 20:34:44 | D | + error = 1370.8294
24-11-21 20:34:44 | D | + scale = [min=0.3241, max=1.2190]
24-11-21 20:34:44 | D | - transformer_blocks.0.ff.up_proj
24-11-21 20:34:44 | D | + w: sint4
24-11-21 20:34:44 | D | + x: sint4
24-11-21 20:34:44 | D | + y: None
24-11-21 20:34:44 | D | + tensor_type: TensorType.Weights, objective: SearchBasedCalibObjective.OutputsError, granularity: SearchBasedCalibGranularity.Layer
24-11-21 20:34:44 | D | + finished parsing calibration arguments, ram usage: 14.6
24-11-21 20:34:45 | D | + x - AbsMax
24-11-21 20:34:45 | D | + x = [min=0.0312, max=5.2188]
24-11-21 20:34:45 | D | + w - AbsMax
24-11-21 20:34:45 | D | + w = [min=0.0309, max=0.4395]
24-11-21 20:34:45 | D | + finished reseting calibrator, ram usage: 14.7
24-11-21 20:34:58 | D | + finished calculating the original outputs, ram usage: 17.9
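For context, the alpha/beta sweep in these logs looks like a SmoothQuant-style grid search for a channel-smoothing scale: first alpha alone is swept with beta fixed at 0, then pairs with beta = 1 - alpha. Below is a minimal NumPy sketch of how such a search works in general; this is my own reconstruction for illustration, not DeepCompressor's actual code, and all names are illustrative:

```python
import numpy as np

def fake_quant(t, n_bits=4):
    # Symmetric per-tensor fake quantization to a signed n-bit grid (sint4 here).
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(t).max() / qmax
    return np.round(t / scale).clip(-qmax - 1, qmax) * scale

def search_smooth_scale(x, w, candidates):
    # x: activations (tokens, in_features); w: weights (out_features, in_features).
    x_max = np.abs(x).max(axis=0)           # per-channel AbsMax of x
    w_max = np.abs(w).max(axis=0)           # per-channel AbsMax of w
    y_ref = x @ w.T                         # original (unquantized) outputs
    best_err, best = np.inf, None
    for alpha, beta in candidates:
        # Smoothing scale migrates quantization difficulty between x and w.
        s = np.clip(x_max ** alpha / np.maximum(w_max, 1e-8) ** beta, 1e-8, None)
        y_q = fake_quant(x / s) @ fake_quant(w * s).T
        err = np.abs(y_q - y_ref).sum()     # the "sum error" in the logs
        if err < best_err:
            best_err, best = err, (alpha, beta, s)
    return best_err, best

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 32)).astype(np.float32)
w = rng.standard_normal((16, 32)).astype(np.float32)
grid = [(a / 20, 0.0) for a in range(20)] + [(a / 20, 1 - a / 20) for a in range(1, 20)]
err, (alpha, beta, s) = search_smooth_scale(x, w, grid)
```

In this framing, a "best error" that plateaus (like 1370.8294 above) simply means later alpha/beta candidates did not beat the earlier winner.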
Do you have any advice? @lmxyy @synxlin, any help would be greatly appreciated :pray:
I have the same question: how can a quantized checkpoint be converted into a safetensors-format model that can be loaded in Nunchaku? I hope @lmxyy can provide some assistance.
Hi @samedii and @Howe2018,
As for your question: DeepCompressor dumps the floating-point dequantized weights into the checkpoint model.pt. You can convert these floating-point dequantized weights back via standard quantization. scale.pt contains the quantization scaling information (e.g., the min/max values searched during quantization).
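To illustrate the round trip described here: if model.pt holds dequantized weights of the form w_dequant = q * scale, then recovering the integers is just a rounded division by the scales from scale.pt. A minimal sketch, assuming a single per-tensor scale for simplicity (the real scales may well be per-channel or per-group):

```python
import numpy as np

def requantize(w_dequant, scale, n_bits=4):
    # Since w_dequant = q * scale, standard quantization recovers the
    # integers as q = round(w_dequant / scale), clipped to the signed
    # n-bit range ([-8, 7] for sint4).
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return np.round(w_dequant / scale).clip(qmin, qmax).astype(np.int8)

# Round-trip check on synthetic data: dequantize, then requantize.
scale = 0.05
q_true = np.random.default_rng(1).integers(-8, 8, size=(4, 8))
w_dq = q_true * scale
assert np.array_equal(requantize(w_dq, scale), q_true)
```

Packing those int4 values into Nunchaku's expected layout is a separate step, which is what the upcoming conversion script would handle.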
We are currently working on a script to convert the checkpoint of DeepCompressor to the Nunchaku format. We'll keep this issue updated and notify you when the conversion script is released.
Let us know if you have any specific requirements or suggestions!
Hi, thanks for sharing your very efficient quantization method!
I was trying it out on a custom flux model and was surprised to see the saved model was the same size as the original bfloat16. I suspect the errors might be large and it decided to keep bfloat16 rather than quantizing.
When I looked in model.pt, everything was bfloat16, and the wgts.pt file showed this. These are some logs from running quantization:
I'm trying again tonight, but I suspect I'll see the same issue.
Do you have any suggestions?
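For anyone hitting the same thing, a quick way to confirm what actually ended up in the dumped checkpoint is to tally the tensor dtypes. This is a generic sketch; the model.pt path and its layout as a flat state dict are assumptions about DeepCompressor's output:

```python
from collections import Counter

def dtype_summary(state_dict):
    # Tally tensors per dtype. An all-bfloat16 summary means the dump
    # contains dequantized (full-precision) weights, not packed integers,
    # which matches what the maintainers describe above.
    return Counter(str(v.dtype) for v in state_dict.values() if hasattr(v, "dtype"))

# Hypothetical usage with a DeepCompressor dump:
# import torch
# state = torch.load("model.pt", map_location="cpu")
# print(dtype_summary(state))
```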