turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Measurement fails, no error message #273

Closed · biship closed this issue 2 months ago

biship commented 8 months ago

Win 11 x64, Python 3.11.5, RTX 3090, using latest commit from exllamav2, and venv from text-generation-webui

All files from: https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss

python convert.py -i J:\Code\Other\AI\text-generation-webui\models\NeverSleep_Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss -o g:\ai -nr -om NeverSleep_Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss_measurement.json

(J:\Code\Other\AI\text-generation-webui\installer_files\env) J:\Code\Other\AI\exllamav2>python convert.py -i J:\Code\Other\AI\text-generation-webui\models\NeverSleep_Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss -o g:\ai\in -om g:\ai\out\NeverSleep_Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-b3.json -nr
 -- Beginning new job
 !! Warning: Output directory is not empty: g:\ai\in
 !! Cleaning output directory: g:\ai\in
 -- Input: J:\Code\Other\AI\text-generation-webui\models\NeverSleep_Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss
 -- Output: g:\ai\in
 -- Using default calibration dataset
 -- Measurement will be saved to g:\ai\out\NeverSleep_Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-b3.json
 !! Conversion script will end after measurement pass
 -- Tokenizing samples (measurement)...
 -- FIN Tokenizing samples (measurement)...
 -- Token embeddings (measurement)...
 -- FIN Token embeddings (measurement)...
 -- Measuring quantization impact...
 -- Layer: model.layers.0 (Attention)
 -- model.layers.0.self_attn.q_proj                    0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.self_attn.q_proj                    0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.0.self_attn.q_proj                    0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.0.self_attn.q_proj                    1:4b_128g s4                                       4.03 bpw
 -- model.layers.0.self_attn.q_proj                    1:4b_64g s4                                        4.06 bpw
 -- model.layers.0.self_attn.q_proj                    1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.self_attn.q_proj                    0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.0.self_attn.q_proj                    0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.0.self_attn.q_proj                    0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.0.self_attn.q_proj                    0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.0.self_attn.q_proj                    0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.0.self_attn.q_proj                    1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.self_attn.q_proj                    1:6b_32g s4                                        6.13 bpw
 -- model.layers.0.self_attn.q_proj                    1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.self_attn.k_proj                    0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.self_attn.k_proj                    0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.0.self_attn.k_proj                    0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.0.self_attn.k_proj                    1:4b_128g s4                                       4.03 bpw
 -- model.layers.0.self_attn.k_proj                    1:4b_64g s4                                        4.06 bpw
 -- model.layers.0.self_attn.k_proj                    1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.self_attn.k_proj                    0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.0.self_attn.k_proj                    0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.0.self_attn.k_proj                    0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.0.self_attn.k_proj                    0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.0.self_attn.k_proj                    0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.0.self_attn.k_proj                    1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.self_attn.k_proj                    1:6b_32g s4                                        6.13 bpw
 -- model.layers.0.self_attn.k_proj                    1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.self_attn.v_proj                    0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.self_attn.v_proj                    0.25:3b_64g/0.75:2b_64g s4                         2.31 bpw
 -- model.layers.0.self_attn.v_proj                    0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.0.self_attn.v_proj                    0.1:4b_64g/0.9:3b_64g s4                           3.17 bpw
 -- model.layers.0.self_attn.v_proj                    1:4b_128g s4                                       4.03 bpw
 -- model.layers.0.self_attn.v_proj                    1:4b_64g s4                                        4.06 bpw
 -- model.layers.0.self_attn.v_proj                    1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.self_attn.v_proj                    0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.0.self_attn.v_proj                    0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.0.self_attn.v_proj                    1:5b_64g s4                                        5.06 bpw
 -- model.layers.0.self_attn.v_proj                    1:5b_32g s4                                        5.13 bpw
 -- model.layers.0.self_attn.v_proj                    1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.self_attn.v_proj                    1:6b_32g s4                                        6.13 bpw
 -- model.layers.0.self_attn.v_proj                    1:8b_32g s4                                        8.13 bpw
 -- model.layers.0.self_attn.v_proj                    1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.self_attn.o_proj                    0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.self_attn.o_proj                    0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.0.self_attn.o_proj                    0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.0.self_attn.o_proj                    1:4b_128g s4                                       4.03 bpw
 -- model.layers.0.self_attn.o_proj                    1:4b_64g s4                                        4.06 bpw
 -- model.layers.0.self_attn.o_proj                    1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.self_attn.o_proj                    0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.0.self_attn.o_proj                    0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.0.self_attn.o_proj                    0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.0.self_attn.o_proj                    0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.0.self_attn.o_proj                    0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.0.self_attn.o_proj                    1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.self_attn.o_proj                    1:6b_32g s4                                        6.13 bpw
 -- model.layers.0.self_attn.o_proj                    1:8b_128g s4                                       8.03 bpw
 -- 2.1253 bpw  accuracy: 0.87914264
 -- 2.1862 bpw  accuracy: 0.89547628
 -- 2.2706 bpw  accuracy: 0.91153419
 -- 2.6643 bpw  accuracy: 0.93977410
 -- 3.1564 bpw  accuracy: 0.94086534
 -- 3.1580 bpw  accuracy: 0.94406551
 -- 4.0314 bpw  accuracy: 0.95052630
 -- 4.0346 bpw  accuracy: 0.95478612
 -- 4.0691 bpw  accuracy: 0.95854223
 -- 4.1256 bpw  accuracy: 0.95961577
 -- 4.1580 bpw  accuracy: 0.96763796
 -- 4.1777 bpw  accuracy: 0.97054249
 -- 4.2612 bpw  accuracy: 0.96918827
 -- 4.3170 bpw  accuracy: 0.97240692
 -- 5.2439 bpw  accuracy: 0.98059887
 -- 5.3170 bpw  accuracy: 0.98182935
 -- 6.0314 bpw  accuracy: 0.98205972
 -- 6.3256 bpw  accuracy: 0.98560953
 -- 8.0314 bpw  accuracy: 0.98911542
 -- Duration: 10.39 seconds
 -- Layer: model.layers.0 (MoE MLP)
 !! Warning: w2.6 has less than 10% calibration for 2/19 rows
 -- model.layers.0.block_sparse_moe.experts.0.w1       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       1:4b_128g s4                                       4.03 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w1       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.25:3b_64g/0.75:2b_64g s4                         2.31 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.3:3b_64g/0.7:2b_64g s4                           2.38 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.25:4b_128g/0.75:3b_128g s4                       3.28 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.25:5b_128g/0.75:4b_128g s4                       4.28 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.25:6b_128g/0.75:5b_128g s4                       5.28 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w3       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.47 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:5b_32g/0.95:3b_32g s4                         3.23 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:5b_32g/0.95:4b_32g s4                         4.18 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4            3.40 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4              3.48 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:8b_32g/0.95:4b_128g s4                        4.24 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:8b_32g/0.95:4b_32g s4                         4.33 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4            4.35 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4              4.43 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4            5.30 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4              5.38 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.05:8b_32g/0.95:6b_128g s4                        6.14 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       0.15:8b_128g/0.85:6b_128g s4                       6.33 bpw
 -- model.layers.0.block_sparse_moe.experts.0.w2       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       1:4b_128g s4                                       4.03 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w1       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.25:3b_64g/0.75:2b_64g s4                         2.31 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.3:3b_64g/0.7:2b_64g s4                           2.38 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.25:4b_128g/0.75:3b_128g s4                       3.28 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.25:5b_128g/0.75:4b_128g s4                       4.28 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.25:6b_128g/0.75:5b_128g s4                       5.28 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w3       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.47 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:5b_32g/0.95:3b_32g s4                         3.23 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:5b_32g/0.95:4b_32g s4                         4.18 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4            3.40 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4              3.48 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:8b_32g/0.95:4b_128g s4                        4.24 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:8b_32g/0.95:4b_32g s4                         4.33 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4            4.35 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4              4.43 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4            5.30 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4              5.38 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.05:8b_32g/0.95:6b_128g s4                        6.14 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       0.15:8b_128g/0.85:6b_128g s4                       6.33 bpw
 -- model.layers.0.block_sparse_moe.experts.1.w2       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       1:4b_128g s4                                       4.03 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w1       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.25:3b_64g/0.75:2b_64g s4                         2.31 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.3:3b_64g/0.7:2b_64g s4                           2.38 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.25:4b_128g/0.75:3b_128g s4                       3.28 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.25:5b_128g/0.75:4b_128g s4                       4.28 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.25:6b_128g/0.75:5b_128g s4                       5.28 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w3       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.47 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:5b_32g/0.95:3b_32g s4                         3.23 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:5b_32g/0.95:4b_32g s4                         4.18 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4            3.40 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4              3.48 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:8b_32g/0.95:4b_128g s4                        4.24 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:8b_32g/0.95:4b_32g s4                         4.33 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4            4.35 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4              4.43 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4            5.30 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4              5.38 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.05:8b_32g/0.95:6b_128g s4                        6.14 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       0.15:8b_128g/0.85:6b_128g s4                       6.33 bpw
 -- model.layers.0.block_sparse_moe.experts.2.w2       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       1:4b_128g s4                                       4.03 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w1       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.25:3b_64g/0.75:2b_64g s4                         2.31 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.3:3b_64g/0.7:2b_64g s4                           2.38 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.25:4b_128g/0.75:3b_128g s4                       3.28 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.25:5b_128g/0.75:4b_128g s4                       4.28 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.25:6b_128g/0.75:5b_128g s4                       5.28 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w3       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.47 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:5b_32g/0.95:3b_32g s4                         3.23 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:5b_32g/0.95:4b_32g s4                         4.18 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4            3.40 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4              3.48 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:8b_32g/0.95:4b_128g s4                        4.24 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:8b_32g/0.95:4b_32g s4                         4.33 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4            4.35 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4              4.43 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4            5.30 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4              5.38 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.05:8b_32g/0.95:6b_128g s4                        6.14 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       0.15:8b_128g/0.85:6b_128g s4                       6.33 bpw
 -- model.layers.0.block_sparse_moe.experts.3.w2       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       1:4b_128g s4                                       4.03 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w1       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.25:3b_64g/0.75:2b_64g s4                         2.31 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.3:3b_64g/0.7:2b_64g s4                           2.38 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.25:4b_128g/0.75:3b_128g s4                       3.28 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       1:4b_32g s4                                        4.13 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.25:5b_128g/0.75:4b_128g s4                       4.28 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.25:6b_128g/0.75:5b_128g s4                       5.28 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       1:6b_128g s4                                       6.03 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       0.1:8b_128g/0.9:6b_128g s4                         6.28 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w3       1:8b_128g s4                                       8.03 bpw
 -- model.layers.0.block_sparse_moe.experts.4.w2       0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.47 bpw

(J:\Code\Other\AI\text-generation-webui\installer_files\env)

It looks as if the measurement pass never completes: the process simply exits with no error message, and no measurement JSON file is created.

turboderp commented 8 months ago

I would suspect it's running out of system RAM, or perhaps something else is killing the process?

biship commented 8 months ago

When it dies, the measurement pass consumes 20GB of VRAM (out of 24GB), and about 32GB of RAM (out of 64GB).

laoda513 commented 5 months ago

> I would suspect it's running out of system RAM, or perhaps something else is killing the process?

To avoid OOM, is there a way to use multiple GPUs, like gpu_split?

turboderp commented 5 months ago

Sadly, no. I've considered ways to speed up quantization, but I've already reduced the VRAM requirement as much as I can by swapping everything to system RAM. If it doesn't fit now, it's because individual matrices are too big to process in the available space, and the method really doesn't parallelize well.
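
Roughly, the pattern is to keep all weights in system RAM and stream one layer at a time through the GPU, so peak VRAM is bounded by the largest single layer plus its activations. Below is a minimal PyTorch sketch of that general offloading idea (hypothetical names; not the actual quantizer code):

```python
import torch

def measure_layers(layers, hidden_states, device="cuda"):
    # Hypothetical sketch: `layers` is a list of nn.Modules resident on the
    # CPU (system RAM). Only one layer occupies VRAM at any given time.
    for layer in layers:
        layer.to(device)                      # stream this layer into VRAM
        with torch.inference_mode():
            hidden_states = layer(hidden_states.to(device))
        layer.to("cpu")                       # evict it before the next one
        torch.cuda.empty_cache()              # release cached blocks
    return hidden_states
```

Because each layer consumes the previous layer's activations, the pass is inherently serial, which is one reason it doesn't split across GPUs the way gpu_split does at inference time.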

turboderp commented 2 months ago

Closing this as stale.

As a note, I've added an option to enable fast_safetensors when converting models. It might help in situations where the quantizer runs out of system memory even when plenty of system memory is available. This is a quirk of how safetensors does memory mapping on Windows; the fast_safetensors option avoids using that library when loading model weights. Sadly, it doesn't help when saving tensors.
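
Assuming the option is exposed as a convert.py flag (check `python convert.py -h` for the exact name; `-fst` is an assumption here), usage would look something like:

```
python convert.py -i <input_dir> -o <work_dir> -om <measurement.json> -nr -fst
```

The underlying idea is to read tensor data with plain file I/O instead of letting safetensors memory-map the file. A minimal, self-contained sketch of that loading strategy (illustration only, not exllamav2's actual loader; only a few dtypes are handled):

```python
import json
import struct

import numpy as np

def load_safetensors_no_mmap(path):
    # .safetensors layout: an 8-byte little-endian header length, a JSON
    # header mapping tensor names to dtype/shape/data_offsets, then raw
    # tensor bytes. Reading with seek()/read() avoids mmap entirely,
    # sidestepping its behaviour on Windows.
    dtype_map = {"F32": np.float32, "F16": np.float16, "I32": np.int32}
    tensors = {}
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
        data_start = 8 + header_len
        for name, meta in header.items():
            if name == "__metadata__":
                continue
            begin, end = meta["data_offsets"]  # offsets into the data section
            f.seek(data_start + begin)
            buf = f.read(end - begin)
            arr = np.frombuffer(buf, dtype=dtype_map[meta["dtype"]])
            tensors[name] = arr.reshape(meta["shape"])
    return tensors
```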