pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License

Size mismatch error occurs when loading models quantized by GPTQ #88

Open sdc17 opened 8 months ago

sdc17 commented 8 months ago

Hi, thanks for building this wonderful open-source project!

I am using GPTQ to first quantize a llama2-7b-chat-hf model:

python quantize.py --checkpoint_path checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth --mode int4-gptq --calibration_tasks wikitext --calibration_seq_length 2048

which works perfectly. However, when I then use the quantized model for generation:

python generate.py --compile --checkpoint_path checkpoints/meta-llama/Llama-2-7b-chat-hf/model_int4-gptq.g32.pth --prompt "Hello, my name is"

a size mismatch error occurs:

RuntimeError: Error(s) in loading state_dict for Transformer:
        size mismatch for layers.0.feed_forward.w2.weight: copying a param with shape torch.Size([512, 88, 32, 4]) from checkpoint, the shape in current model is torch.Size([512, 86, 32, 4]).
        size mismatch for layers.0.feed_forward.w2.scales_and_zeros: copying a param with shape torch.Size([352, 4096, 2]) from checkpoint, the shape in current model is torch.Size([344, 4096, 2]).
        [... the same two mismatches repeat identically for layers.1 through layers.31 ...]
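
For reference, the mismatched numbers are self-consistent with an in_features padding discrepancy: llama2-7b's feed_forward.w2 has an input dimension of 11008, and rounding that up to a multiple of 1024 gives 11264, which reproduces every checkpoint shape above (352 = 11264/32 groups, 88 = 11264/128 packed columns), while the unpadded 11008 reproduces the model's shapes (344 and 86). The sketch below is a hypothetical reconstruction of that arithmetic, not gpt-fast code; the assumption is that the quantization path pads in_features while the model built at load time does not.

```python
# Hypothetical sketch (not gpt-fast source): checking whether padding
# in_features to a multiple of 1024 explains the mismatched shapes.

def find_multiple(n: int, k: int) -> int:
    # Round n up to the next multiple of k.
    return n if n % k == 0 else n + k - (n % k)

in_features = 11008  # llama2-7b feed_forward.w2 input dim (intermediate size)
groupsize = 32       # matches the .g32 suffix of the quantized checkpoint

padded = find_multiple(in_features, 1024)   # 11264

# scales_and_zeros first dim = number of quantization groups over in_features
checkpoint_groups = padded // groupsize      # 352 (checkpoint shape)
model_groups = in_features // groupsize      # 344 (current model shape)

# packed int4 weight inner dim = in_features // 128 for this layout
checkpoint_inner = padded // 128             # 88 (checkpoint shape)
model_inner = in_features // 128             # 86 (current model shape)
```

If this is indeed the cause, the fix would be to apply the same padding when constructing the model for loading (or to skip padding when quantizing) — but that is an inference from the shapes, not a confirmed diagnosis.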

Any solutions to fix or possible clues would be appreciated!

sdc17 commented 8 months ago

BTW, the llama2-13b-chat-hf and llama2-70b-chat-hf models run into the same mismatch problem.

subhajitdchow commented 1 month ago

@sdc17 Were you able to solve this issue?