qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ
Apache License 2.0
2.99k stars · 459 forks

High PPL when groupsize != -1 for the OPT model after replacing linear layers with QuantLinear #275

Open hyx1999 opened 1 year ago

hyx1999 commented 1 year ago

I tried to measure GPTQ's PPL on the OPT model via opt.py. With fake quantization, the model's PPL is normal. However, when I call opt_pack before opt_eval and set groupsize to a value other than -1 (e.g. 128), the PPL of the packed model is much higher than that of the fake-quantized model. When groupsize is set to -1, everything is fine.

wbits=4, groupsize=128, without opt_pack: wikitext2 PPL = 28.715469360351562

wbits=4, groupsize=128, with opt_pack: wikitext2 PPL = 778.898193359375

    # opt_pack before opt_eval 
    if not args.load and args.wbits < 16 and not args.nearest:
        model = opt_pack(model, quantizers, args.wbits, args.groupsize)

    print("model:", "\n", model)

    if args.eval:
        datasets = ['wikitext2']
        if args.new_eval:
            datasets = ['wikitext2']
        for dataset in datasets:
            dataloader, testloader = get_loaders(dataset, seed=args.seed, model=args.model, seqlen=model.seqlen, cache_dir=args.cache_dir)
            print(dataset)
            opt_eval(model, testloader, DEV)
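
For reference, packing by itself should be lossless: group-wise 4-bit codes packed into integers and unpacked again must dequantize to exactly the same weights as the fake-quantized model. A minimal NumPy sketch of that round-trip check (illustrative only — the function names and the 8-codes-per-uint32 layout here are assumptions, not GPTQ-for-LLaMa's actual pack format):

```python
import numpy as np

def quantize_groupwise(W, groupsize=128, bits=4):
    """Per-group asymmetric quantization along the input dim (axis 1).
    Returns integer codes plus per-group scales and zero points."""
    out_f, in_f = W.shape
    assert in_f % groupsize == 0
    G = in_f // groupsize
    Wg = W.reshape(out_f, G, groupsize)
    wmin = Wg.min(axis=2, keepdims=True)
    wmax = Wg.max(axis=2, keepdims=True)
    maxq = 2**bits - 1
    scale = (wmax - wmin) / maxq
    zero = np.round(-wmin / scale)
    q = np.clip(np.round(Wg / scale) + zero, 0, maxq).astype(np.int32)
    return q.reshape(out_f, in_f), scale.squeeze(2), zero.squeeze(2)

def dequantize(q, scale, zero, groupsize=128):
    out_f, in_f = q.shape
    G = in_f // groupsize
    qg = q.reshape(out_f, G, groupsize).astype(np.float64)
    return ((qg - zero[:, :, None]) * scale[:, :, None]).reshape(out_f, in_f)

def pack_int4(q):
    """Pack eight 4-bit codes into each uint32 along axis 1."""
    out_f, in_f = q.shape
    packed = np.zeros((out_f, in_f // 8), dtype=np.uint32)
    qu = q.astype(np.uint32)
    for i in range(8):
        packed |= qu[:, i::8] << np.uint32(4 * i)
    return packed

def unpack_int4(packed, in_f):
    out_f = packed.shape[0]
    q = np.zeros((out_f, in_f), dtype=np.int32)
    for i in range(8):
        q[:, i::8] = (packed >> np.uint32(4 * i)) & np.uint32(0xF)
    return q

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 256))
q, scale, zero = quantize_groupwise(W)
W_fake = dequantize(q, scale, zero)    # what opt_eval sees without opt_pack
W_packed = dequantize(unpack_int4(pack_int4(q), 256), scale, zero)
print(np.abs(W_fake - W_packed).max())  # 0.0 — packing is exact
```

If the real pack/unpack pair passed an equivalent check but PPL still degraded only with groupsize != -1, that would point at how the kernel indexes the per-group scales and zeros rather than at the codes themselves.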
hyx1999 commented 1 year ago


I completed the above test using facebook/opt-125m.
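
One generic way to localize where the packed model starts to diverge (a debugging sketch, assuming a PyTorch model; `first_divergence` is a hypothetical helper, not part of opt.py) is to hook every module and compare the fake-quantized and packed models on the same input:

```python
import copy
import torch
import torch.nn as nn

def first_divergence(model_a, model_b, x, atol=1e-4):
    """Run x through both models and return the name of the first module
    (in forward-execution order) whose output differs, or None."""
    outs_a, outs_b = {}, {}

    def make_hook(store, name):
        def hook(module, inputs, output):
            if torch.is_tensor(output):  # skip modules returning tuples
                store[name] = output.detach()
        return hook

    handles = []
    for (name_a, mod_a), (name_b, mod_b) in zip(
        model_a.named_modules(), model_b.named_modules()
    ):
        handles.append(mod_a.register_forward_hook(make_hook(outs_a, name_a)))
        handles.append(mod_b.register_forward_hook(make_hook(outs_b, name_b)))
    with torch.no_grad():
        model_a(x)
        model_b(x)
    for h in handles:
        h.remove()

    # dict insertion order == hook firing order == forward-execution order
    for name, out in outs_a.items():
        if name in outs_b and not torch.allclose(out, outs_b[name], atol=atol):
            return name
    return None

# Toy demonstration: perturb one layer and find it.
torch.manual_seed(0)
model_a = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
model_b = copy.deepcopy(model_a)
with torch.no_grad():
    model_b[2].weight.mul_(1.5)  # simulate a mis-packed layer
x = torch.randn(2, 8)
print(first_divergence(model_a, model_b, x))  # '2'
```

Applied here, model_a would be the model before opt_pack and model_b the packed one; if only groups beyond the first are wrong, the per-token error may be small per layer but compound across the network, which would match a PPL jump from ~29 to ~779.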