qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ
Apache License 2.0
2.99k stars · 459 forks

High PPL when groupsize != -1 for the OPT model after replacing linear layers with QuantLinear #275

Open hyx1999 opened 1 year ago

hyx1999 commented 1 year ago

I tried to measure GPTQ's PPL on the OPT model via opt.py. With fake quantization, the model's PPL is normal. However, when I call opt_pack before opt_eval and set groupsize to a value other than -1 (e.g. 128), the PPL of the packed model is much higher than that of the fake-quantized model. When groupsize is set to -1, everything is fine.

wbits=4, groupsize=128, without opt_pack: wikitext2 PPL = 28.715469360351562

wbits=4, groupsize=128, with opt_pack: wikitext2 PPL = 778.898193359375

    # opt_pack before opt_eval 
    if not args.load and args.wbits < 16 and not args.nearest:
        model = opt_pack(model, quantizers, args.wbits, args.groupsize)

    print("model:", "\n", model)

    if args.eval:
        datasets = ['wikitext2']
        if args.new_eval:
            datasets = ['wikitext2']
        for dataset in datasets:
            dataloader, testloader = get_loaders(dataset, seed=args.seed, model=args.model, seqlen=model.seqlen, cache_dir=args.cache_dir)
            print(dataset)
            opt_eval(model, testloader, DEV)
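
For reference, packing by itself should be lossless: group-wise 4-bit codes packed into integers and unpacked again must dequantize to exactly the same weights as the fake-quantized model. A minimal NumPy sketch of that round-trip check (illustrative only — the function names and the 8-codes-per-uint32 layout here are assumptions, not GPTQ-for-LLaMa's actual pack format):

```python
import numpy as np

def quantize_groupwise(W, groupsize=128, bits=4):
    """Per-group asymmetric quantization along the input dim (axis 1).
    Returns integer codes plus per-group scales and zero points."""
    out_f, in_f = W.shape
    assert in_f % groupsize == 0
    G = in_f // groupsize
    Wg = W.reshape(out_f, G, groupsize)
    wmin = Wg.min(axis=2, keepdims=True)
    wmax = Wg.max(axis=2, keepdims=True)
    maxq = 2**bits - 1
    scale = (wmax - wmin) / maxq
    zero = np.round(-wmin / scale)
    q = np.clip(np.round(Wg / scale) + zero, 0, maxq).astype(np.int32)
    return q.reshape(out_f, in_f), scale.squeeze(2), zero.squeeze(2)

def dequantize(q, scale, zero, groupsize=128):
    out_f, in_f = q.shape
    G = in_f // groupsize
    qg = q.reshape(out_f, G, groupsize).astype(np.float64)
    return ((qg - zero[:, :, None]) * scale[:, :, None]).reshape(out_f, in_f)

def pack_int4(q):
    """Pack eight 4-bit codes into each uint32 along axis 1."""
    out_f, in_f = q.shape
    packed = np.zeros((out_f, in_f // 8), dtype=np.uint32)
    qu = q.astype(np.uint32)
    for i in range(8):
        packed |= qu[:, i::8] << np.uint32(4 * i)
    return packed

def unpack_int4(packed, in_f):
    out_f = packed.shape[0]
    q = np.zeros((out_f, in_f), dtype=np.int32)
    for i in range(8):
        q[:, i::8] = (packed >> np.uint32(4 * i)) & np.uint32(0xF)
    return q

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 256))
q, scale, zero = quantize_groupwise(W)
W_fake = dequantize(q, scale, zero)    # what opt_eval sees without opt_pack
W_packed = dequantize(unpack_int4(pack_int4(q), 256), scale, zero)
print(np.abs(W_fake - W_packed).max())  # 0.0 — packing is exact
```

If the real pack/unpack pair passed an equivalent check but PPL still degraded only with groupsize != -1, that would point at how the kernel indexes the per-group scales and zeros rather than at the codes themselves.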
hyx1999 commented 1 year ago


I completed the above test using facebook/opt-125m.
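
One generic way to localize where the packed model starts to diverge (a debugging sketch, assuming a PyTorch model; `first_divergence` is a hypothetical helper, not part of opt.py) is to hook every module and compare the fake-quantized and packed models on the same input:

```python
import copy
import torch
import torch.nn as nn

def first_divergence(model_a, model_b, x, atol=1e-4):
    """Run x through both models and return the name of the first module
    (in forward-execution order) whose output differs, or None."""
    outs_a, outs_b = {}, {}

    def make_hook(store, name):
        def hook(module, inputs, output):
            if torch.is_tensor(output):  # skip modules returning tuples
                store[name] = output.detach()
        return hook

    handles = []
    for (name_a, mod_a), (name_b, mod_b) in zip(
        model_a.named_modules(), model_b.named_modules()
    ):
        handles.append(mod_a.register_forward_hook(make_hook(outs_a, name_a)))
        handles.append(mod_b.register_forward_hook(make_hook(outs_b, name_b)))
    with torch.no_grad():
        model_a(x)
        model_b(x)
    for h in handles:
        h.remove()

    # dict insertion order == hook firing order == forward-execution order
    for name, out in outs_a.items():
        if name in outs_b and not torch.allclose(out, outs_b[name], atol=atol):
            return name
    return None

# Toy demonstration: perturb one layer and find it.
torch.manual_seed(0)
model_a = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
model_b = copy.deepcopy(model_a)
with torch.no_grad():
    model_b[2].weight.mul_(1.5)  # simulate a mis-packed layer
x = torch.randn(2, 8)
print(first_divergence(model_a, model_b, x))  # '2'
```

Applied here, model_a would be the model before opt_pack and model_b the packed one; if only groups beyond the first are wrong, the per-token error may be small per layer but compound across the network, which would match a PPL jump from ~29 to ~779.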