qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ
Apache License 2.0

AttributeError: 'QuantLinear' object has no attribute 'weight' (t5 branch) (Google/flan-ul2) #268

Closed: sigmareaver closed this 1 year ago

sigmareaver commented 1 year ago

i7-13700K, 128GB RAM, RTX 4090

Python = 3.9.10, Transformers = 4.30.0.dev0, PyTorch = 2.0.1, Model = Google/flan-ul2

Quantization command:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python t5.py ../full-models/flan-ul2 wikitext2 --nsamples 256 --wbits 4 --act-order --groupsize 128 --save ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq.pt

I also needed to edit t5_sequential() so it would run within 24GB of VRAM, but I don't think this should affect the model? The following snippet shows the extent of my changes, apart from an import gc at the top of the file.

        # free the quantized layer and its GPTQ state, then release cached
        # GPU memory before moving on to the next block
        del layer
        del gptq
        gc.collect()
        torch.cuda.empty_cache()

        inps, outs = outs, inps

    # do this part on the CPU, because the GPU runs out of memory
    dev = 'cpu'

    model.encoder.final_layer_norm = model.encoder.final_layer_norm.to(dev)
    model.encoder.dropout = model.encoder.dropout.to(dev)

    encoder_hidden_states = model.encoder.final_layer_norm(inps.cpu())
    encoder_hidden_states = model.encoder.dropout(encoder_hidden_states)

    model.encoder.final_layer_norm = model.encoder.final_layer_norm.cpu()
    model.encoder.dropout = model.encoder.dropout.cpu()

    dev = 'cuda:0'
    encoder_hidden_states = encoder_hidden_states.to(dev)
    inps = inps.to(dev)
    # end of CPU section

Otherwise my 4090 runs out of memory at model.encoder.final_layer_norm = model.encoder.final_layer_norm.to(dev) when dev is the GPU.

Benchmark command (also applies to t5_inference.py):

python t5.py ../full-models/flan-ul2 wikitext2 --load ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq.pt --wbits 4 --groupsize 128 --benchmark --benchmark_mode mmlu

Yields the following error:

Traceback (most recent call last):
  File "/mnt/Storage/ai-dev/t5-gptq/t5.py", line 752, in <module>
    mmlu_benchmark(model, tokenizer, args)
  File "/mnt/Storage/ai-dev/t5-gptq/t5.py", line 542, in mmlu_benchmark
    cors, acc, probs = mmlu_eval(args, subject, model, tokenizer, dev_df, test_df, (idx,len(subjects)))
  File "~/anaconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/Storage/ai-dev/t5-gptq/t5.py", line 473, in mmlu_eval
    logits = model(
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1683, in forward
    encoder_outputs = self.encoder(
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1090, in forward
    layer_outputs = layer_module(
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 753, in forward
    hidden_states = self.layer[-1](hidden_states)
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 342, in forward
    forwarded_states = self.DenseReluDense(forwarded_states)
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 319, in forward
    isinstance(self.wo.weight, torch.Tensor)
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'QuantLinear' object has no attribute 'weight'
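
For context, the check that fails is Transformers' dtype guard in the T5 feed-forward: it reads self.wo.weight before deciding whether to cast the hidden states. GPTQ's QuantLinear stores its packed weights in a qweight buffer (plus scales and zero points) and defines no weight attribute, so the attribute lookup itself raises. A minimal sketch that reproduces the failure (FakeQuantLinear is a hypothetical stand-in for the real class; the shapes are arbitrary):

    import torch
    import torch.nn as nn

    # Hypothetical stand-in for GPTQ's QuantLinear: packed int32 weights
    # live in a `qweight` buffer, so the module exposes no `weight` attribute.
    class FakeQuantLinear(nn.Module):
        def __init__(self, infeatures, outfeatures):
            super().__init__()
            self.register_buffer(
                'qweight',
                torch.zeros((infeatures // 8, outfeatures), dtype=torch.int32),
            )

    wo = FakeQuantLinear(4096, 4096)
    # Mirrors the check at modeling_t5.py:319 in the traceback above;
    # nn.Module.__getattr__ raises because 'weight' is neither a
    # parameter, a buffer, nor a submodule:
    isinstance(wo.weight, torch.Tensor)
    # -> AttributeError: 'FakeQuantLinear' object has no attribute 'weight'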

Edit: added a snippet showing my code modifications, and edited the quantization command to show the PYTORCH_CUDA_ALLOC_CONF environment variable.

sigmareaver commented 1 year ago

Not sure what I did differently, but the error now suggests qweight: AttributeError: 'QuantLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
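
If patching Transformers locally, one possible workaround (an assumption on my part, not an official fix) is to guard that cast with hasattr, so quantized projections that only expose qweight skip it. The surrounding condition below is paraphrased from the modeling_t5.py line the traceback points at:

    # Paraphrased from the dtype-cast guard in T5's feed-forward forward();
    # the added hasattr() check skips the cast for QuantLinear, which only
    # has `qweight` and would otherwise raise on the attribute lookup.
    if (
        hasattr(self.wo, 'weight')
        and isinstance(self.wo.weight, torch.Tensor)
        and hidden_states.dtype != self.wo.weight.dtype
        and self.wo.weight.dtype != torch.int8
    ):
        hidden_states = hidden_states.to(self.wo.weight.dtype)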

sigmareaver commented 1 year ago

My apologies. It seems a requirement was either not installed or was overwritten by your transformers-t5 repo.
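
For anyone who hits the same thing, a quick way to confirm which Transformers build is actually being imported (for example, whether a local transformers-t5 checkout shadows the pip install):

    import transformers
    # Prints the version string and the file the package resolves to, which
    # shows whether a fork (e.g. a local transformers-t5 checkout) is in use.
    print(transformers.__version__, transformers.__file__)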