intel / auto-round

Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
https://arxiv.org/abs/2309.05516
Apache License 2.0

question about calib data #138

mxjmtxrm opened this issue 1 month ago

mxjmtxrm commented 1 month ago

Hi, I noticed that the calib data is clipped from the original input_ids whose length is >= args.seqlen. Could the calib data instead be generated by packing the original input_ids first and then slicing?

wenhuach21 commented 1 month ago

Do you mean concatenation? Yes, that's doable. You could do it yourself and then pass a list of strings or a list of input_ids as the dataset, as detailed in the README of the example. Since we need to support multiple datasets and multiple splits, the usage is already a little complicated, e.g. --dataset "./tmp.json,NeelNanda/pile-10k:train,mbpp:train+validation+test". However, we could add "concat" and "padding" keys if you think that is a better approach.

However, we haven't tried concatenation yet. We've only experimented with padding, which achieves similar results. We still prefer the current method, as it provides more data and feels more natural.
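For reference, a minimal sketch of the packing idea discussed above, assuming a Hugging Face tokenizer; the helper name pack_calib_data is hypothetical, and the returned list of input_ids could then be passed via the dataset argument as described in the example README:

from transformers import AutoTokenizer

def pack_calib_data(texts, tokenizer, seqlen=2048):
    # Concatenate the tokenized texts, then slice into seqlen-sized chunks;
    # the trailing remainder shorter than seqlen is dropped.
    all_ids = []
    for text in texts:
        all_ids.extend(tokenizer(text)["input_ids"])
    n_chunks = len(all_ids) // seqlen
    return [all_ids[i * seqlen:(i + 1) * seqlen] for i in range(n_chunks)]

# Hypothetical usage:
# tokenizer = AutoTokenizer.from_pretrained("your-model-path")
# packed_ids = pack_calib_data(texts, tokenizer, seqlen=2048)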

mxjmtxrm commented 1 month ago

Thanks for your reply. Another question: I met the following error during quantization:

2024-05-29 12:16:46 INFO autoround.py L981: quantizing 48/61, model.layers.47
2024-05-29 12:17:42 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 6.384080 -> iter 21: 5.686959
2024-05-29 12:17:56 INFO autoround.py L981: quantizing 49/61, model.layers.48
2024-05-29 12:18:52 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 9.977055 -> iter 59: 6.221084
2024-05-29 12:19:06 INFO autoround.py L981: quantizing 50/61, model.layers.49
2024-05-29 12:20:02 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 10.435859 -> iter 183: 6.870513
2024-05-29 12:20:15 INFO autoround.py L981: quantizing 51/61, model.layers.50
2024-05-29 12:21:10 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 11.591762 -> iter 84: 7.518860
2024-05-29 12:21:24 INFO autoround.py L981: quantizing 52/61, model.layers.51
2024-05-29 12:22:19 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 14.611036 -> iter 80: 9.161383
2024-05-29 12:22:34 INFO autoround.py L981: quantizing 53/61, model.layers.52
2024-05-29 12:23:30 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 15.548044 -> iter 81: 9.872126
2024-05-29 12:23:44 INFO autoround.py L981: quantizing 54/61, model.layers.53
2024-05-29 12:24:39 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 15.679964 -> iter 92: 11.336390
2024-05-29 12:24:54 INFO autoround.py L981: quantizing 55/61, model.layers.54
2024-05-29 12:25:50 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 21.250614 -> iter 63: 14.900194
2024-05-29 12:26:04 INFO autoround.py L981: quantizing 56/61, model.layers.55
2024-05-29 12:27:00 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 22.136242 -> iter 54: 17.913368
2024-05-29 12:27:14 INFO autoround.py L981: quantizing 57/61, model.layers.56
2024-05-29 12:28:09 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 31.331656 -> iter 76: 24.755016
2024-05-29 12:28:22 INFO autoround.py L981: quantizing 58/61, model.layers.57
2024-05-29 12:29:17 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 47.869656 -> iter 12: 30.432583
2024-05-29 12:29:30 INFO autoround.py L981: quantizing 59/61, model.layers.58
2024-05-29 12:30:26 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 2345.095215 -> iter 109: 1352.989136
2024-05-29 12:30:39 INFO autoround.py L981: quantizing 60/61, model.layers.59
2024-05-29 12:31:35 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 2887.380615 -> iter 185: 2133.542480
2024-05-29 12:31:49 INFO autoround.py L981: quantizing 61/61, model.layers.60
2024-05-29 12:32:45 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: nan -> iter 0: 340282346638528859811704183484516925440.000000
Traceback (most recent call last):
  File "/workspace/main.py", line 190, in <module>
    main()
  File "/workspace/main.py", line 151, in main
    autoround_quantize(
  File "/workspace/quantization/autoround.py", line 77, in autoround_quantize
    model, _ = autoround.quantize()
  File "/usr/local/lib/python3.10/dist-packages/auto_round/autoround.py", line 1064, in quantize
    self.qdq_weight_round(
  File "/usr/local/lib/python3.10/dist-packages/auto_round/autoround.py", line 991, in qdq_weight_round
    q_input, input_ids = self.quant_block(
  File "/usr/local/lib/python3.10/dist-packages/auto_round/autoround.py", line 939, in quant_block
    unwrapper_block(block, best_v, best_min_scale, best_max_scale)
UnboundLocalError: local variable 'best_v' referenced before assignment

When quantizing the deeper layers, the loss gets larger and larger until the gradient explodes. How can I solve this problem?

wenhuach21 commented 1 month ago

May I know your command so I can reproduce the issue? Try setting --disable_quanted_input, and set disable_low_gpu_mem_usage to speed up the tuning if GPU memory is sufficient. Even when tuning at 2 bits we have never met this issue. I will fix the exception to ensure the tuning completes successfully.

wenhuach21 commented 1 month ago

If the data is padded, please don't forget to pass the attention mask. However, we have no such interface now, so you would have to write your own dataloader by following our code.
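For illustration, a minimal sketch of such a custom dataloader, assuming a Hugging Face tokenizer with a pad token set; the function name and batching details are hypothetical and not part of auto-round's API:

import torch
from torch.utils.data import DataLoader

def make_padded_dataloader(texts, tokenizer, seqlen=2048, batch_size=8):
    # Pad/truncate every text to a fixed length and keep the attention mask.
    enc = tokenizer(texts, max_length=seqlen, padding="max_length",
                    truncation=True, return_tensors="pt")
    samples = [{"input_ids": ids, "attention_mask": mask}
               for ids, mask in zip(enc["input_ids"], enc["attention_mask"])]

    def collate(batch):
        return {"input_ids": torch.stack([b["input_ids"] for b in batch]),
                "attention_mask": torch.stack([b["attention_mask"] for b in batch])}

    return DataLoader(samples, batch_size=batch_size, collate_fn=collate)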

mxjmtxrm commented 1 month ago

I wrapped AutoRound in a function:

import os

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# QUANT_GROUP_SIZE is a constant defined elsewhere in my script.
def autoround_quantize(model_path, quant_path, tokenizer_path, dataset, seqlen, batch_size, torch_dtype=torch.float16):
    from auto_round import AutoRound
    from auto_round.utils import detect_device
    # Load model
    device_str = detect_device("auto")
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch_dtype, **{"low_cpu_mem_usage": True, "attn_implementation": "flash_attention_2", "trust_remote_code": True})
    model.seqlen = seqlen
    config = AutoConfig.from_pretrained(str(model_path) + "/")
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)

    round = AutoRound
    weight_config = {}
    for n, m in model.named_modules():
        if isinstance(m, torch.nn.Linear):
            if m.weight.shape[0] % 32 != 0 or m.weight.shape[1] % 32 != 0:
                weight_config[n] = {"data_type": "fp"}
                print(
                    f"{n} will not be quantized due to its shape not being divisible by 32, resulting in an exporting issue to autogptq")
    iters = 200
    scale_dtype = "float16" if torch_dtype==torch.float16 else "bfloat16"
    autoround = round(model, tokenizer, 4, QUANT_GROUP_SIZE, sym=False, batch_size=batch_size,
                      dataset=os.path.dirname(dataset), seqlen=seqlen, iters=iters, device=device_str,
                      minmax_lr=2.0/iters, enable_quanted_input=False, n_samples=2000, 
                      scale_dtype=scale_dtype, weight_config=weight_config
                      )
    model, _ = autoround.quantize()
    model = model.to("cpu")
    model.save_pretrained(quant_path)
    model.config.save_pretrained(quant_path)

I used a mixed dataset containing 5 sub-datasets (different tasks and languages). Does this affect the results? I used the original dataloader in your code.

wenhuach21 commented 1 month ago

It's weird that our code could still work with dataset=os.path.dirname(dataset). The dataset argument should only support a str (Hugging Face dataset name), a dataloader, a list of str, a list of input_ids, or a local json file. We have tried mixing 3 datasets from different languages and the tuning process works fine. Another option is to simply try our default dataset.
Besides, always use a float16 scale, as the AutoGPTQ kernel only supports this.
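For illustration, a hedged sketch of these call forms, mirroring the keyword arguments from the snippet earlier in this thread; the bit width and group size values are placeholders, and the exact interface should be checked against the example README:

from auto_round import AutoRound

def build_autoround(model, tokenizer, calib_data, seqlen=2048):
    # calib_data may be a Hugging Face dataset name (e.g. "NeelNanda/pile-10k"),
    # a dataloader, a list of strings, a list of input_ids, or a local json path.
    return AutoRound(model, tokenizer, 4, 128, dataset=calib_data,
                     seqlen=seqlen, scale_dtype="float16")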

wenhuach21 commented 1 month ago

Besides, please check the log for the real number of cached samples; if we could not find n_samples samples with length >= args.seqlen, a warning should have been dumped.

mxjmtxrm commented 1 month ago

If I set dataset to a local json file, I get the following error:

Traceback (most recent call last):
  File "/workspace/main.py", line 187, in <module>
    main()
  File "/workspace/main.py", line 148, in main
    autoround_quantize(
  File "/workspace/quantization/autoround.py", line 41, in autoround_quantize
    model, _ = autoround.quantize()
  File "/usr/local/lib/python3.10/dist-packages/auto_round/autoround.py", line 1053, in quantize
    inputs = self.cache_block_input(block_names[0], self.n_samples)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/auto_round/autoround.py", line 719, in cache_block_input
    self.calib(n_samples)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/auto_round/autoround.py", line 652, in calib
    self.dataloader = get_dataloader(
  File "/usr/local/lib/python3.10/dist-packages/auto_round/calib_dataset.py", line 84, in get_dataloader
    calib_dataset = load_dataset(dataset_name, split=split)
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2556, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2228, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1881, in dataset_module_factory
    raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at /workspace/data_ch/data_mixed.jsonl/data_mixed.jsonl.py or any data file in the same directory.

dataset == "/workspace/data_ch/data_mixed.jsonl"

wenhuach21 commented 1 month ago

We apologize for any confusion. Currently, our code only supports .json files. Please refer to the code at https://github.com/intel/auto-round/blob/main/test/test_calib_dataset.py#L27 to generate your file. We are aware of the issue and will work on fixing it soon.
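As a hedged sketch of preparing such a .json file (the schema assumed here, a list of {"text": ...} records, is only an illustration; please verify against the linked test_calib_dataset.py):

import json

texts = ["first calibration sample ...", "second calibration sample ..."]
with open("./calib.json", "w", encoding="utf-8") as f:
    # Assumed schema; confirm against test_calib_dataset.py before relying on it.
    json.dump([{"text": t} for t in texts], f, ensure_ascii=False)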

mxjmtxrm commented 1 month ago

I mixed 3 Chinese datasets. The log is as follows:

2024-05-30 03:13:53 INFO utils.py L570: Using GPU device
start autoround quantization...
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 100%|██████████| 62/62 [00:10<00:00,  6.06it/s]
2024-05-30 03:14:09 INFO autoround.py L465: using torch.float16 for quantization tuning
2024-05-30 03:16:53 WARNING autoround.py L697: Insufficient number of samples collected may affect the quantification. Effective samples size:346, Target sample size:2000
2024-05-30 03:16:53 INFO autoround.py L981: quantizing 1/61, model.layers.0
2024-05-30 03:17:49 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.007039 -> iter 160: 0.000198
2024-05-30 03:18:01 INFO autoround.py L981: quantizing 2/61, model.layers.1
2024-05-30 03:18:56 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.000467 -> iter 185: 0.000172
2024-05-30 03:19:09 INFO autoround.py L981: quantizing 3/61, model.layers.2
2024-05-30 03:20:04 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.000505 -> iter 184: 0.000199
2024-05-30 03:20:17 INFO autoround.py L981: quantizing 4/61, model.layers.3
2024-05-30 03:21:12 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.000715 -> iter 184: 0.000289
2024-05-30 03:21:24 INFO autoround.py L981: quantizing 5/61, model.layers.4
2024-05-30 03:22:18 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.043836 -> iter 194: 0.005168
2024-05-30 03:22:29 INFO autoround.py L981: quantizing 6/61, model.layers.5
2024-05-30 03:23:25 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.006060 -> iter 194: 0.003546
2024-05-30 03:23:38 INFO autoround.py L981: quantizing 7/61, model.layers.6
2024-05-30 03:24:32 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.005078 -> iter 194: 0.003433
2024-05-30 03:24:44 INFO autoround.py L981: quantizing 8/61, model.layers.7
2024-05-30 03:25:39 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.005195 -> iter 192: 0.003314
2024-05-30 03:25:52 INFO autoround.py L981: quantizing 9/61, model.layers.8
2024-05-30 03:26:49 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.005311 -> iter 195: 0.003522
2024-05-30 03:27:00 INFO autoround.py L981: quantizing 10/61, model.layers.9
2024-05-30 03:27:56 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.006215 -> iter 194: 0.004022
2024-05-30 03:28:07 INFO autoround.py L981: quantizing 11/61, model.layers.10
2024-05-30 03:29:02 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.007908 -> iter 183: 0.004855
2024-05-30 03:29:14 INFO autoround.py L981: quantizing 12/61, model.layers.11
2024-05-30 03:30:08 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.009636 -> iter 199: 0.005274
2024-05-30 03:30:21 INFO autoround.py L981: quantizing 13/61, model.layers.12
2024-05-30 03:31:15 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.010242 -> iter 189: 0.006805
2024-05-30 03:31:27 INFO autoround.py L981: quantizing 14/61, model.layers.13
2024-05-30 03:32:20 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.012779 -> iter 159: 0.007155
2024-05-30 03:32:33 INFO autoround.py L981: quantizing 15/61, model.layers.14
2024-05-30 03:33:28 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.015496 -> iter 119: 0.009443
2024-05-30 03:33:40 INFO autoround.py L981: quantizing 16/61, model.layers.15
2024-05-30 03:34:36 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.017565 -> iter 146: 0.010447
2024-05-30 03:34:48 INFO autoround.py L981: quantizing 17/61, model.layers.16
2024-05-30 03:35:43 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.018367 -> iter 181: 0.012401
2024-05-30 03:35:55 INFO autoround.py L981: quantizing 18/61, model.layers.17
2024-05-30 03:36:51 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.022233 -> iter 191: 0.014910
2024-05-30 03:37:03 INFO autoround.py L981: quantizing 19/61, model.layers.18
2024-05-30 03:37:57 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.029981 -> iter 180: 0.017886
2024-05-30 03:38:09 INFO autoround.py L981: quantizing 20/61, model.layers.19
2024-05-30 03:39:05 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.038458 -> iter 154: 0.023269
2024-05-30 03:39:18 INFO autoround.py L981: quantizing 21/61, model.layers.20
2024-05-30 03:40:15 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.045847 -> iter 196: 0.027305
2024-05-30 03:40:29 INFO autoround.py L981: quantizing 22/61, model.layers.21
2024-05-30 03:41:23 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.056776 -> iter 146: 0.036311
2024-05-30 03:41:35 INFO autoround.py L981: quantizing 23/61, model.layers.22
2024-05-30 03:42:31 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.074335 -> iter 180: 0.042874
2024-05-30 03:42:44 INFO autoround.py L981: quantizing 24/61, model.layers.23
2024-05-30 03:43:39 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.097025 -> iter 27: 0.051062
2024-05-30 03:43:51 INFO autoround.py L981: quantizing 25/61, model.layers.24
2024-05-30 03:44:47 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.098709 -> iter 184: 0.062223
2024-05-30 03:44:59 INFO autoround.py L981: quantizing 26/61, model.layers.25
2024-05-30 03:45:56 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.123886 -> iter 115: 0.086075
2024-05-30 03:46:08 INFO autoround.py L981: quantizing 27/61, model.layers.26
2024-05-30 03:47:03 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.184206 -> iter 174: 0.094373
2024-05-30 03:47:16 INFO autoround.py L981: quantizing 28/61, model.layers.27
2024-05-30 03:48:11 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.193183 -> iter 80: 0.107857
2024-05-30 03:48:24 INFO autoround.py L981: quantizing 29/61, model.layers.28
2024-05-30 03:49:21 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.250976 -> iter 181: 0.148402
2024-05-30 03:49:34 INFO autoround.py L981: quantizing 30/61, model.layers.29
2024-05-30 03:50:31 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.329901 -> iter 189: 0.205679
2024-05-30 03:50:44 INFO autoround.py L981: quantizing 31/61, model.layers.30
2024-05-30 03:51:40 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.536778 -> iter 81: 0.245719
2024-05-30 03:51:53 INFO autoround.py L981: quantizing 32/61, model.layers.31
2024-05-30 03:52:50 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.738626 -> iter 161: 0.335690
2024-05-30 03:53:04 INFO autoround.py L981: quantizing 33/61, model.layers.32
2024-05-30 03:54:02 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.898081 -> iter 118: 0.417826
2024-05-30 03:54:15 INFO autoround.py L981: quantizing 34/61, model.layers.33
2024-05-30 03:55:12 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.966615 -> iter 106: 0.539975
2024-05-30 03:55:25 INFO autoround.py L981: quantizing 35/61, model.layers.34
2024-05-30 03:56:21 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 1.332617 -> iter 78: 0.704059
2024-05-30 03:56:34 INFO autoround.py L981: quantizing 36/61, model.layers.35
2024-05-30 03:57:30 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 1.495152 -> iter 199: 0.896235
2024-05-30 03:57:42 INFO autoround.py L981: quantizing 37/61, model.layers.36
2024-05-30 03:58:37 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 1.921514 -> iter 149: 0.934444
2024-05-30 03:58:51 INFO autoround.py L981: quantizing 38/61, model.layers.37
2024-05-30 03:59:46 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 1.993358 -> iter 47: 1.046257
2024-05-30 03:59:58 INFO autoround.py L981: quantizing 39/61, model.layers.38
2024-05-30 04:00:54 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 3.034522 -> iter 138: 1.235662
2024-05-30 04:01:07 INFO autoround.py L981: quantizing 40/61, model.layers.39
2024-05-30 04:02:02 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 2.580281 -> iter 149: 1.421089
2024-05-30 04:02:15 INFO autoround.py L981: quantizing 41/61, model.layers.40
2024-05-30 04:03:10 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 3.164423 -> iter 192: 1.855581
2024-05-30 04:03:23 INFO autoround.py L981: quantizing 42/61, model.layers.41
2024-05-30 04:04:18 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 3.355162 -> iter 145: 1.871395
2024-05-30 04:04:33 INFO autoround.py L981: quantizing 43/61, model.layers.42
2024-05-30 04:05:29 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 3.873594 -> iter 177: 2.135222
2024-05-30 04:05:42 INFO autoround.py L981: quantizing 44/61, model.layers.43
2024-05-30 04:06:39 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 4.027653 -> iter 199: 2.145819
2024-05-30 04:06:51 INFO autoround.py L981: quantizing 45/61, model.layers.44
2024-05-30 04:07:47 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 4.991511 -> iter 112: 2.440429
2024-05-30 04:08:00 INFO autoround.py L981: quantizing 46/61, model.layers.45
2024-05-30 04:08:55 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 5.502632 -> iter 17: 2.820642
2024-05-30 04:09:08 INFO autoround.py L981: quantizing 47/61, model.layers.46
2024-05-30 04:10:04 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 5.623791 -> iter 179: 2.700998
2024-05-30 04:10:17 INFO autoround.py L981: quantizing 48/61, model.layers.47
2024-05-30 04:11:14 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 5.089231 -> iter 4: 3.764171
2024-05-30 04:11:26 INFO autoround.py L981: quantizing 49/61, model.layers.48
2024-05-30 04:12:21 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 8.195049 -> iter 134: 4.052330
2024-05-30 04:12:34 INFO autoround.py L981: quantizing 50/61, model.layers.49
2024-05-30 04:13:29 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 6.804636 -> iter 91: 4.790856
2024-05-30 04:13:42 INFO autoround.py L981: quantizing 51/61, model.layers.50
2024-05-30 04:14:38 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 8.568014 -> iter 144: 5.257205
2024-05-30 04:14:52 INFO autoround.py L981: quantizing 52/61, model.layers.51
2024-05-30 04:15:47 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 7.666635 -> iter 71: 5.260865
2024-05-30 04:16:01 INFO autoround.py L981: quantizing 53/61, model.layers.52
2024-05-30 04:16:56 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 11.691656 -> iter 114: 6.037588
2024-05-30 04:17:08 INFO autoround.py L981: quantizing 54/61, model.layers.53
2024-05-30 04:18:05 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 10.721492 -> iter 126: 8.318145
2024-05-30 04:18:18 INFO autoround.py L981: quantizing 55/61, model.layers.54
2024-05-30 04:19:13 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 14.590244 -> iter 109: 9.492881
2024-05-30 04:19:25 INFO autoround.py L981: quantizing 56/61, model.layers.55
2024-05-30 04:20:21 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 23.456951 -> iter 132: 11.193069
2024-05-30 04:20:33 INFO autoround.py L981: quantizing 57/61, model.layers.56
2024-05-30 04:21:29 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 20.077566 -> iter 47: 15.021464
2024-05-30 04:21:41 INFO autoround.py L981: quantizing 58/61, model.layers.57
2024-05-30 04:22:35 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 33.451534 -> iter 102: 17.207821
2024-05-30 04:22:47 INFO autoround.py L981: quantizing 59/61, model.layers.58
2024-05-30 04:23:41 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 1516.394775 -> iter 40: 1002.322937
2024-05-30 04:23:53 INFO autoround.py L981: quantizing 60/61, model.layers.59
2024-05-30 04:24:46 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 2618.160889 -> iter 196: 1754.337280
2024-05-30 04:24:59 INFO autoround.py L981: quantizing 61/61, model.layers.60
2024-05-30 04:25:53 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: nan -> iter 0: 340282346638528859811704183484516925440.000000
Traceback (most recent call last):
  File "/workspace/main.py", line 187, in <module>
    main()
  File "/workspace/main.py", line 148, in main
    autoround_quantize(
  File "/workspace/quantization/autoround.py", line 82, in autoround_quantize
    model, _ = autoround.quantize()
  File "/usr/local/lib/python3.10/dist-packages/auto_round/autoround.py", line 1064, in quantize
    self.qdq_weight_round(
  File "/usr/local/lib/python3.10/dist-packages/auto_round/autoround.py", line 991, in qdq_weight_round
    q_input, input_ids = self.quant_block(
  File "/usr/local/lib/python3.10/dist-packages/auto_round/autoround.py", line 939, in quant_block
    unwrapper_block(block, best_v, best_min_scale, best_max_scale)
UnboundLocalError: local variable 'best_v' referenced before assignment

The sequence length is 2048. Could you please check whether there are any errors in the log?

wenhuach21 commented 1 month ago

From the log there are no issues; the only thing is that the number of effective samples is 346, which I think is fine. May I know which model you are working with? Have you set enable_quanted_input to False? If you are Chinese, you could send an email to wenhua.cheng@intel.com and we could use WeChat.

wenhuach21 commented 1 month ago

After further discussion, the exception has already been fixed in version 0.2. However, the large-loss issue persists, which may be due to significant variations in the weights of the user's model.

mxjmtxrm commented 1 month ago

Thanks @wenhuach21. I hope you can support dataset packing.

wenhuach21 commented 1 month ago

We will add 'concat' and 'padding' keys for datasets:
concat: concatenate the texts first, then split them into chunks of the sequence length (seqlen).
padding: pad each text to the sequence length (seqlen) if needed.