huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

[optimum-onnxruntime] The number of calibration samples must be divisible by (num_calibration_shards * calibration_batch_size) #331

Open

fxmarty opened this issue 2 years ago

System Info

* pytorch 1.12.0+cu102
* onnxruntime 1.12.0
* optimum 1.4.0.dev0
* onnx 1.12.0
* numpy 1.23.1

Who can help?

No response

Reproduction

Dockerfile:

FROM python:3.9

RUN git clone https://github.com/huggingface/optimum.git
RUN pip install ./optimum[onnxruntime]
# "sklearn" on PyPI is a deprecated alias; install scikit-learn directly
RUN pip install scipy scikit-learn

CMD python /optimum/examples/onnxruntime/quantization/text-classification/run_glue.py \
    --model_name_or_path distilbert-base-uncased-finetuned-sst-2-english \
    --task_name sst2 \
    --quantization_approach static \
    --calibration_method percentile \
    --do_eval \
    --output_dir /tmp/quantized_distilbert_sst2 \
    --max_eval_samples 100

Save this as Dockerfile_debug, then:

docker build -f Dockerfile_debug -t debug_sst2 .
docker run -it debug_sst2

Error:

/usr/local/lib/python3.9/site-packages/transformers/models/distilbert/modeling_distilbert.py:215: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask, torch.tensor(torch.finfo(scores.dtype).min)
Collecting tensor data and making histogram ...
/usr/local/lib/python3.9/site-packages/onnxruntime/quantization/calibrate.py:586: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  data_arr = np.asarray(data_arr)
Traceback (most recent call last):
  File "/optimum/examples/onnxruntime/quantization/text-classification/run_glue.py", line 540, in <module>
    main()
  File "/optimum/examples/onnxruntime/quantization/text-classification/run_glue.py", line 444, in main
    quantizer.partial_fit(
  File "/usr/local/lib/python3.9/site-packages/optimum/onnxruntime/quantization.py", line 266, in partial_fit
    self._calibrator.collect_data(reader)
  File "/usr/local/lib/python3.9/site-packages/onnxruntime/quantization/calibrate.py", line 413, in collect_data
    self.collector.collect(clean_merged_dict)
  File "/usr/local/lib/python3.9/site-packages/onnxruntime/quantization/calibrate.py", line 549, in collect
    return self.collect_value(name_to_arr)
  File "/usr/local/lib/python3.9/site-packages/onnxruntime/quantization/calibrate.py", line 590, in collect_value
    min_value = np.min(data_arr)
  File "<__array_function__ internals>", line 180, in amin
  File "/usr/local/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2918, in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/usr/local/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Expected behavior

Calibration completes without error, regardless of whether the number of calibration samples is a multiple of the calibration batch size.
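
For reference, the failure mode can be reproduced with numpy alone. Below is a minimal sketch; the shapes are made up and merely stand in for activation batches where the last calibration shard is smaller than the others:

import numpy as np

# 100 samples with a calibration batch size of 8 leave a final partial batch of 4,
# so the activations collected per node form a ragged list of arrays.
data_arr = [np.random.rand(8, 128), np.random.rand(8, 128), np.random.rand(4, 128)]

# onnxruntime's calibrate.py calls np.asarray(data_arr); on numpy 1.23 this emits the
# VisibleDeprecationWarning shown above and yields a 1-D object array of ndarrays.
arr = np.asarray(data_arr, dtype=object)

# np.min reduces by comparing the element arrays pairwise; comparing two (8, 128)
# arrays produces a boolean array, whose truth value is ambiguous -> ValueError.
np.min(arr)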

fxmarty commented 2 years ago
python run_glue.py --model_name_or_path philschmid/tiny-bert-sst2-distilled --task_name sst2 \
      --quantization_approach static --calibration_method percentile \
      --num_calibration_samples 104 --do_eval \
      --output_dir /tmp/quantized_distilbert_sst2 --max_eval_samples 100

works fine. So apparently the number of calibration samples needs to be a multiple of calibration_batch_size. I don't think that was the case before?
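
As a stopgap, the requested sample count can be rounded up to a multiple of (num_calibration_shards * calibration_batch_size) before calibrating. A sketch, assuming a single shard and a calibration batch size of 8 (hypothetical values, consistent with 104 working above):

import math

num_calibration_shards = 1  # hypothetical value
calibration_batch_size = 8  # hypothetical value
requested_samples = 100

# Round up to the nearest multiple so every shard sees only full batches
# and no ragged activation lists are collected during calibration.
step = num_calibration_shards * calibration_batch_size
num_calibration_samples = math.ceil(requested_samples / step) * step
print(num_calibration_samples)  # -> 104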