noujaimc opened this issue 2 months ago
@noujaimc, calibrator.collect_data(data_reader) supports gathering statistics over one part of the data at a time. You can refer to an example here: https://github.com/microsoft/onnxruntime-inference-examples/blob/77989cff19f102300e3c4f99b957b55f74daecb4/quantization/object_detection/trt/yolov3/e2e_user_yolov3_example.py#L73
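In outline, the incremental pattern looks roughly like this (a minimal sketch; `MyDataReader`, `num_images`, and `stride` are placeholders for your own reader class and values):

```python
from onnxruntime.quantization import CalibrationMethod, create_calibrator

calibrator = create_calibrator(
    "model-infer.onnx",
    [],
    augmented_model_path="model-infer.augmented.onnx",
    calibrate_method=CalibrationMethod.Entropy,
)
# Each call folds one slice of the data set into the running statistics,
# so no single reader has to hold the whole set in memory.
for start in range(0, num_images, stride):
    calibrator.collect_data(MyDataReader(start, start + stride))
ranges = calibrator.compute_data()  # merged ranges over all slices
```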
Here’s the updated code using multiple readers. It works, but it takes several hours (more than 8 hours for 4,000 images) to complete on both CPU and CUDA. Is this normal? The bottleneck isn’t the inference step, but collecting the tensor data and building the histogram.
```python
import os
import argparse

import numpy as np
import onnxruntime
from PIL import Image
from onnxruntime.quantization import (
    CalibrationDataReader,
    CalibrationMethod,
    create_calibrator,
    write_calibration_table,
)


class CalDataReader(CalibrationDataReader):
    """Reads the slice [start_index:end_index) of the calibration folder."""

    def __init__(self, calibration_image_folder: str, model_path: str,
                 batch_size: int = 1, start_index: int = 0, end_index: int = 0):
        super().__init__()
        # Only load one slice of the directory listing, so each reader
        # holds a bounded number of images in memory.
        selected_images = os.listdir(calibration_image_folder)[start_index:end_index]
        print(f"Loading images {start_index} to {end_index}:", selected_images)
        images = []
        for image_name in selected_images:
            img_path = os.path.join(calibration_image_folder, image_name)
            try:
                image = np.array(Image.open(img_path).convert("RGB")).astype(np.float32) / 255.0
                images.append(image)
            except Exception as e:
                print(f"Error loading image {img_path}: {e}")
        self.batch_images = [np.stack(images[i:i + batch_size])
                             for i in range(0, len(images), batch_size)]
        self.enum = iter(self.batch_images)
        # A throwaway session is used only to look up the model's input name.
        self.input_name = onnxruntime.InferenceSession(model_path, None).get_inputs()[0].name

    def get_next(self):
        next_batch = next(self.enum, None)
        if next_batch is not None:
            return {self.input_name: next_batch}
        return None


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_model", default="./model-infer.onnx", help="input model")
    parser.add_argument("--dataset", default="./cal_images", help="calibration data set")
    return parser.parse_args()


def main():
    args = get_args()
    input_model_path = args.input_model
    calibration_dataset_path = args.dataset
    augmented_model_path = input_model_path.replace(".onnx", ".augmented.onnx")
    try:
        calibrator = create_calibrator(
            input_model_path,
            [],
            augmented_model_path=augmented_model_path,
            calibrate_method=CalibrationMethod.Entropy,
        )
        calibrator.set_execution_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])

        total_data_size = len(os.listdir(calibration_dataset_path))
        batch_size = 5
        stride = 10
        # Feed the data set to the calibrator in chunks of `stride` images.
        for start_index in range(0, total_data_size, stride):
            calibrator.collect_data(
                data_reader=CalDataReader(calibration_dataset_path, input_model_path,
                                          batch_size, start_index, start_index + stride)
            )

        # Merge the collected statistics and dump them as a calibration table.
        new_compute_range = {}
        for k, v in calibrator.compute_data().data.items():
            v1, v2 = v.range_value
            new_compute_range[k] = (float(v1.item()), float(v2.item()))
        write_calibration_table(new_compute_range)
        print("Calibration table saved.")
    except Exception as e:
        print("An error occurred:", e)


if __name__ == "__main__":
    main()
```
Any ideas?
Describe the issue
Hello,
I'm trying to quantize an ONNX model to INT8 using the ONNX Runtime tools provided here. I have about 1,000 images of size 640x640x3 that I'm using as calibration data. However, when running my calibration script (run.py, essentially the code above), I noticed that the memory consumption keeps increasing with every batch of image data sent to the calibrator via the collect_data method of the Calibrator class. The memory usage grows until the system can no longer allocate more. It appears the calibration process retains all intermediate outputs in memory, which doesn't scale when calibrating with large images or many of them.
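One way to see this (a sketch of the chunk loop from run.py above, instrumented with `psutil`, which you would need to install separately) is to print the process RSS after every chunk:

```python
import os
import psutil

process = psutil.Process(os.getpid())
for start_index in range(0, total_data_size, stride):
    calibrator.collect_data(
        data_reader=CalDataReader(calibration_dataset_path, input_model_path,
                                  batch_size, start_index, start_index + stride)
    )
    # If intermediate outputs were released between chunks, RSS would stay
    # roughly flat here; instead it keeps climbing until allocation fails.
    rss_mib = process.memory_info().rss / 2**20
    print(f"RSS after images {start_index}-{start_index + stride}: {rss_mib:.0f} MiB")
```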
My goal is to use the quantized model with TensorRT.
Is this the correct approach to quantize the model?
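For context, this is how I intend to consume the table (a sketch; the option names come from the TensorRT execution provider documentation, and calibration.flatbuffers is the default file name written by write_calibration_table):

```python
import onnxruntime

providers = [
    ("TensorrtExecutionProvider", {
        "trt_int8_enable": True,
        "trt_int8_calibration_table_name": "calibration.flatbuffers",
    }),
    "CUDAExecutionProvider",
]
session = onnxruntime.InferenceSession("model-infer.onnx", providers=providers)
```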
I was able to reproduce the problem using the quantization example provided here. To do so, simply copy the images in the test_images folder multiple times, for example as sketched below.
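A quick way to multiply the set (a hypothetical snippet, assuming the folder holds JPEG files):

```python
import shutil
from pathlib import Path

src = Path("test_images")
# Duplicate every image ten times so the calibrator has a much larger set.
for n in range(10):
    for img in src.glob("*.jpg"):
        shutil.copy(img, src / f"{img.stem}_copy{n}{img.suffix}")
```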
Thank you
To reproduce
1) Run pre-processing from the command line:

```
python -m onnxruntime.quantization.preprocess --input model.onnx --output model-infer.onnx --auto_merge
```
2) Run the calibration script:

```
python run.py --input_model model-infer.onnx --dataset cal_images
```
If needed, I can send the model and images.
Urgency
For now, I can only calibrate with a couple of images.
Platform
Windows
OS Version
Windows 10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No