When I use this config to quantize a YOLOv3 model to fp8:
```yaml
version: 1.0

model:                                  # mandatory. used to specify model specific information.
  name: yolo_v3
  framework: pytorch                    # mandatory. possible values are tensorflow, mxnet, pytorch, pytorch_ipex, onnxrt_integerops and onnxrt_qlinearops.

quantization:
  approach: post_training_static_quant  # no need for fp8_e5m2
  precision: fp8_e4m3                   # allowed precision is fp8_e5m2, fp8_e4m3, fp8_e3m4
  calibration:
    batchnorm_sampling_size: 3000       # only needed for models w/ BatchNorm
    sampling_size: 104

tuning:
  accuracy_criterion:
    relative: 0.01                      # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
  exit_policy:
    max_trials: 50
    timeout: 180                        # optional. tuning timeout (seconds). default value is 0, which means early stop. combine with the max_trials field to decide when to exit.
  random_seed: 1234                     # optional. random seed for deterministic tuning.
```
I get this output:
```
2024-07-09 17:27:54 [INFO] Save tuning history to /mnt/d/LM/neural-compressor/examples/pytorch/object_detection/yolo_v3/quantization/ptq/eager/nc_workspace/2024-07-09_17-27-50/./history.snapshot.
2024-07-09 17:27:54 [INFO] FP32 baseline is: [Accuracy: 0.7232, Duration (seconds): 3.5848]
Error: Invalid scale factor : 1.70e+06, make sure the scale is not larger than : 6.55e+04
```
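In case it helps diagnose: 6.55e+04 looks like the fp16 maximum (65504), so the quantizer appears to reject any scale that cannot itself be stored in fp16. Here is a minimal sketch of how a single activation outlier can push a per-tensor scale past that limit. This assumes the common `amax / fp8_max` scale formula and the OCP e4m3 maximum of 448; I don't know Neural Compressor's exact internals, so the function name and values below are just illustrative.

```python
import numpy as np

FP8_E4M3_MAX = 448.0   # largest normal value representable in OCP fp8 e4m3
FP16_MAX = 65504.0     # largest value representable in fp16 (the 6.55e+04 in the error)

def per_tensor_scale(tensor):
    """Hypothetical per-tensor scale mapping the tensor's amax onto the fp8 e4m3 range."""
    amax = float(np.abs(tensor).max())
    return amax / FP8_E4M3_MAX

# An activation tensor with one huge outlier produces a scale that
# cannot be stored in fp16 -- the situation the error reports.
act = np.array([0.5, -2.0, 7.6e8], dtype=np.float32)
scale = per_tensor_scale(act)
print(scale > FP16_MAX)  # the scale overflows the fp16 limit
```

If that reading is right, the fix would presumably involve taming whatever layer produces such a large activation range before fp8 conversion, but I'm not sure what the intended workflow is.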
So how can I handle this problem? Thank you!