junmcenroe / YOLOv5-CoreML-Export-with-NMS


INT8 Quantization for coreml exports #1

Open boazvetter opened 1 year ago

boazvetter commented 1 year ago
# bits, mode = (8, 'kmeans_lut') if int8 else (16, 'linear') if half else (32, None)
# if bits < 32:
#     if MACOS:  # quantization only supported on macOS
#         with warnings.catch_warnings():
#             warnings.filterwarnings("ignore", category=DeprecationWarning)  # suppress numpy==1.20 float warning
#             ct_model = ct.models.neural_network.quantization_utils.quantize_weights(ct_model, bits, mode)
#     else:
#         print(f'{prefix} quantization only supported on macOS, skipping...')
# ct_model.save(f)

Hi, it looks like there used to be INT8 quantization support when converting to CoreML. Is this deprecated now? How so?

junmcenroe commented 1 year ago

@boazvetter

I modified export-coreml-nms.py to add INT8 quantization and uploaded it.


import platform
import warnings

import coremltools as ct
import torch

MACOS = platform.system() == 'Darwin'  # weight quantization is only supported on macOS

# `export_model`, `im`, `int8`, `half`, and `prefix` come from the surrounding export script
ts = torch.jit.trace(export_model.eval(), im, strict=False)  # TorchScript model

orig_model = ct.convert(ts, inputs=[ct.ImageType('image', shape=im.shape, scale=1 / 255, bias=[0, 0, 0])])

# quantize: INT8 (k-means LUT), FP16 (linear), or leave as FP32
bits, mode = (8, 'kmeans_lut') if int8 else (16, 'linear') if half else (32, None)
if bits < 32:
    if MACOS:  # quantization only supported on macOS
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=DeprecationWarning)  # suppress numpy==1.20 float warning
            orig_model = ct.models.neural_network.quantization_utils.quantize_weights(orig_model, bits, mode)
    else:
        print(f'{prefix} quantization only supported on macOS, skipping...')
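
For a quick sanity check that the quantization actually ran, you can save the model and look at the file size; a minimal sketch, assuming the hypothetical output path mymodel.mlmodel:

import os

orig_model.save('mymodel.mlmodel')
size_mb = os.path.getsize('mymodel.mlmodel') / 1e6
print(f'saved mymodel.mlmodel ({size_mb:.1f} MB)')  # INT8 weights should come out roughly 4x smaller than FP32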

But in the Google Colaboratory environment I just tested, I cannot verify the quantization portion of this modification, since it is skipped on non-macOS hosts:


CoreML: starting export with coremltools 6.2...
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 100% 703/705 [00:00<00:00, 3004.14 ops/s]
Running MIL Common passes: 100% 40/40 [00:00<00:00, 113.70 passes/s]
Running MIL Clean up passes: 100% 11/11 [00:00<00:00, 52.24 passes/s]
Translating MIL ==> NeuralNetwork Ops: 100% 810/810 [00:03<00:00, 203.10 ops/s]
CoreML: quantization only supported on macOS, skipping...
CoreML: export success, saved as /content/mymodel.mlmodel (80.5 MB)
CoreML: export success ✅ 30.1s, saved as /content/mymodel.mlmodel (80.5 MB)


If you try it on macOS, this updated exporter might work.
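
To verify the quantized model on macOS, you could load it back with coremltools and run a single prediction; a minimal sketch, assuming a hypothetical test image and the 'image' input name used in the conversion above:

import coremltools as ct
from PIL import Image

model = ct.models.MLModel('mymodel.mlmodel')
img = Image.open('test.jpg').resize((640, 640))  # match the export image size (640 is the YOLOv5 default)
out = model.predict({'image': img})  # Core ML prediction only runs on macOS
print(list(out.keys()))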

boazvetter commented 1 year ago

Thank you kindly, I'll try it on macOS soon.