microsoft / onnxruntime-extensions

onnxruntime-extensions: A specialized pre- and post- processing library for ONNX Runtime
MIT License
340 stars 91 forks source link

Added support for native image decoding #808

Closed skyline75489 closed 2 months ago

skyline75489 commented 2 months ago

This added support for native image decoding on Windows & Apple platforms. This helps us remove libpng & libjpeg completely on these platforms, and in the meantime support more image formats thanks to OS vendors,

skyline75489 commented 2 months ago

To reviewers: I'll be out of office next week. Feel free to modify my branch directly.

skottmckay commented 2 months ago

For pre/post processing we use the DecodeImage and EncodeImage operators. Is there a plan to add native encoding as well?

skyline75489 commented 2 months ago

@skottmckay I'm a bit confused by the coexistence of both ort_extensions::KernelDecodeImage (use opencv) and ImageProcessor::DecodeImage (use native codecs with this PR). Currently ort-genai is only using the image processor with decoding capability and that's why this PR only contains such changes.

I'm not sure if the vision operators currently powered by OpenCV can tolerate the tiny encoding & decoding difference after switching to native codecs. If so, I can probably take some time to unify these implementations.

wenbingl commented 2 months ago

@skottmckay I'm a bit confused by the coexistence of both ort_extensions::KernelDecodeImage (use opencv) and ImageProcessor::DecodeImage (use native codecs with this PR). Currently ort-genai is only using the image processor with decoding capability and that's why this PR only contains such changes.

I'm not sure if the vision operators currently powered by OpenCV can tolerate the tiny encoding & decoding difference after switching to native codecs. If so, I can probably take some time to unify these implementations.

yes, it would be nice that we can unify all of these kernels. The first step now is to remove the OpenCV dependencies for these operator under cv2/ and vision folders.

skottmckay commented 2 months ago

@skottmckay I'm a bit confused by the coexistence of both ort_extensions::KernelDecodeImage (use opencv) and ImageProcessor::DecodeImage (use native codecs with this PR). Currently ort-genai is only using the image processor with decoding capability and that's why this PR only contains such changes. I'm not sure if the vision operators currently powered by OpenCV can tolerate the tiny encoding & decoding difference after switching to native codecs. If so, I can probably take some time to unify these implementations.

yes, it would be nice that we can unify all of these kernels. The first step now is to remove the OpenCV dependencies for these operator under cv2/ and vision folders.

AFAIK we've only ever had accuracy issues from doing a Resize without anti-aliasing, which were resolved when the ONNX Resize added that support.

What's the worst diff you'd get from using the native codecs for decoding? An RGB value being off by one? Not sure anyone would notice that.

skyline75489 commented 1 week ago

Recently I've been validating this PR with the newly released Phi-3.5-vision model and found that the small difference in image decoding doesn't change the model output.

Removing OpenCV dramatically reduced the final binary size in GenAI 0.5 release without compromising the model output. I think now it's safe to say this is the correct direction.

FYI @wenbingl