Closed skyline75489 closed 2 months ago
To reviewers: I'll be out of office next week. Feel free to modify my branch directly.
For pre/post processing we use the DecodeImage and EncodeImage operators. Is there a plan to add native encoding as well?
@skottmckay I'm a bit confused by the coexistence of both ort_extensions::KernelDecodeImage
(use opencv) and ImageProcessor::DecodeImage
(use native codecs with this PR). Currently ort-genai is only using the image processor with decoding capability and that's why this PR only contains such changes.
I'm not sure if the vision operators currently powered by OpenCV can tolerate the tiny encoding & decoding difference after switching to native codecs. If so, I can probably take some time to unify these implementations.
@skottmckay I'm a bit confused by the coexistence of both
ort_extensions::KernelDecodeImage
(use opencv) andImageProcessor::DecodeImage
(use native codecs with this PR). Currently ort-genai is only using the image processor with decoding capability and that's why this PR only contains such changes.I'm not sure if the vision operators currently powered by OpenCV can tolerate the tiny encoding & decoding difference after switching to native codecs. If so, I can probably take some time to unify these implementations.
yes, it would be nice that we can unify all of these kernels. The first step now is to remove the OpenCV dependencies for these operator under cv2/ and vision folders.
@skottmckay I'm a bit confused by the coexistence of both
ort_extensions::KernelDecodeImage
(use opencv) andImageProcessor::DecodeImage
(use native codecs with this PR). Currently ort-genai is only using the image processor with decoding capability and that's why this PR only contains such changes. I'm not sure if the vision operators currently powered by OpenCV can tolerate the tiny encoding & decoding difference after switching to native codecs. If so, I can probably take some time to unify these implementations.yes, it would be nice that we can unify all of these kernels. The first step now is to remove the OpenCV dependencies for these operator under cv2/ and vision folders.
AFAIK we've only ever had accuracy issues from doing a Resize without anti-aliasing, which were resolved when the ONNX Resize added that support.
What's the worst diff you'd get from using the native codecs for decoding? An RGB value being off by one? Not sure anyone would notice that.
Recently I've been validating this PR with the newly released Phi-3.5-vision model and found that the small difference in image decoding doesn't change the model output.
Removing OpenCV dramatically reduced the final binary size in GenAI 0.5 release without compromising the model output. I think now it's safe to say this is the correct direction.
FYI @wenbingl
This added support for native image decoding on Windows & Apple platforms. This helps us remove libpng & libjpeg completely on these platforms, and in the meantime support more image formats thanks to OS vendors,