mrousavy / react-native-vision-camera

📸 A powerful, high-performance React Native Camera library.
https://react-native-vision-camera.com
MIT License

Enable output image rotation for frame processor #2807

Closed · ismaelsousa closed 2 weeks ago

ismaelsousa commented 3 weeks ago

What

This PR enables output image rotation for the frame processor on Android.

Problem: frame processor buffers are delivered in the sensor's native orientation, so coordinates derived from them (e.g. ML bounding boxes) end up rotated relative to the screen.

Solution: enable CameraX's output image rotation (see the Android docs for ImageAnalysis.Builder.setOutputImageRotationEnabled).
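For context, the CameraX switch this PR enables looks roughly like this (a minimal sketch; the actual wiring lives inside VisionCamera's camera session setup):

```kotlin
import androidx.camera.core.ImageAnalysis

// Minimal sketch of the CameraX flag this PR turns on.
val imageAnalysis = ImageAnalysis.Builder()
    // Ask CameraX to rotate each analysis buffer to the target rotation
    // before handing it to the analyzer (disabled by default).
    .setOutputImageRotationEnabled(true)
    .build()
```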

This image shows how the bounding boxes behave right now:

[image: current bounding box placement]

[comparison screenshots: Output Image Rotation Disabled vs. Output Image Rotation Enabled]

Changes

Tested on

Related issues

Discussion

Discord chat

Big thanks to @pedrol2b for debugging it with me 🚀

vercel[bot] commented 3 weeks ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| :--- | :--- | :--- | :--- | :--- |
| react-native-vision-camera | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Apr 25, 2024 9:56pm |
pedrol2b commented 3 weeks ago
| Platform | Model | Manufacturer | Android Version |
| :--- | :--- | :--- | :--- |
| Android | Galaxy A54 | Samsung | Android 14 |

Screenshot

| Output Image Rotation Disabled | Output Image Rotation Enabled |
| :--- | :--- |
| Screenshot from 2024-04-25 21-08-09 | Screenshot from 2024-04-25 21-11-25 |

Video

Output Image Rotation Disabled: https://github.com/mrousavy/react-native-vision-camera/assets/107975184/2d15070b-31cf-4625-b2c6-9a2a64460f79

Output Image Rotation Enabled: https://github.com/mrousavy/react-native-vision-camera/assets/107975184/df32bf31-70c2-4fe5-9005-caa0bbd0cd74

[!NOTE]
Same code in both recordings; the only change is https://github.com/mrousavy/react-native-vision-camera/pull/2807/commits/81a1399df79f74314e54640df4c692d45f6131b5

mrousavy commented 2 weeks ago

Hey - so first of all, thank you so much for your contribution, I really appreciate it. Thanks especially for the detailed explanation and the before & after screenshots.

But this is not as easy as it might seem.

I created an issue to track the Orientation feature request (https://github.com/mrousavy/react-native-vision-camera/issues/1891) where I also explained how Orientation works in a Camera.

Essentially you have to think about it this way: the Camera hardware sensor creates buffers of a fixed dimension, e.g. 4096x2160. Those buffers are filled with the light hitting the sensor, and the hardware can stream them into either a hardware encoder (for recording videos), a high-resolution still-image buffer (photo capture), or a CPU-accessible image buffer (frame processors).

The Camera hardware knows nothing about phone orientation, and frankly it shouldn't care - because what happens if you record a video in portrait, then rotate the phone to landscape? You don't expect the video to switch from portrait to landscape mid-recording; that simply doesn't work with videos, as they have a fixed resolution that cannot suddenly change in the video player.

So instead, the user is responsible for interpreting how the image buffers are rotated. For a photo, this is easy (a sketch follows the list):

  1. Capture photo
  2. Get orientation of phone (e.g. 'portrait')
  3. Get orientation of camera sensor (e.g. 90deg)
  4. Calculate the target orientation ('portrait') by shifting it by the amount the camera sensor is rotated (in this case, -90 deg)
  5. Save photo as it is, and add target orientation to the EXIF flag
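
A minimal Kotlin sketch of steps 2-5, assuming you already have the device rotation and the sensor's mounting orientation in degrees (function and parameter names are illustrative; the sign convention shown is the back-camera one from the Camera2 docs):

```kotlin
import androidx.exifinterface.media.ExifInterface

// Sketch: compute the target orientation and store it as EXIF metadata
// instead of rotating the pixel buffer itself. Front cameras flip the
// sign of the device rotation.
fun tagPhotoOrientation(photoPath: String, deviceRotationDegrees: Int, sensorOrientationDegrees: Int) {
    // Shift the device rotation by the sensor's mounting rotation.
    val targetDegrees = (sensorOrientationDegrees + deviceRotationDegrees + 360) % 360

    val exifOrientation = when (targetDegrees) {
        90 -> ExifInterface.ORIENTATION_ROTATE_90
        180 -> ExifInterface.ORIENTATION_ROTATE_180
        270 -> ExifInterface.ORIENTATION_ROTATE_270
        else -> ExifInterface.ORIENTATION_NORMAL
    }

    // Write only the flag - viewers apply the rotation when displaying.
    ExifInterface(photoPath).apply {
        setAttribute(ExifInterface.TAG_ORIENTATION, exifOrientation.toString())
        saveAttributes()
    }
}
```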

When you open the photo later, the EXIF flag will say something like -90deg, and your image viewer applies that transform for you.

We don't want to rotate the actual image buffer, as this could easily be over 15 MB of raw data, or 2-5 MB after encoding to JPG. Rotating that would slow down capture by a lot, as we would have to allocate a second 15 MB raw buffer (or, for the 2-5 MB case, re-encode twice) and shift every single pixel over.

The same applies to ImageAnalysis here - you might've noticed that simply setting this flag does give you what you want, but it greatly slows down the image pipeline by introducing additional overhead - essentially using more CPU, more RAM, and delivering fewer FPS.

[!NOTE] I haven't benchmarked this, so we don't know exact numbers.

So instead of causing this additional overhead/slowdown, I wouldn't rotate buffers. The user is responsible for understanding that the buffers are rotated, and should treat any coordinates obtained from a buffer as potentially rotated. E.g. if you run an ML model on that buffer, you need to rotate the resulting coordinates before displaying any bounding boxes on screen.
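
As a rough Kotlin sketch (names are illustrative; shown for one of the four possible rotations), mapping a box out of a buffer that is rotated 90° clockwise relative to the display:

```kotlin
import android.graphics.RectF

// Sketch: rotate the *coordinates* instead of the buffer. Maps a box
// from a buffer rotated 90° clockwise relative to the display back
// into upright coordinates: (x, y) -> (y, bufferWidth - x).
fun rotateBox90(box: RectF, bufferWidth: Int): RectF =
    RectF(
        box.top,                   // new left
        bufferWidth - box.right,   // new top
        box.bottom,                // new right
        bufferWidth - box.left     // new bottom
    )
```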

This is the industry standard - all popular APIs, like the MLKit Object-, Face-, and Code-detection APIs, take an image plus an orientation and already rotate the output coordinates for you. If you don't use an API that does this for you, you need to rotate the coordinates yourself.
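
ML Kit's InputImage, for instance, takes the rotation right alongside the pixels, and its detectors hand back upright coordinates (a sketch, assuming the ML Kit face-detection dependency is installed):

```kotlin
import android.media.Image
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.face.FaceDetection

// Sketch: ML Kit takes the buffer plus its rotation and compensates
// internally - detected bounding boxes come back already upright.
fun detectFaces(mediaImage: Image, rotationDegrees: Int) {
    val input = InputImage.fromMediaImage(mediaImage, rotationDegrees)
    FaceDetection.getClient().process(input)
        .addOnSuccessListener { faces ->
            // face.boundingBox needs no extra rotation here.
            faces.forEach { face -> println(face.boundingBox) }
        }
}
```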

But rotating the entire image buffer is not what we want.

mrousavy commented 2 weeks ago

Also, @pedrol2b were you in class while filming this? 😄

mrousavy commented 2 weeks ago

Btw., according to the documentation of setOutputImageRotationEnabled:

Turning this on will add more processing overhead to every image analysis frame. The average processing time is about 10-15ms for 640x480 image on a mid-range device. By default, the rotation is disabled.

..this takes 10-15ms for a 640x480 image. That's really low-res. A 4k frame has roughly 29 times as many pixels as a 640x480 frame, so we could expect 290-435ms (unless CPU vectorization kicks in and parallelizes it) - which would make the Frame Processor run at 2-4 FPS instead of the full 30-60 FPS.

Again, it's better to interpret the Frame as a rotated frame, than to actually rotate the image buffer itself.

That's why photo and video files use EXIF tags - it's just much more efficient to store the stream as-is and rotate it later when displaying (e.g. through view transforms).
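
A sketch of that display-time half - reading the EXIF flag back and rotating the view instead of the pixels (names are illustrative):

```kotlin
import android.widget.ImageView
import androidx.exifinterface.media.ExifInterface

// Sketch: the inverse of tagging - read the EXIF flag at display time
// and rotate the *view*, leaving the image bytes untouched.
fun applyExifRotation(photoPath: String, view: ImageView) {
    val orientation = ExifInterface(photoPath).getAttributeInt(
        ExifInterface.TAG_ORIENTATION,
        ExifInterface.ORIENTATION_NORMAL
    )
    view.rotation = when (orientation) {
        ExifInterface.ORIENTATION_ROTATE_90 -> 90f
        ExifInterface.ORIENTATION_ROTATE_180 -> 180f
        ExifInterface.ORIENTATION_ROTATE_270 -> 270f
        else -> 0f
    }
}
```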

ismaelsousa commented 2 weeks ago

Cool @mrousavy, that makes sense, thank you!

oscar-b commented 2 weeks ago

If you don't use an API that does it for you, you need to rotate the coordinates yourself.

What would be nice is a built-in way to easily get these rotated and scaled coordinates. Especially the scaling seems to behave quite weirdly on Android, where it sometimes needs the scaling ratio of the height and sometimes of the width.

mrousavy commented 2 weeks ago

Yeah, we could try something like frame.matrix, which would hold the proper scaling/rotation matrix.

I believe that frame.orientation is enough, but I wrote this down as part of the Orientation efforts (https://github.com/mrousavy/react-native-vision-camera/issues/1891)
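
A hypothetical sketch of what such a frame.matrix could encode, using android.graphics.Matrix (frame.matrix itself doesn't exist yet; the 90° rotation and all dimensions are illustrative):

```kotlin
import android.graphics.Matrix

// Hypothetical sketch: one Matrix that encodes both the rotation and
// the buffer-to-view scaling, so ML output coordinates can be mapped
// into view space with a single mapRect() call.
fun buildFrameMatrix(bufferWidth: Float, bufferHeight: Float, viewWidth: Float, viewHeight: Float): Matrix =
    Matrix().apply {
        // Rotate buffer coordinates 90° counter-clockwise:
        // (x, y) -> (y, -x), then shift so all values are positive.
        setRotate(-90f)
        postTranslate(0f, bufferWidth)
        // The rotated buffer is now bufferHeight x bufferWidth;
        // scale it up to the view dimensions.
        postScale(viewWidth / bufferHeight, viewHeight / bufferWidth)
    }

// Usage: buildFrameMatrix(640f, 480f, 1080f, 1440f).mapRect(boundingBox)
```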

P0diceps commented 2 days ago

So instead, the user is responsible for interpreting how the image buffers are rotated. For a photo, this is easy;

  1. Capture photo
  2. Get orientation of phone (e.g. 'portrait')
  3. Get orientation of camera sensor (e.g. 90deg)
  4. Calculate the target orientation ('portrait') by shifting it by the amount the camera sensor is rotated (in this case, -90 deg)
  5. Save photo as it is, and add target orientation to the EXIF flag

Hi,

I know that it's very difficult to implement for videos, but would it be possible (or even easy) to expose the orientation info from the library? I guess that feature alone would already help a lot of people out there. Is it a quick win, or also quite difficult?

Thanks in advance