tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

React Native VisionCamera integration for faster / realtime Camera classification #7773

mrousavy commented 1 year ago

Hey all! I'm the author of react-native-vision-camera. VisionCamera is a camera library for React Native that not only provides photo and video recording features, but also allows the user to process Camera frames in realtime using clever worker/threading techniques similar to web workers.

The concept is called Frame Processors, and they are simple JavaScript functions that get called for every single frame the camera "sees". For example:

const frameProcessor = useFrameProcessor((frame) => {
  'worklet'
  console.log(`New Frame arrived! ${frame.width} x ${frame.height}`)
}, [])

return <Camera frameProcessor={frameProcessor} />

Since the Frame Processor is a worklet, it is executed fully synchronously on the Camera Thread, so you can run inference or other processing in realtime with close to fully native performance. (The entire native -> JS -> native round trip takes ~1ms to execute.)

Since VisionCamera is the de facto standard camera library in react-native apps, I want to provide TensorFlow bindings for VisionCamera to support easy JS-only image classification.

I came up with this API:

function App() {
  const model = useTensorflowModel(require("../assets/face-detection.mlkit"))

  const frameProcessor = useFrameProcessor((frame) => {
    'worklet'
    const output = model.run(frame)
    console.log(JSON.stringify(output))
  }, [])

  return <Camera frameProcessor={frameProcessor} />
}

I am personally not too familiar with the TensorFlow library/bindings, but I believe this can work.

We'd need two native APIs implemented on both iOS and Android, and then exposed to JavaScript:

  1. The useTensorflowModel hook loads the model from the given resource path.
  2. model.run(..) actually runs the model on the given Camera frame. The Camera frame is a native object (HostObject): on iOS this is a CMSampleBuffer, and on Android this is an ImageProxy or android.media.Image. (See the type sketch below.)
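To make this concrete, here's a rough sketch of the JS-facing types I have in mind. Everything below is hypothetical - the TensorflowModel interface and its output shape are placeholders, not an existing tfjs or VisionCamera API (only the Frame type is real; VisionCamera exports it):

// Hypothetical binding surface - none of these types exist yet
import type { Frame } from 'react-native-vision-camera'

interface TensorflowModel {
  // Runs inference synchronously on the Camera Thread, reading the
  // native frame buffer (CMSampleBuffer / ImageProxy) directly.
  // The output shape here is just a placeholder.
  run(frame: Frame): Record<string, number[]>
}

// Loads a model from a bundled resource. In React Native, require(..)
// on an asset resolves to a numeric asset ID, hence `source: number`.
declare function useTensorflowModel(source: number): TensorflowModel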

As far as I understand tfjs, it uses WebGL to do the heavy calculations. In a React Native app we can easily jump into native code (Objective-C/C++/Java), so I'm not sure it makes sense to use tfjs/WebGL here - maybe it'd make more sense for me to wrap the native TensorFlow libraries (the Objective-C and Java libs) and expose them to the JS side?
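For comparison, this is roughly what the existing tfjs-react-native path looks like today: the model is loaded from bundled assets and runs on the WebGL backend, but frame data has to be copied into JS-side tensors first. A minimal sketch - the asset file names are made up, while bundleResourceIO, tf.ready, and tf.loadGraphModel are real tfjs / tfjs-react-native APIs:

import * as tf from '@tensorflow/tfjs'
import { bundleResourceIO } from '@tensorflow/tfjs-react-native'

// Hypothetical asset names - a converted tfjs graph model ships as a
// model.json plus one or more binary weight shards
const modelJson = require('../assets/model.json')
const modelWeights = require('../assets/group1-shard1of1.bin')

async function loadModel(): Promise<tf.GraphModel> {
  await tf.ready() // wait for the rn-webgl backend to initialize
  return tf.loadGraphModel(bundleResourceIO(modelJson, modelWeights))
}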

Curious to hear your opinions on this - ideally I want to avoid making this fully third-party and out-of-tree though.

Will this change the current api? How?

Not sure to be honest. I think this can be an extra addition, so I guess not?

Who will benefit with this feature?

VisionCamera users. Also, this allows for much more flexibility than with Expo Camera since you can run custom JS code in the Frame Processor, so I'm guessing this could be pretty impactful for the entire camera-on-mobile industry.

It basically allows you to do fully native ML processing while still writing JavaScript code, all in realtime on mobile hardware.

Related issues in VisionCamera:

Any Other info.

Would love to discuss this in detail and help wherever needed if this is something you guys want to support first-class.

eledahl commented 1 year ago

+1 @pyu10055. This would be a big win - TFJS RN would benefit a ton from using RNVC. The biggest reason I switched off TFJS was not being able to record video and run detection at the same time. This would likely fix that.

pyu10055 commented 1 year ago

@mrousavy It looks like the library can work with tfjs-react-native - your question is whether you should be providing a native ML API wrapper?

mrousavy commented 1 year ago

Yep! I actually got it working with some native C++ bindings - running at up to 800 FPS (so more than enough for realtime camera use-cases) here: https://github.com/mrousavy/react-native-vision-camera/pull/1633