google / flutter-mediapipe


Adds mediapipe_text package #12

Closed craiglabenz closed 4 months ago

craiglabenz commented 9 months ago

Description

Adds the mediapipe_text package.

Design

The latest version of this PR builds on earlier drafts with several long-term considerations, including better native memory management, improved abstractions to better support multiple tasks (e.g., reduced coupling), and an interface split for web-vs-IO implementations.

Better memory management

This PR does not use native finalizers, though they should still be straightforward to add in the future if/when that API is fully ironed out with respect to out parameters in finalizer functions.

Objects created in native code

Dart wrappers around native objects that are typically created in native code now come with two primary constructors: a named constructor which accepts a native pointer, and a default, unnamed constructor which accepts plain Dart values so that fake instances can be built without native memory. Imagining a native struct with a single integer field named index, Dart objects in this PR now assume the following form:

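import 'dart:ffi';
// `bindings` is assumed to be the ffigen-generated library, imported with a prefix.

class Struct {
  // Builds a fake from Dart values; no native memory is involved.
  Struct({required int index})
    : _index = index,
      _pointer = null;
  // Wraps a struct that was allocated in native code.
  Struct.native(this._pointer);

  Pointer<bindings.Struct>? _pointer;

  // Cached Dart copy of the native value, filled on first read.
  int? _index;
  int get index => _index ??= _getIndex();
  int _getIndex() {
    if (_pointer == null || _pointer!.address == nullptr.address) {
      throw Exception('No native memory for Struct.index');
    }
    return _pointer!.ref.index;
  }
}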

This implementation allows for surrounding Dart code to collect response objects from MediaPipe functions and only copy their data if it wants to read their values. If the surrounding Dart code does not want to read their values (because the results will merely be passed back to another MediaPipe function) then no memory is copied. Dart classes implemented in this way include Category and Classifications from mediapipe_core, and TextClassifierResult from mediapipe_task_text.
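As a rough illustration of the lazy-copy behavior, the sketch below wraps a stand-in struct and reads it before and after the native memory is released. NativeCategory is a hypothetical substitute for an ffigen-generated binding, and allocation uses calloc from package:ffi purely so the example can run without MediaPipe; neither is taken from this PR.

```dart
import 'dart:ffi';

import 'package:ffi/ffi.dart';

// Hypothetical stand-in for an ffigen-generated struct.
final class NativeCategory extends Struct {
  @Int32()
  external int index;
}

class Category {
  // Fake built from Dart values, e.g. for tests.
  Category({required int index})
      : _index = index,
        _pointer = null;

  // Wrapper around memory owned by native code.
  Category.native(this._pointer);

  final Pointer<NativeCategory>? _pointer;
  int? _index;

  // The value is copied out of native memory only on the first read.
  int get index => _index ??= _readIndex();

  int _readIndex() {
    final pointer = _pointer;
    if (pointer == null || pointer.address == 0) {
      throw StateError('No native memory for Category.index');
    }
    return pointer.ref.index;
  }
}

void main() {
  // Simulate a result that native code handed back.
  final native = calloc<NativeCategory>();
  native.ref.index = 7;

  final fromNative = Category.native(native);
  print(fromNative.index); // 7, copied from native memory on this first read
  calloc.free(native);
  print(fromNative.index); // 7, served from the Dart-side cache

  final fake = Category(index: 3);
  print(fake.index); // 3, never touches native memory
}
```

A wrapper that is never read performs no copy at all, which is the case the paragraph above describes for results that are only handed back to another native call.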

Objects created in Dart code

Objects that originate in Dart code and are passed to native code have a different design. These implement a copyToNative method which allocates the appropriate struct in native memory, copies all values onto that struct, and holds onto the pointer internally for later deallocation.

In practice, only task Options classes follow this pattern. Currently the only such class is TextClassifierOptions in mediapipe_task_text, though each additional task is likely to add its own.

Note that MediaPipe's pattern is to have task Options structs consist of inner structs, typically BaseOptions and ClassifierOptions. In Dart, these inner options are designed to know they only ever exist within an outer options class, and as such do not implement copyToNative. Instead, they implement assignToStruct and are passed the struct to which they should copy their values. They do not own that struct and are not responsible for freeing it - that is the job of the outer options class.
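The ownership split between outer and inner options might look roughly like the following. This is a minimal sketch under several assumptions: the Native* structs are simplified, hypothetical stand-ins for the generated bindings (the real structs have more fields), the dispose method name is invented for illustration, and calloc from package:ffi is used as the allocator.

```dart
import 'dart:ffi';

import 'package:ffi/ffi.dart';

// Hypothetical, simplified stand-ins for the generated option structs.
final class NativeBaseOptions extends Struct {
  external Pointer<Utf8> modelAssetPath;
}

final class NativeClassifierOptions extends Struct {
  @Int32()
  external int maxResults;
}

final class NativeTextClassifierOptions extends Struct {
  external NativeBaseOptions baseOptions;
  external NativeClassifierOptions classifierOptions;
}

// Inner options: write into a struct they are handed, but never own it.
class BaseOptions {
  const BaseOptions({required this.modelAssetPath});
  final String modelAssetPath;

  void assignToStruct(NativeBaseOptions struct) {
    struct.modelAssetPath = modelAssetPath.toNativeUtf8(allocator: calloc);
  }
}

class ClassifierOptions {
  const ClassifierOptions({required this.maxResults});
  final int maxResults;

  void assignToStruct(NativeClassifierOptions struct) {
    struct.maxResults = maxResults;
  }
}

// Outer options: allocate the struct, remember the pointer, free everything.
class TextClassifierOptions {
  TextClassifierOptions({
    required this.baseOptions,
    required this.classifierOptions,
  });

  final BaseOptions baseOptions;
  final ClassifierOptions classifierOptions;
  Pointer<NativeTextClassifierOptions>? _pointer;

  Pointer<NativeTextClassifierOptions> copyToNative() {
    final existing = _pointer;
    if (existing != null) return existing;
    final pointer = calloc<NativeTextClassifierOptions>();
    // Nested struct fields are views onto the outer allocation.
    baseOptions.assignToStruct(pointer.ref.baseOptions);
    classifierOptions.assignToStruct(pointer.ref.classifierOptions);
    return _pointer = pointer;
  }

  // Hypothetical cleanup hook; frees the inner string, then the struct.
  void dispose() {
    final pointer = _pointer;
    if (pointer == null) return;
    calloc.free(pointer.ref.baseOptions.modelAssetPath);
    calloc.free(pointer);
    _pointer = null;
  }
}

void main() {
  final options = TextClassifierOptions(
    baseOptions: const BaseOptions(modelAssetPath: 'classifier.tflite'),
    classifierOptions: const ClassifierOptions(maxResults: 3),
  );
  final pointer = options.copyToNative();
  print(pointer.ref.baseOptions.modelAssetPath.toDartString());
  print(pointer.ref.classifierOptions.maxResults);
  options.dispose();
}
```

Keeping every free call in the outer class means there is exactly one place to audit for leaks, even though the inner options allocated the string.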

Web-vs-IO split

To support eventual web functionality, each package is now divided into two main worlds - io and web.

Shared interface

The common interface each world should implement is defined in an interface folder. This folder is not exported, so it is effectively private. All classes are abstract and class names begin with I to indicate that they are interfaces. However, they are not strictly interfaces in implementation because they contain valid equality overrides and toString methods.
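To make the "abstract, but not strictly an interface" point concrete, here is a small sketch. The field names (index, score) and the single concrete Category class are assumptions chosen for illustration and may not match the actual classes in the package.

```dart
// Shared definition that would live in the unexported interface folder.
abstract class ICategory {
  int get index;
  double get score;

  // Concrete members are what make this more than a pure interface:
  // every implementation inherits the same equality and toString.
  @override
  bool operator ==(Object other) =>
      other is ICategory && other.index == index && other.score == score;

  @override
  int get hashCode => Object.hash(index, score);

  @override
  String toString() => 'Category(index: $index, score: $score)';
}

// What an io (or, eventually, web) implementation might look like.
// It uses `extends` so the shared members above are inherited.
class Category extends ICategory {
  Category({required this.index, required this.score});

  @override
  final int index;

  @override
  final double score;
}

void main() {
  final a = Category(index: 1, score: 0.5);
  final b = Category(index: 1, score: 0.5);
  print(a == b); // true
  print(a); // Category(index: 1, score: 0.5)
}
```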

In the mediapipe_core package, the main barrel file looks like this:

// mediapipe_core.dart

export 'universal_mediapipe_core.dart'
    if (dart.library.html) 'src/web/mediapipe_core.dart'
    if (dart.library.io) 'src/io/mediapipe_core.dart';

Again, note that the interface is not exported. Also, note that at this time, src/web/ is empty. There is currently only an implementation within src/io/.

I/O implementation

The I/O implementation contains all the classes outlined in the Better memory management section earlier, as well as the concept of a TaskExecutor.

The TaskExecutor is a utility which pairs with a public-facing task-specific class (e.g., TextClassifier or TextEmbedder (coming soon!)) to actually run the specific task. In practice, the TextClassifier's job is to spin up a new isolate containing a TextClassifierExecutor instance, then send it messages (text strings) to be classified.
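The isolate hand-off could be sketched as follows. This is not the PR's implementation: FakeTextClassifierExecutor stands in for the executor that would call into native code, the start and dispose names are invented, and the sketch assumes only one classification request is in flight at a time.

```dart
import 'dart:async';
import 'dart:isolate';

// Hypothetical stand-in for the executor that would wrap native bindings.
class FakeTextClassifierExecutor {
  String classify(String text) =>
      text.contains('great') ? 'positive' : 'negative';
}

// Entry point of the worker isolate: owns the executor and answers messages.
Future<void> _runExecutor(SendPort toMain) async {
  final inbox = ReceivePort();
  toMain.send(inbox.sendPort); // Handshake: tell main how to reach us.
  final executor = FakeTextClassifierExecutor();
  await for (final message in inbox) {
    if (message is! String) break; // Any non-string message means shut down.
    toMain.send(executor.classify(message));
  }
  inbox.close();
}

// Public-facing class: spawns the isolate and forwards text to it.
class TextClassifier {
  TextClassifier._(this._toExecutor, this._responses);

  final SendPort _toExecutor;
  final StreamIterator<dynamic> _responses;

  static Future<TextClassifier> start() async {
    final fromExecutor = ReceivePort();
    await Isolate.spawn(_runExecutor, fromExecutor.sendPort);
    final responses = StreamIterator<dynamic>(fromExecutor);
    await responses.moveNext(); // First message is the worker's SendPort.
    return TextClassifier._(responses.current as SendPort, responses);
  }

  // Assumes sequential use: await each call before starting the next.
  Future<String> classify(String text) async {
    _toExecutor.send(text);
    await _responses.moveNext();
    return _responses.current as String;
  }

  Future<void> dispose() async {
    _toExecutor.send(null); // Ask the worker loop to exit.
    await _responses.cancel(); // Closes the main-side ReceivePort.
  }
}

Future<void> main() async {
  final classifier = await TextClassifier.start();
  print(await classifier.classify('this is great')); // positive
  print(await classifier.classify('this is bad')); // negative
  await classifier.dispose();
}
```

Because the executor lives entirely inside the worker isolate, any blocking native call happens off the main isolate and the caller only ever awaits a message.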

It remains to be seen what abstractions will be needed to have MediaPipe tasks be non-blocking on the web.

Better support for multiple tasks

The design of the abstract class TaskExecutor was ironed out while implementing the "embedText" task (which I currently have completed and stashed in a separate branch). This allows for minimal code duplication between multiple task executors, with individual classes (like TextClassifierExecutor) doing little more than supplying their correct methods from the ffigen-created bindings.
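One way to picture that division of labor is a template-method base class, sketched below. The hook names (createWorker, runWorker, closeWorker) and the fake bodies are assumptions made for illustration; in the real package the subclasses would forward to functions from the ffigen-created bindings instead.

```dart
// Hypothetical base class: the shared lifecycle is written exactly once.
abstract class TaskExecutor<Result> {
  Object? _worker;
  bool _closed = false;

  // Hooks each task supplies. In real code these would forward to the
  // generated create/run/close functions for that task.
  Object createWorker();
  Result runWorker(Object worker, String text);
  void closeWorker(Object worker);

  // Lazily creates the worker, then delegates to the task-specific hook.
  Result run(String text) {
    if (_closed) throw StateError('Executor is closed');
    return runWorker(_worker ??= createWorker(), text);
  }

  void close() {
    final worker = _worker;
    if (worker != null) closeWorker(worker);
    _worker = null;
    _closed = true;
  }
}

// Each concrete executor only plugs in its own three functions.
class TextClassifierExecutor extends TaskExecutor<String> {
  @override
  Object createWorker() => 'fake classifier handle';

  @override
  String runWorker(Object worker, String text) =>
      text.contains('great') ? 'positive' : 'negative';

  @override
  void closeWorker(Object worker) {}
}

class TextEmbedderExecutor extends TaskExecutor<List<double>> {
  @override
  Object createWorker() => 'fake embedder handle';

  @override
  List<double> runWorker(Object worker, String text) =>
      [text.length.toDouble()];

  @override
  void closeWorker(Object worker) {}
}

void main() {
  final classifier = TextClassifierExecutor();
  print(classifier.run('great docs')); // positive
  classifier.close();

  final embedder = TextEmbedderExecutor();
  print(embedder.run('abc').length); // 1
  embedder.close();
}
```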

craiglabenz commented 6 months ago

This PR is now stale and redundant.

craiglabenz commented 6 months ago

Oh wait, no it's not.

craiglabenz commented 6 months ago

This should be ready for final 👀 , @Piinks. The very final commit addresses your question here.

charlieforward9 commented 4 months ago

"If the surrounding Dart code does not want to read their values (because the results will merely be passed back to another MediaPipe function) then no memory is copied."

Is it possible to do this with a method channel implementation as well?

I am implementing a processing engine in native iOS that includes video capture, inference, and cloud storage for better performance and less copying of data to Flutter. I did not consider FFI until I started following along with this PR.

Any help would be greatly appreciated. Thank you.