tucan9389 / TFLiteSwift-Vision


Why I am making TFLiteSwift-Vision #3

Open tucan9389 opened 3 years ago

tucan9389 commented 3 years ago

Goal

Make a vision-specific layer that you can use in a TensorFlowLiteSwift application, so that you can call pre-implemented vision-specific functions like the following:

picture 1. Data flow when using TFLiteSwift-Vision
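
To make the data flow in picture 1 concrete, here is a minimal sketch of how the call site could look. This is an illustration only: `TFLiteVisionInterpreter`, its `Options`, the `normalization` parameter, and `inference(with:)` are assumed names for this sketch, not a confirmed API, and "mobilenet_v2" is a placeholder model.

```swift
import UIKit
import TFLiteSwift_Vision  // assumed module name

// Hypothetical call site: hand the framework a plain UIImage and let it
// handle resizing, cropping, normalization, and tensor conversion internally.
// All names below are illustrative assumptions, not a fixed API.
let options = TFLiteVisionInterpreter.Options(
    modelName: "mobilenet_v2",                  // placeholder bundled .tflite model
    normalization: .scaled(from: 0.0, to: 1.0)  // preprocessing owned by the framework
)
let interpreter = try TFLiteVisionInterpreter(options: options)

let image = UIImage(named: "sample")!               // placeholder asset
let output = try interpreter.inference(with: image) // returns a flat tensor
```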

I don't know whether this implementation can be merged into tensorflow/tflite-support or tensorflow/examples. I'll maintain this framework for my personal needs first, and then check whether this repo can be used by or merged into the TensorFlow repos.

Motivation

There are many general image pre-processing methods used in vision problems, and I would like other iOS developers to be able to use them without implementing these general methods themselves. In TFLiteSwift-Vision, as a first step, I abstracted and generalized the image pre-processing; after that, I'm going to build image post-processing and post-processing examples for task-specific cases. I expect other researchers and developers will then be able to use TFLite without re-implementing these functions and achieve their goals faster.
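
For context, this is the kind of boilerplate every project currently repeats before it can feed an image to the TensorFlow Lite Swift `Interpreter`. The sketch below uses only UIKit/CoreGraphics; the RGBA byte layout, 4 bytes per pixel, and the 0.0 to 1.0 scaling are assumptions that vary by model and image source.

```swift
import UIKit

// General image pre-processing that TFLiteSwift-Vision aims to absorb:
// resize a UIImage to the model's input size, then convert it to normalized
// Float32 RGB data ready for `Interpreter.copy(_:toInputAt:)`.
func preprocess(_ image: UIImage, width: Int, height: Int) -> Data? {
    // 1. Resize with a bitmap context.
    let size = CGSize(width: width, height: height)
    UIGraphicsBeginImageContextWithOptions(size, true, 1.0)
    image.draw(in: CGRect(origin: .zero, size: size))
    let resized = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()

    guard let cgImage = resized?.cgImage,
          let pixelData = cgImage.dataProvider?.data as Data? else { return nil }

    // 2. Drop the alpha channel and scale 0...255 bytes to 0.0...1.0 floats.
    let bytesPerRow = cgImage.bytesPerRow
    var floats = [Float32]()
    floats.reserveCapacity(width * height * 3)
    for y in 0..<height {
        for x in 0..<width {
            let offset = y * bytesPerRow + x * 4   // assumes 4-byte RGBA pixels
            floats.append(Float32(pixelData[offset])     / 255.0)  // R
            floats.append(Float32(pixelData[offset + 1]) / 255.0)  // G
            floats.append(Float32(pixelData[offset + 2]) / 255.0)  // B
        }
    }
    return floats.withUnsafeBufferPointer { Data(buffer: $0) }
}
```

Feeding the result to the interpreter is then `try interpreter.copy(data, toInputAt: 0)` followed by `try interpreter.invoke()`; it is exactly this per-project code that a shared layer can own.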

Why TFLiteSwift-Vision instead of MLKit?

picture 2. supporting tasks of MLKit's custom model
(captured at 21.08.23 from here)

MLKit offers domain-specific features, but those currently support only image classification and object detection, so you cannot use them when you want to implement other tasks like segmentation, pose estimation, or style transfer. (If there are other methods that I don't know about, please comment!)

picture 3. The architecture of MLKit and CoreML

As you can see on the right side of picture 3, Apple provides the image pre/post-processing layer through the Vision framework. I expect TFLiteSwift-Vision to play a similar role; I want to support pre/post-processing not only for TensorFlow models, but also for tflite models converted from PyTorch.

picture 4. TFLiteSwift-Vision's position in iOS TFLite architecture

What about TFLite's task-library?

As you can see in picture 3, TFLite officially provides the task-library, a bundle of pre/post-processing implementations for various domains. But it is implemented in C++ (ref), which can be a hurdle for most iOS developers, who are familiar with Swift, when it comes to customization. To let more iOS developers leverage vision tflite models, we split the work into pre-processing and post-processing parts and implement them in Swift (see the sketch below).
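
One way to read that split is as a pair of small Swift protocols; the types below are a hypothetical sketch of the idea, not the framework's actual interfaces.

```swift
import UIKit

// Hypothetical shape of the pre/post split in Swift. These protocols are
// illustrative only; the framework's real types may differ.
protocol VisionPreprocessor {
    /// Turns a UIImage into model-ready input bytes.
    func makeInputData(from image: UIImage) -> Data?
}

protocol VisionPostprocessor {
    associatedtype Output
    /// Decodes the model's raw output bytes into a task-specific result.
    func decode(outputData: Data) -> Output
}
```

Because both sides are plain Swift, an iOS developer can replace either half for a new task without touching C++.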

In TFLiteSwift-Vision, we mainly support the pre-processing part, because a great deal of vision research and applications take an image as input. Cases where the model output is itself an image are mostly limited to GAN-like tasks, so the goal is to release the image-output post-processing feature after version 1.0.0. For now, I have implemented the basic feature in which the framework returns a Tensor, as sketched below.
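
Reading that Tensor back today looks roughly like this with the plain TensorFlow Lite Swift API (`Interpreter`, `output(at:)`); the argmax at the end is a stand-in for whatever task-specific post-processing the caller still owns.

```swift
import TensorFlowLite

// Sketch of the current output contract: run inference, get a flat tensor,
// and decode it on the caller's side (argmax for classification, heatmap
// parsing for pose estimation, and so on).
// Assumes interpreter.allocateTensors() was already called.
func runAndDecode(interpreter: Interpreter, inputData: Data) throws -> Int? {
    try interpreter.copy(inputData, toInputAt: 0)
    try interpreter.invoke()
    let outputTensor = try interpreter.output(at: 0)

    // Reinterpret the raw output bytes as Float32 scores.
    let scores = outputTensor.data.withUnsafeBytes {
        Array($0.bindMemory(to: Float32.self))
    }

    // Example decoding for classification: index of the top score.
    return scores.indices.max { scores[$0] < scores[$1] }
}
```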

picture 5. supporting tasks of official TFLite task library
(captured at 21.08.23 from here)

Future Works