Guidance on Implementing Image Classification with MediaPipeUnityPlugin

lvxixi4ever commented 2 months ago

Plugin Version or Commit ID

v0.14.3

Unity Version

2022.3.27f1c1

Your Host OS

Windows 10

Target Platform

UnityEditor

Description

I would like to express my sincere gratitude for providing such a useful framework that allows me to utilize MediaPipe in Unity. I have a question regarding how to implement an image classification task using the MediaPipeUnityPlugin. Taking FaceLandmarkerRunner as an example, would I need to write a script similar to FaceLandmarkerRunner? Could you please provide a general idea or direction (such as which parts need to be overridden or customized)? Thank you very much in advance.

Code to Reproduce the issue

No response

Additional Context

No response

homuler commented 2 months ago

Is it correct that you want to implement the Image Classification Task API?

If you are implementing it yourself, it would be useful to not only read the current source code but also review the Pull Requests from when previous Task APIs were added.

At the very least, you will need to:

Include the internally used Calculator in the library (cf. https://github.com/homuler/MediaPipeUnityPlugin/pull/992/files#diff-01c2854f2c65078fa2a2d333d98fcbf6647c0d6b15900800ac24b03af04ed92b)
Compile the protobuf for options passed to the Task Runner (cf. https://github.com/homuler/MediaPipeUnityPlugin/pull/992/files#diff-70333c42487c3bd171af9ed353785c87c250e575f4a71dc5c98d6584ed7dae50)
Implement the Task API by inheriting from Core.BaseVisionTaskApi

I think the output will be similar to the AudioClassifier, so it would be good to look at that as well.
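If the ImageClassifier output does resemble AudioClassifier's, each result is a list of scored categories. The plain-Python sketch below is only an illustration, not the plugin's C# API. It shows how options modeled on MediaPipe's classifier settings `max_results` and `score_threshold` are typically applied to raw scores. `Category` and `top_categories` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Category:
    index: int
    score: float
    category_name: str


def top_categories(scores: Sequence[float], labels: Sequence[str],
                   max_results: int = -1,
                   score_threshold: float = 0.0) -> List[Category]:
    """Toy post-processing modeled on MediaPipe-style classifier options.

    max_results < 0 keeps every category; max_results == 0 is rejected.
    Categories scoring below score_threshold are dropped.
    """
    if max_results == 0:
        raise ValueError("max_results must be non-zero")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    categories = [Category(i, s, labels[i]) for i, s in enumerate(scores)
                  if s >= score_threshold]
    # Highest score first, as in a classification result.
    categories.sort(key=lambda c: c.score, reverse=True)
    return categories if max_results < 0 else categories[:max_results]


if __name__ == "__main__":
    labels = ["neutral", "happy", "sad"]
    for c in top_categories([0.2, 0.7, 0.1], labels, max_results=2):
        print(c.index, c.category_name, c.score)
```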

Taking FaceLandmarkerRunner as an example, would I need to write a script similar to FaceLandmarkerRunner?

Essentially, FaceLandmarker is the important part; FaceLandmarkerRunner just runs it.

lvxixi4ever commented 2 months ago

Thank you very much for your prompt response. I will try following your suggestions.

lvxixi4ever commented 2 months ago

Hi! In my Unity project, I directly imported the unitypackage file into the Assets folder without downloading the MediaPipeUnityPlugin-all.zip file. I saw in your response that I need to modify the BUILD file in the mediapipe_api directory. Therefore, I extracted MediaPipeUnityPlugin-all.zip, copied the mediapipe_api and third_party folders into my Unity project (in the same directory as Assets), and modified the related BUILD files. Is this the correct approach?

Additionally, I noticed that when you created the FaceLandmarker, you modified the third_party/mediapipe_visibility.diff file. Do I need to modify this file as well? I see entries like index aa839d91..8efc28e1 100644, which seem to be auto-generated.

Looking forward to your response. Thank you!

homuler commented 2 months ago

Therefore, I extracted MediaPipeUnityPlugin-all.zip, copied the mediapipe_api and third_party folders into my Unity project (in the same directory as Assets), and modified the related BUILD files. Is this the correct approach?

No. You need to build the library by yourself after modifying the native code (see https://github.com/homuler/MediaPipeUnityPlugin?tab=readme-ov-file#hammer_and_wrench-installation). You don't need to move existing files to another place.

Provided everything is done correctly, running the package workflow after making changes will build the library.

Additionally, I noticed that when you created the FaceLandmarker, you modified the third_party/mediapipe_visibility.diff file. Do I need to modify this file as well?

Probably, yes. This relates to the following point from my earlier comment:

Compile the protobuf for options passed to the Task Runner

Sometimes the visibility of a protobuf file is declared as internal, and if you want to compile it, you need to apply a patch (of course, you can also compile it yourself and copy the generated file (i.e. *.cs) manually). See https://github.com/homuler/MediaPipeUnityPlugin/issues/1122 to learn how to generate the patch file.

lvxixi4ever commented 2 months ago

Thank you for your response! I have a few more questions. I have now written ImageClassifier.cs (similar to FaceLandmarker.cs), ImageClassifierOptions.cs (similar to FaceLandmarkerOptions), ImageClassifierRunner.cs (similar to FaceLandmarkerRunner.cs), and ImageClassificationConfig.cs (similar to FacelandmarkerDetectionConfig.cs) by following the landmarker framework. I generated ImageClassifierGraphOptions.cs using the protocol buffer compiler and placed it in the Packages\com.github.homuler.mediapipe\Runtime\Scripts\Protobuf\Tasks\Vision\ImageClassifier\Proto directory.

Now, when running ImageClassifierRunner.cs, I encounter an error at taskApi = ImageClassifier.CreateFromOptions(options): "BadStatusException: NOT_FOUND: ValidatedGraphConfig Initialization failed. No registered object with name: mediapipe::tasks::vision::image_classifier::ImageClassifierGraph; Unable to find Calculator 'mediapipe.tasks.vision.image_classifier.ImageClassifierGraph' [MediaPipeTasksStatus='601']". It seems like the Calculator (Graph) is not loading?

Does this issue relate to the need to create my own library as you mentioned? Should I debug the code first before creating my own library? Thank you!

homuler commented 2 months ago

Does this issue relate to the need to create my own library as you mentioned?

Yes. You need to include the dependent Calculators and build the library. https://github.com/homuler/MediaPipeUnityPlugin/pull/992/files#diff-01c2854f2c65078fa2a2d333d98fcbf6647c0d6b15900800ac24b03af04ed92b

Should I debug the code first before creating my own library?

Sorry, I didn't understand the intention of this question. If you haven't written the code in C++, I believe you can debug everything in Unity after rebuilding the library.

lvxixi4ever commented 2 months ago

Thank you for your response. I will keep trying based on your description. The issue above might be related to the _TASK_GRAPH_NAME setting in the classifier. I noticed that in AudioClassifier, _TASK_GRAPH_NAME is set to "mediapipe.tasks.audio.audio_classifier.AudioClassifierGraph". This value seems to correspond to audio_classifier_graph.cc, and at runtime it appears to be passed to Calculator.cs (generated by the protocol buffer compiler) (calculator = pb::ProtoPreconditions.CheckNotNull(value, "value")). In my ImageClassifier.cs, I have set _TASK_GRAPH_NAME to "mediapipe.tasks.vision.image_classifier.ImageClassifierGraph", based on the last line of image_classifier_graph.cc (::mediapipe::tasks::vision::image_classifier::ImageClassifierGraph). However, I still get the error "BadStatusException: NOT_FOUND: ValidatedGraphConfig Initialization failed. No registered object with name: mediapipe::tasks::vision::image_classifier::ImageClassifierGraph; Unable to find Calculator 'mediapipe.tasks.vision.image_classifier.ImageClassifierGraph' [MediaPipeTasksStatus='601']". This might be because I haven't built my own library yet. I will rebuild the library.

I have an additional question: how does Unity interact with the underlying C++ code? I installed MediaPipeUnityPlugin by importing MediaPipeUnityPlugin.unitypackage directly into the Assets folder, and it contains no C++ source code (e.g., audio_classifier_graph.cc), so there is no local C++ source. How, then, does Unity load the Graph or run inference through the C++ code? Thanks!

homuler commented 2 months ago

how does Unity load the Graph or perform inference while interacting with the C++ code?

We're discussing the need to build the library, and this built library is then loaded to call the native code (see https://docs.unity3d.com/Manual/NativePlugins.html). The distributed package does not include the ImageClassifierGraph, so it will not work unless it is rebuilt to include it.

The built library is installed here (on Windows, mediapipe_c.dll will be loaded).

lvxixi4ever commented 2 months ago

Hi! I followed your advice and tried to rebuild my own library. However, following the installation guide, I encountered some errors when running python build.py build --desktop cpu -v, and unstable network conditions also caused the rebuild to fail.

Therefore, I tried using the workflow instead. I forked your project on GitHub, cloned it locally, added my modified code, and pushed it to my fork. Finally, I ran the Release Packages workflow (I didn't modify anything in the yml file).

Is this the right approach? After this process is completed, will the rebuilt library be automatically updated on my GitHub? Will I be able to download and use it locally then? Thank you!

homuler commented 2 months ago

Is this the right approach?

Yes.

After this process is completed, will the rebuilt library be automatically updated on my GitHub?

No. (In the first place, libraries are not included in the repository)

Will I be able to download and use it locally then?

Yes.

lvxixi4ever commented 2 months ago

After downloading package-for-production-src-all, should I directly copy the contents of the Assets and Packages folders into my new Unity project (it seems that I can't directly copy into the Packages directory, only the Assets directory)? Or should I first copy them into the Assets directory and then export it as a .unitypackage file? Thanks!

lvxixi4ever commented 2 months ago

I still encountered the following error after following the above steps: No registered object with name: mediapipe::tasks::vision::image_classifier::ImageClassifierGraph; Unable to find Calculator "mediapipe.tasks.vision.image_classifier.ImageClassifierGraph" [MediaPipeTasksStatus='601'] What exactly should this ImageClassifierGraph be? Is it a file? I have already placed the ImageClassifierGraphOptions.cs file in the com.github.homuler.mediapipe\Runtime\Scripts\Protobuf\Tasks\Vision\ImageClassifier\Proto folder.

homuler commented 2 months ago

What exactly should this ImageClassifierGraph be?

https://github.com/google-ai-edge/mediapipe/blob/05f2b43bdcc50744f766b59e70aa4d6e1179246a/mediapipe/tasks/cc/vision/image_classifier/image_classifier_graph.cc#L178

The dependent Calculators must be included in the library (i.e. mediapipe_c.dll). You have to add them to the dependencies when building it, as follows: https://github.com/homuler/MediaPipeUnityPlugin/blob/6a7556af6363b970357385d6095ec0b2dcdf3d05/mediapipe_api/BUILD#L330 (the exact target name is @mediapipe//mediapipe/tasks/cc/vision/image_classifier:image_classifier)

Is it a file?

Binary code in the library.
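To see why a missing native dependency surfaces as "No registered object with name", it may help to think of the native library as holding a name-to-factory registry that is filled in by registration code (such as the line in image_classifier_graph.cc linked above) only when that code is compiled into mediapipe_c.dll. The toy Python model below is an analogy, not MediaPipe's actual implementation; `Registry` is hypothetical, and the dot-to-`::` conversion is inferred from the two spellings that appear in the error message.

```python
from typing import Callable, Dict


class Registry:
    """Toy stand-in for a native name-to-factory registry (hypothetical)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        self._factories[name] = factory

    def create(self, dotted_name: str) -> object:
        # The graph config uses a dotted name; the registry key uses "::".
        key = dotted_name.replace(".", "::")
        if key not in self._factories:
            raise LookupError(f"No registered object with name: {key}")
        return self._factories[key]()


class AudioClassifierGraph:
    pass


registry = Registry()
# Only code that was linked into the library ever gets registered.
registry.register(
    "mediapipe::tasks::audio::audio_classifier::AudioClassifierGraph",
    AudioClassifierGraph)

if __name__ == "__main__":
    registry.create("mediapipe.tasks.audio.audio_classifier.AudioClassifierGraph")
    try:
        registry.create("mediapipe.tasks.vision.image_classifier.ImageClassifierGraph")
    except LookupError as e:
        print(e)
```

In this analogy, placing ImageClassifierGraphOptions.cs in the Unity project changes nothing on the native side; only rebuilding the library with the image_classifier target adds the missing registration.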

homuler commented 2 months ago

After downloading package-for-production-src-all, should I directly copy the contents of the Assets and Packages folders into my new Unity project (it seems that I can't directly copy into the Packages directory, only the Assets directory)? Or should I first copy them into the Assets directory and then export it as a .unitypackage file?

package-for-production-src-all is the file distributed as MediaPipeUnityPlugin-all-stripped.zip in the release page. So if it's properly built, you can use it as it is, or if you prefer, you can export it as a unitypackage from the Tools/Export unitypackage menu.

However, if you compile some protocol buffer files manually, you may need to copy them individually.

lvxixi4ever commented 2 months ago

Does this mean that before using the workflow to generate package-for-production-src-all, I need to add @mediapipe//mediapipe/tasks/cc/vision/image_classifier in the BUILD file? Should I run the workflow after adding it?

homuler commented 2 months ago

Yes. I think it would be good to add it below the following line: https://github.com/homuler/MediaPipeUnityPlugin/blob/6a7556af6363b970357385d6095ec0b2dcdf3d05/mediapipe_api/BUILD#L283

lvxixi4ever commented 2 months ago

Do I only need to modify the one place you mentioned in the BUILD file? I noticed that audio_classifier has several modifications in the BUILD file, such as:

    config_setting(
        name = "audio_classification_switch",
        flag_values = {
            ":solutions": "audio_classification",
        },
    )

    selects.config_setting_group(
        name = "enable_audio_classification",
        match_any = [":all_solutions_switch", ":audio_classification_switch"],
    )

    select({
        ":enable_audio_classification": [":audio_classification_calculators"],
        "//conditions:default": [],
    })

    cc_library(
        name = "audio_classification_calculators",
        deps = ["@mediapipe//mediapipe/tasks/cc/audio/audio_classifier:audio_classifier"],
    )

    select({
        ":enable_audio_classification": [":audio_classification_assets"],
        "//conditions:default": [],
    })

etc.

Should I modify image_classifier in the BUILD file similarly to the modifications for audio_classifier?

lvxixi4ever commented 2 months ago

It seems that, according to your suggestion, I just need to add it directly below "@mediapipe//mediapipe/calculators/image". Thank you very much for your patient answers. I will proceed with this approach. Thanks!

lvxixi4ever commented 2 months ago

Hi! I encountered a problem when using my own trained model:

I have a tflite file (converted from an onnx file to a pb file and then to a tflite file). I renamed it to a .bytes file. It accepts input in the shape of [1, channel, height, width]. When using this model directly, I get the following error: BadStatusException: INVALID_ARGUMENT: The input tensor should have dimensions 1 x height x width x depth, where depth = 3 or 4. Got 1 x 3 x 112 x 112. [MediaPipeTasksStatus='601']
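The error compares two tensor layouts: the model was exported channels-first (NCHW, `1 x 3 x 112 x 112`), while the quoted check expects channels-last (NHWC, `1 x height x width x depth`). The pure-Python sketch below only makes that difference concrete; it does not fix the model, which, as discussed in this thread, has to be re-exported with an NHWC input. `check_mediapipe_image_input` and `nchw_to_nhwc` are hypothetical helpers that mimic the rule stated in the error text.

```python
from typing import List, Sequence

Tensor4D = List[List[List[List[float]]]]


def check_mediapipe_image_input(shape: Sequence[int]) -> None:
    """Mimic the rule in the error: 1 x height x width x depth, depth 3 or 4."""
    if len(shape) != 4 or shape[0] != 1 or shape[3] not in (3, 4):
        dims = " x ".join(str(d) for d in shape)
        raise ValueError(
            "expected 1 x height x width x depth with depth 3 or 4, got " + dims)


def nchw_to_nhwc(t: Tensor4D) -> Tensor4D:
    """Reorder a nested [N][C][H][W] list into [N][H][W][C]."""
    n, c, h, w = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[b][ch][y][x] for ch in range(c)] for x in range(w)]
             for y in range(h)] for b in range(n)]


if __name__ == "__main__":
    try:
        check_mediapipe_image_input((1, 3, 112, 112))  # channels-first: rejected
    except ValueError as e:
        print(e)
    check_mediapipe_image_input((1, 112, 112, 3))  # channels-last: accepted
```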

It seems like I need to adjust the input dimensions, but I couldn't find code in Unity's C# scripts to directly adjust the dimensions.

Therefore, I re-exported an onnx model with input dimensions [1, height, width, channel]. When I converted this onnx file to tflite and embedded it into the framework, Unity reported the following error: BadStatusException: INVALID_ARGUMENT: The model is not a valid Flatbuffer buffer [MediaPipeTasksStatus='601'].
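"Not a valid Flatbuffer buffer" means the bytes Unity passed in do not parse as a TFLite FlatBuffer. A cheap first check is the file identifier: a FlatBuffer starts with a 4-byte root offset, and the TFLite schema declares the identifier `TFL3`, which the converter normally writes at bytes 4 to 8. The sketch below assumes exactly that layout. Passing it does not prove the model is valid, but failing it suggests the renamed `.bytes` file is not actually TFLite output (for example, a different format or a truncated file).

```python
import sys

TFLITE_FILE_IDENTIFIER = b"TFL3"  # file_identifier declared in the TFLite schema


def looks_like_tflite(data: bytes) -> bool:
    """A FlatBuffer stores a 4-byte root offset, then a 4-byte file identifier."""
    return len(data) >= 8 and data[4:8] == TFLITE_FILE_IDENTIFIER


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tflite.py path/to/model.bytes
    with open(sys.argv[1], "rb") as f:
        print(looks_like_tflite(f.read()))
```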

For this input dimension error, should I modify the C# code or change the input dimensions of the onnx model (in Python)? Thanks!

homuler commented 2 months ago

It's likely a model issue, so I think the model needs to be fixed. I can't answer about how to fix it, so please ask the MediaPipe team if you don't know.

lvxixi4ever commented 2 months ago

Okay, I'll work on resolving this issue. I really appreciate your taking the time to explain things so patiently!

lvxixi4ever commented 2 months ago

Hi! After using the corrected tflite model, I encountered a new problem: BadStatusException: NOT_FOUND: Input tensor has type float32: it requires specifying NormalizationOptions metadata to preprocess input images. [MediaPipeTasksStatus='601']
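The error says the model takes float32 input but carries no metadata describing how to turn 0 to 255 pixel values into the floats it was trained on. In TFLite metadata, NormalizationOptions holds per-channel `mean` and `std` values for the transform `(value - mean) / std`. The sketch below applies that transform in plain Python. `normalize_pixels` is hypothetical, and mean = std = 127.5 is only an illustrative assumption; the correct values depend on how the model was trained and belong in the model's metadata rather than in the Unity code.

```python
from typing import List, Sequence


def normalize_pixels(pixels: Sequence[Sequence[float]],
                     mean: Sequence[float],
                     std: Sequence[float]) -> List[List[float]]:
    """Apply per-channel (value - mean) / std to a list of pixels."""
    if len(mean) != len(std):
        raise ValueError("mean and std must have one entry per channel")
    if any(s == 0 for s in std):
        raise ValueError("std must be non-zero")
    out = []
    for px in pixels:
        if len(px) != len(mean):
            raise ValueError("pixel channel count does not match mean/std")
        out.append([(v - m) / s for v, m, s in zip(px, mean, std)])
    return out


if __name__ == "__main__":
    # With mean = std = 127.5, uint8 values 0..255 map onto -1.0..1.0.
    print(normalize_pixels([[0, 127.5, 255]], mean=[127.5] * 3, std=[127.5] * 3))
```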

I noticed that your code (FaceLandmarker) doesn't seem to define NormalizationOptions. Should NormalizationOptions be custom-defined? Additionally, I observed that the modelPath for AudioClassifier is yamnet_audio_classifier_with_metadata.bytes. Is there any special setting required when generating the tflite file? Is it related to my error? Thanks!

homuler commented 2 months ago

How about using the model linked here? https://ai.google.dev/edge/mediapipe/solutions/vision/image_classifier#efficientnet-lite0_model_recommended

Is there any special setting when generating tflite file?

Why is this happening?

Does your file follow the format used by the Task API? Again, if you don't know how to create a file in the correct format, please ask the MediaPipe team.

See also https://ai.google.dev/edge/mediapipe/solutions/vision/image_classifier#custom_models

Custom models used with MediaPipe must be in TensorFlow Lite format and must include specific metadata describing the operating parameters of the model. You should consider using Model Maker to modify the provided models for this task before building your own.

lvxixi4ever commented 2 months ago

I will try according to your suggestion. Thanks!

lvxixi4ever commented 2 months ago

Hi! I encountered an issue after packaging my script into an Android APK file. The coordinate system on the mobile device seems to be inconsistent with the one on the PC. The PC coordinate system needs to be rotated 90° counterclockwise to match the mobile coordinate system. My script is for emotion recognition, where it first detects a face and then sends a screenshot of the face to the image classification model. This coordinate transformation on the mobile device results in the model receiving an image that is not the correct orientation of the face but a rotated one. How should I handle this situation? Thanks!
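One way to reason about this mismatch is to make the 90° rotation explicit: a face crop taken from a frame whose orientation differs from what the model expects must be rotated back before it is classified. The plain-Python sketch below works on nested lists and only illustrates the geometry. `rotate_ccw`, `rotate_cw` and `point_after_ccw` are hypothetical helpers, and which direction applies depends on the device and camera. In the Unity app itself, homuler's reply points to the plugin's own flip and rotation handling instead.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel values, row 0 at the top


def rotate_ccw(img: Sequence[Sequence[int]]) -> Image:
    """Rotate a row-major image 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotate_cw(img: Sequence[Sequence[int]]) -> Image:
    """Rotate a row-major image 90 degrees clockwise (undoes rotate_ccw)."""
    return [list(row) for row in zip(*img[::-1])]


def point_after_ccw(x: float, y: float) -> Tuple[float, float]:
    """Map a normalized point (x right, y down, in [0, 1]) into the CCW-rotated frame."""
    return y, 1.0 - x


if __name__ == "__main__":
    frame = [[1, 2, 3],
             [4, 5, 6]]
    rotated = rotate_ccw(frame)
    print(rotated)
    print(rotate_cw(rotated) == frame)
```

Whatever rotation is applied to the frame, the same rotation must be applied to detection coordinates before cropping; otherwise the crop and the face no longer line up.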

homuler commented 1 month ago

The sample app flips the input image and passes rotation information to the Task API based on the camera's orientation.

https://github.com/homuler/MediaPipeUnityPlugin/blob/7cfa66677d8baa582a8f74a5a1b3e3c70499dc26/Assets/MediaPipeUnity/Samples/Scenes/Tasks/Face%20Detection/FaceDetectorRunner.cs#L61-L64

https://github.com/homuler/MediaPipeUnityPlugin/blob/7cfa66677d8baa582a8f74a5a1b3e3c70499dc26/Assets/MediaPipeUnity/Samples/Scenes/Tasks/Face%20Detection/FaceDetectorRunner.cs#L84

https://github.com/homuler/MediaPipeUnityPlugin/blob/7cfa66677d8baa582a8f74a5a1b3e3c70499dc26/Assets/MediaPipeUnity/Samples/Scenes/Tasks/Face%20Detection/FaceDetectorRunner.cs#L97

P.S. If you have completed the implementation of the Task API, please close this issue.

lvxixi4ever commented 1 month ago

OK, thanks!

homuler / MediaPipeUnityPlugin