homuler / MediaPipeUnityPlugin

Unity plugin to run MediaPipe
MIT License
1.74k stars 451 forks source link

poseLandmarksStream sometimes generating an empty packet when observeTimestampBounds = true #642

Closed Kasra-G closed 1 day ago

Kasra-G commented 2 years ago

Plugin Version or Commit ID

v0.10.0

Unity Version

2021.3.3f1

Your Host OS

Windows 10

Target Platform

Windows Standalone

Target Device

No response

[Windows Only] Visual Studio C++ and Windows SDK Version

Visual Studio C++: 14.29.30139.0 Windows SDK: 10.0.17763

[Linux Only] GCC/G++ and GLIBC Version

No response

[Android Only] Android Build Tools and NDK Version

No response

[iOS Only] XCode Version

No response

Build Command

python build.py build --desktop cpu -vv

Bug Description

In the documentation, it says that setting observeTimestampBounds to true will generate an empty packet if mediapipe does not find a detection. However, I noticed that for some streams (poseLandmarkStream in PoseTrackingSolution, there could be more in other solutions), an empty packet is prepended to the detection packet, seemingly at random. This does not happen often, approximately 2-5 times in a 200 image sequence that all produce detections, but it does mean that the empty packets cannot be relied on to determine if a detection failed.

Steps to Reproduce the Bug

  1. Write a simple if statement in PoseTrackingGraph.TryGetNext() that checks when poseLandmark is null and print out all of the values (or set a breakpoint)
  2. run a sequence of 200+ images (that all would produce a detection from Mediapipe) through PoseTrackingSolution in Sync mode
  3. Debug and eventually you will find images that produced null values for poseLandmark (meaning an empty packet was grabbed), but roiFromLandmarks and poseWorldLandmarks will have values.

Switch off observeTimestampBounds on the poseLandmarksStream to false and this no longer happens.

Log

Unfortunately the log file contains sensitive information, so I am not permitted to share.

Screenshot/Video

No response

Additional Context

Workaround

image Adding lines 126 to 129 in PoseTrackingGraph will successfully workaround the issue (for Pose), but it would be better if the empty packets were not generated in the first place

homuler commented 2 years ago

This does not happen often, approximately 2-5 times in a 200 image sequence that all produce detections, but it does mean that the empty packets cannot be relied on to determine if a detection failed.

As I mentioned in #552, I guess that it is caused because the input data is a sequence of static images while the graph is not running in static image mode. However, I don't have data that can reproduce this issue, so I'm not sure. Will you share the input data?

If you can't, then

  1. Does it occur if you set use_prev_landmarks to false?

    • You can set it at runtime like this.
  2. If you don't know how to do that, will you check if it occurs with the official sample code (C++)?

    • If it occurs, this is not a plugin issue, so please report it to the mediapipe repository.
    • note that you need to modify the code and set callbacks.
Kasra-G commented 2 years ago

Hi! Thanks for looking into the issue. Per your first thought on reproducibility, I retested this on a fresh install of the plugin with only the code to detect when the empty packet has occurred. I ran the webcam source in Sync mode and in Non-blocking Sync mode (I have not tested Async mode). Both resulted in the poseLandmark stream eventually giving an empty packet before the correct one. It is interesting because the poseWorldLandmark stream does not do this (this behavior is the basis of the workaround) These are the steps I followed to reproduce the issue consistently:

  1. Fresh install (if not already) of the plugin
  2. Paste the following lines of code between lines 127 and 128 in PoseTrackingGraph
    if (poseLandmarks == null && poseWorldLandmarks != null)
    {
    r2 = TryGetNext(_poseLandmarksStream, out poseLandmarks, allowBlock, currentTimestampMicrosec);
    }
  3. Set a breakpoint at line 130 and attach to the Unity editor
  4. Set the image source to Webcam, however, I do not believe the image source matters
  5. Run the Pose tracking scene, with the running mode set to either Sync or Non-blocking Sync and position yourself in the frame.

Within 10 seconds or so (usually less), the breakpoint should trigger signaling that poseLandmark stream had erroneously sent an empty packet. I tried to determine if there were any poses that could cause the issue to occur more frequently but as far as I can tell it is totally random.

As for your suggestion of turning the use_prev_landmarks setting to false, maybe I will leave that to someone else... it looks like I would need another weekend or two of figuring out more mediapipe graphs to get that working.

I will take a look and see about testing it on the official C++ sample code, but since I did end up discovering a workaround, it has suddenly taken a lower priority compared to my other tasks. When I have gotten around to it and determined if it happens there, I'll notify. For now, I thought it best to address some of your points so that hopefully you or anyone else could reproduce the issue.

homuler commented 2 years ago

After digging into this issue, I suspect that the following is occurring (some Calculator names are normalized).

Calculation Steps

NOTE:

Step A

  1. An input image is added to input_video at X.
  2. poselandmarkgpu__ImagePropertiesCalculator adds its output to image_size at X.
  3. The new image_size packet triggers poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchDemuxCalculator.
  4. PoseLandmarkByRoiGpu adds its output to unfiltered_pose_landmarks and unfiltered_auxiliary_landmarks.
    • In practice, outputs are not added to unfiltered_pose_landmarks and unfiltered_auxiliary_landmarks simultaneously, but the order does not matter here.

Step B

  1. The new unfiltered_pose_landmarks packet triggers poselandmarkgpu__poselandmarkfiltering__switchcontainer_1__SwitchDemuxCalculator
  2. poselandmarkgpu__poselandmarkfiltering__switchcontainer_1__SwitchDemuxCalculator triggers poselandmarkgpu__poselandmarkfiltering__switchcontainer_1__VisibilitySmoothingCalculator_2.
  3. poselandmarkgpu__poselandmarkfiltering__switchcontainer_1__VisibilitySmoothingCalculator_2 adds its output to filtered_visibility.
  4. The new filtered_visibility packet triggers poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchDemuxCalculator.
  5. poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchDemuxCalculator relays filtered_visibility to poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__LandmarksSmoothingCalculator_2.

Step C

  1. The new unfiltered_auxiliary_landmarks packet triggers poselandmarkgpu__poselandmarkfiltering__LandmarksToDetectionCalculator
  2. poselandmarkgpu__poselandmarkfiltering__LandmarksToDetectionCalculator adds its output to aux_detection.
  3. The new aux_detection packet triggers poselandmarkgpu__poselandmarkfiltering__AlignmentPointsRectsCalculator.
  4. poselandmarkgpu__poselandmarkfiltering__AlignmentPointsRectsCalculator adds its output to roi.
  5. The new roi packet triggers poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchDemuxCalculator.
  6. poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchDemuxCalculator relays roi to poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__LandmarksSmoothingCalculator_2.

Step D

  1. After finishing Step B and Step C, poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__LandmarksSmoothingCalculator_2 is triggered.
  2. The output of poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__LandmarksSmoothingCalculator_2 triggers poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchMuxCalculator.
  3. poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchMuxCalculator relays its input to filtered_landmarks (== pose_landmarks).

The point is that Step B and Step C is concurrent. As long as filtered_visibility is calculated and relayed to LandmarksSmoothingCalculator before roi is calculated (or vice versa), this issue does not occur. However, if filtered_visibility is calculated and roi is calculated before filtered_visibility is relayed to LandmarksSmoothingCalculator, the following occurs.

Step B and C

  1. poselandmarkgpu__poselandmarkfiltering__switchcontainer_1__VisibilitySmoothingCalculator_2 adds its output to filtered_visibility (Step B # 3).
  2. InputStreamHandler pops the new filtered_visibility packet from the queue and next_timestamp_bound_ for filtered_visibility is updated.
  3. The new filtered_visibility packet is sent to the corresponding input streams.
  4. poselandmarkgpu__poselandmarkfiltering__AlignmentPointsRectsCalculator adds its output to roi (Step C # 4).
  5. InputStreamHandler pops the new roi packet from the queue and next_timestamp_bound_ for roi is updated.
  6. poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchDemuxCalculator is triggered by the new filtered_visibility packet.
  7. poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchDemuxCalculator relays each input packet to poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__LandmarksSmoothingCalculator_2.
  8. poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__LandmarksSmoothingCalculator_2 is triggered while the roi is empty.
  9. poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__LandmarksSmoothingCalculator_2 does not output, but updates its output timestamp bound.
  10. poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__SwitchMuxCalculator updates the timestamp bound of filtered_landmarks.
  11. If a client is polling the output (using OutputStreamPoller#Next), this change of the timestamp bound is detected and an empty packet is returned to the client.

Note that if the 5th step occurs after the 7th step finishes, the timestamp bound of roi won't be updated in the 7th step and poselandmarkgpu__poselandmarkfiltering__switchcontainer_2__LandmarksSmoothingCalculator_2 won't be triggered in the 8th step.

This issue does not occur for world_landmarks (filtered_world_landmarks) because the corresponding SwitchContainer has only one input stream and its container node will be triggered at most once.

Workaround

After all, this issue occurs because SwitchDemuxCalculator triggers LandmarksSmoothingCalculator before all the dependent inputs are given. I think we can avoid it by using DefaultInputStreamHandler for the SwitchDemuxCalculator instead of ImmediateInputStreamHandler.

iff --git a/WORKSPACE b/WORKSPACE
index 28f4a6b..c568c90 100644
--- a/WORKSPACE
+++ b/WORKSPACE
@@ -63,6 +63,7 @@ http_archive(
         "@//third_party:mediapipe_visibility.diff",
         "@//third_party:mediapipe_model_path.diff",
         "@//third_party:mediapipe_extension.diff",
+        "@//third_party:mediapipe_debug.diff",
         # "@//third_party:mediapipe_emscripten_patch.diff",
     ],
     sha256 = "6b43a4304ca4aa3a698906e4b4ff696d698d0b788baffd8284c03632712b1020",
diff --git a/third_party/mediapipe_debug.diff b/third_party/mediapipe_debug.diff
new file mode 100644
index 0000000..5b1074b
--- /dev/null
+++ b/third_party/mediapipe_debug.diff
@@ -0,0 +1,50 @@
+diff --git a/mediapipe/framework/tool/switch_container.proto b/mediapipe/framework/tool/switch_container.proto
+index a9c2d90..f38de33 100644
+--- a/mediapipe/framework/tool/switch_container.proto
++++ b/mediapipe/framework/tool/switch_container.proto
+@@ -27,4 +27,7 @@ message SwitchContainerOptions {
+ 
+   // Use DefaultInputStreamHandler for muxing & demuxing.
+   optional bool synchronize_io = 5;
++
++  optional bool synchronize_demux_io = 6;
++  optional bool synchronize_mux_io = 7;
+ }
+diff --git a/mediapipe/framework/tool/switch_demux_calculator.cc b/mediapipe/framework/tool/switch_demux_calculator.cc
+index 1c1d59d..7ea68d7 100644
+--- a/mediapipe/framework/tool/switch_demux_calculator.cc
++++ b/mediapipe/framework/tool/switch_demux_calculator.cc
+@@ -115,7 +115,7 @@ absl::Status SwitchDemuxCalculator::GetContract(CalculatorContract* cc) {
+     }
+   }
+   auto& options = cc->Options<mediapipe::SwitchContainerOptions>();
+-  if (!options.synchronize_io()) {
++  if (!options.synchronize_io() && !options.synchronize_demux_io()) {
+     cc->SetInputStreamHandler("ImmediateInputStreamHandler");
+   }
+   cc->SetProcessTimestampBounds(true);
+diff --git a/mediapipe/framework/tool/switch_mux_calculator.cc b/mediapipe/framework/tool/switch_mux_calculator.cc
+index 9982ae4..a44bc00 100644
+--- a/mediapipe/framework/tool/switch_mux_calculator.cc
++++ b/mediapipe/framework/tool/switch_mux_calculator.cc
+@@ -154,7 +154,7 @@ absl::Status SwitchMuxCalculator::Process(CalculatorContext* cc) {
+   // Update the input channel index if specified.
+   channel_index_ = tool::GetChannelIndex(*cc, channel_index_);
+ 
+-  if (options_.synchronize_io()) {
++  if (options_.synchronize_io() || options_.synchronize_mux_io()) {
+     // Start with adding input signals into channel_history_ and packet_history_
+     if (cc->Inputs().HasTag("ENABLE") &&
+         !cc->Inputs().Tag("ENABLE").IsEmpty()) {
+diff --git a/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt b/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt
+index bb3665f..27d141c 100644
+--- a/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt
++++ b/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt
+@@ -100,6 +100,7 @@ node {
+   options: {
+     [mediapipe.SwitchContainerOptions.ext] {
+       enable: true
++      synchronize_demux_io: true
+       contained_node: {
+         calculator: "LandmarksSmoothingCalculator"
+         options: {
homuler commented 1 year ago

@Kasra-G Will you test if the above patch resolves your issue?

dayowoo commented 1 year ago

Workaround

After all, this issue occurs because SwitchDemuxCalculator triggers LandmarksSmoothingCalculator before all the dependent inputs are given. I think we can avoid it by using DefaultInputStreamHandler for the SwitchDemuxCalculator instead of ImmediateInputStreamHandler.

iff --git a/WORKSPACE b/WORKSPACE
index 28f4a6b..c568c90 100644
--- a/WORKSPACE
+++ b/WORKSPACE
@@ -63,6 +63,7 @@ http_archive(
         "@//third_party:mediapipe_visibility.diff",
         "@//third_party:mediapipe_model_path.diff",
         "@//third_party:mediapipe_extension.diff",
+        "@//third_party:mediapipe_debug.diff",
         # "@//third_party:mediapipe_emscripten_patch.diff",
     ],
     sha256 = "6b43a4304ca4aa3a698906e4b4ff696d698d0b788baffd8284c03632712b1020",
diff --git a/third_party/mediapipe_debug.diff b/third_party/mediapipe_debug.diff
new file mode 100644
index 0000000..5b1074b
--- /dev/null
+++ b/third_party/mediapipe_debug.diff
@@ -0,0 +1,50 @@
+diff --git a/mediapipe/framework/tool/switch_container.proto b/mediapipe/framework/tool/switch_container.proto
+index a9c2d90..f38de33 100644
+--- a/mediapipe/framework/tool/switch_container.proto
++++ b/mediapipe/framework/tool/switch_container.proto
+@@ -27,4 +27,7 @@ message SwitchContainerOptions {
+ 
+   // Use DefaultInputStreamHandler for muxing & demuxing.
+   optional bool synchronize_io = 5;
++
++  optional bool synchronize_demux_io = 6;
++  optional bool synchronize_mux_io = 7;
+ }
+diff --git a/mediapipe/framework/tool/switch_demux_calculator.cc b/mediapipe/framework/tool/switch_demux_calculator.cc
+index 1c1d59d..7ea68d7 100644
+--- a/mediapipe/framework/tool/switch_demux_calculator.cc
++++ b/mediapipe/framework/tool/switch_demux_calculator.cc
+@@ -115,7 +115,7 @@ absl::Status SwitchDemuxCalculator::GetContract(CalculatorContract* cc) {
+     }
+   }
+   auto& options = cc->Options<mediapipe::SwitchContainerOptions>();
+-  if (!options.synchronize_io()) {
++  if (!options.synchronize_io() && !options.synchronize_demux_io()) {
+     cc->SetInputStreamHandler("ImmediateInputStreamHandler");
+   }
+   cc->SetProcessTimestampBounds(true);
+diff --git a/mediapipe/framework/tool/switch_mux_calculator.cc b/mediapipe/framework/tool/switch_mux_calculator.cc
+index 9982ae4..a44bc00 100644
+--- a/mediapipe/framework/tool/switch_mux_calculator.cc
++++ b/mediapipe/framework/tool/switch_mux_calculator.cc
+@@ -154,7 +154,7 @@ absl::Status SwitchMuxCalculator::Process(CalculatorContext* cc) {
+   // Update the input channel index if specified.
+   channel_index_ = tool::GetChannelIndex(*cc, channel_index_);
+ 
+-  if (options_.synchronize_io()) {
++  if (options_.synchronize_io() || options_.synchronize_mux_io()) {
+     // Start with adding input signals into channel_history_ and packet_history_
+     if (cc->Inputs().HasTag("ENABLE") &&
+         !cc->Inputs().Tag("ENABLE").IsEmpty()) {
+diff --git a/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt b/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt
+index bb3665f..27d141c 100644
+--- a/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt
++++ b/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt
+@@ -100,6 +100,7 @@ node {
+   options: {
+     [mediapipe.SwitchContainerOptions.ext] {
+       enable: true
++      synchronize_demux_io: true
+       contained_node: {
+         calculator: "LandmarksSmoothingCalculator"
+         options: {

Where can I check that code? @homuler @Kasra-G Thank you..

homuler commented 1 year ago

@dayowoo The code no longer works because MediaPipe is updated. Can you still reproduce this issue in the latest version?

homuler commented 1 day ago

close it because the solution is replaced with the Task API.