Multiple poses not outputting

marbel9 commented 6 months ago

I'm sure that this isn't a real issue, but I can't figure out how multiple poses are meant to work. I have dialed up the pose count from 1, and while MediaPipe seems to track multiple poses just fine, the output only gives me coordinates for 1 pose at a time, usually flipping back and forth between my pose and that of the person next to me.

How is this function meant to work?

Also love these toxs, thank you!

domisjustanumber commented 6 months ago

Hey @marbel9 I was just tinkering with the multi pose stuff this week, so good timing on your question!

The short version is that multi-pose tracking in MediaPipe isn't very good right now, and I'd suggest you use something else to get good results. There is an older body tracking plugin Torin made that does an ok job of multi pose, or if you are running Windows and have an Nvidia RTX GPU, the built-in Body Track CHOP can do multi pose.

That being said, I have made some changes that dramatically smooth out the jittery pose tracking and add a second person to the pose_tracking output CHOP that will be in the next release, and hopefully Google come up with some better pose tracking models down the road that we can incorporate too.

I should note that even with 2 outputs in the yet-to-be-released version, it's still not a true multi person tracker. The true multi person trackers include additional data that tell you which person is in each pose object, which greatly helps with those scenarios you described where pose0 data jumps between different people from frame to frame.

For example if you have 2 people in the frame, a multi pose tracker will assign each person an ID (let's say person A and person B). In frame 1 person A is in pose object 0, and person B is in pose object 1.

In the next frame of video, for a whole bunch of reasons, person B may be the first one detected, so their pose information goes into the pose 0 object and person A's data goes into the pose 1 object. If you are only looking at pose 0 in TD, that will appear as if person A and B suddenly swapped places.

A multi pose tracker will include an additional ID tag to say "pose 0 object is person A in this frame" so you can track the same person frame-to-frame and not have them bounce around.
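To illustrate what that ID tag buys you, here is a minimal sketch of one way to assign persistent IDs yourself. It uses greedy nearest-neighbour matching on a single reference point per pose (say, the hip centre). This is an assumption for illustration only: it is not how MediaPipe or any particular tracker works, and the function name `assign_ids`, the `max_distance` threshold, and the coordinates are all hypothetical.

```python
import math

def assign_ids(previous, detections, max_distance=0.2):
    """Give each detection a persistent ID by greedy nearest-neighbour matching.

    previous:   dict of person_id -> (x, y) reference point from the last frame
    detections: list of (x, y) reference points in this frame, in detector order
    Returns a dict of person_id -> (x, y) for this frame.
    """
    assigned = {}
    unmatched = list(range(len(detections)))

    # Try to continue every known person with the closest unclaimed detection.
    for person_id, last_point in previous.items():
        if not unmatched:
            break
        nearest = min(unmatched, key=lambda i: math.dist(last_point, detections[i]))
        if math.dist(last_point, detections[nearest]) <= max_distance:
            assigned[person_id] = detections[nearest]
            unmatched.remove(nearest)

    # Anything left over is treated as a new person and gets a fresh ID.
    next_id = max(previous, default=-1) + 1
    for i in unmatched:
        assigned[next_id] = detections[i]
        next_id += 1
    return assigned

# Frame 1: person A is detected first, person B second.
frame1 = assign_ids({}, [(0.25, 0.5), (0.75, 0.5)])
# Frame 2: the detector happens to return person B first.
frame2 = assign_ids(frame1, [(0.74, 0.52), (0.27, 0.5)])
print(frame1)
print(frame2)
```

Even though the detection order flips in the second frame, ID 0 stays with the person on the left and ID 1 with the person on the right. A greedy match like this can still get confused when people cross paths, which is part of why dedicated multi-person trackers exist.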

Anyhoo, yes the multi pose will be getting a little better in the future, but other options do a better job for now if you need multi pose tracking.

marbel9 commented 5 months ago

Hi Dom, thanks for the info! Also great to hear you are still working on this project :) Will the next release work on newer versions of TD as well? Thanks for all your hard work!

domisjustanumber commented 5 months ago

Yeah, we keep chipping away at improving things and adding new features as/when Google add them to MediaPipe!

And yup, we're making sure the plugin runs on 2022 and 2023 for now. At some point we might have to drop support for older versions but for now you're good!

domisjustanumber commented 4 months ago

FYI the latest release now exports multiple poses, but I don't think you want to use it. There's no overlap checking right now, so you could get 2 poses detected for the same person... which is less than ideal. I added in some smoothing so it's slightly less jittery, but it's not what I'd call... good.
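For anyone curious what overlap checking could look like, here is a rough sketch. It assumes each pose is a list of (x, y) keypoints and treats two poses as the same person when their bounding boxes overlap heavily (intersection over union above a threshold). This is not what the plugin does today; the function names and the 0.8 threshold are hypothetical.

```python
def bounding_box(keypoints):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) keypoints."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return min(xs), min(ys), max(xs), max(ys)

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def drop_duplicates(poses, threshold=0.8):
    """Keep a pose only if it does not heavily overlap one already kept."""
    kept = []
    for pose in poses:
        box = bounding_box(pose)
        if all(iou(box, bounding_box(other)) < threshold for other in kept):
            kept.append(pose)
    return kept

# Two near-identical detections of one person, plus a second person.
person = [(0.2, 0.2), (0.4, 0.8)]
duplicate = [(0.21, 0.2), (0.4, 0.79)]
neighbour = [(0.6, 0.2), (0.8, 0.8)]
unique_poses = drop_duplicates([person, duplicate, neighbour])
print(len(unique_poses))
```

The threshold is a trade-off: set it too low and two people standing close together get merged, too high and duplicates slip through.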

Hopefully Google tidy it up in a future release of MediaPipe, but for now the capability to handle multiple poses is in our plugin and its performance will improve as/when the underlying MediaPipe does :)

Silfron commented 4 months ago

I have the latest version of the release but cannot for the life of me figure out what I need to do to get the pose 2 data. I realize that it's going to be Not Great but I still would like to get it working. If there's documentation for this somewhere I haven't been able to find it. My problem is essentially the same as the original issue - I have poses dialed up to 2 and can see 2 poses in the pose_tracking but the output is only one set of coordinates.

domisjustanumber commented 4 months ago

Ok so 2 poses in the tracking is good progress! In order to make the multi-pose data backwards compatible, and to make it easier to match with unknown numbers of poses, I added each additional pose after the first one as another sample to the normalised_results CHOP.

That means if you only have 1 pose, you get a channel per keypoint with its confidence value, same as before.

Any additional poses that are detected are added samples to the output CHOP. You can tell this is happening when the viewport switches from the grey single-channel view to the black and multi-coloured line view to indicate multi-sample data. You can also right-click and see the Info on the CHOP to see how many samples (i) it has.

If you want to have each pose as its own channel, you can use a Shuffle CHOP to split every sample into a new channel (Split N Samples with a value of 1) and you'll now get x0, y0, x1, y1, etc. for each pose. That might get you what you want, but it gets complicated fast when you have more than 1 pose to track, as you have to build a new detection chain for each possible pose you might be tracking.
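As a rough mental model of that split, here is a plain-Python sketch. It is not TouchDesigner code: it assumes a CHOP can be pictured as a dict of channel name to a list of samples (one sample per pose), and the channel names `x` and `y` and the helper `split_samples` are hypothetical.

```python
def split_samples(chop):
    """Turn {name: [s0, s1, ...]} into {name0: [s0], name1: [s1], ...}.

    Models splitting every sample into its own single-sample channel.
    """
    out = {}
    for name, samples in chop.items():
        for index, value in enumerate(samples):
            out[f"{name}{index}"] = [value]
    return out

# Two poses detected: sample 0 is the first pose, sample 1 the second.
multi_sample = {"x": [0.30, 0.70], "y": [0.55, 0.60]}
per_pose = split_samples(multi_sample)
print(per_pose)
```

Every extra pose adds another set of channels here, which is why any logic built downstream has to be duplicated per pose.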

A more scalable way to handle things is to keep the pose data in the multi-sample domain. It has a bit of a learning curve to get your head around, but it works well. As an example, let's say you want to count "how many people have their hands above their heads".

1. Use two Select CHOPs: one to extract the wrist y keypoint channel, and one for, say, the nose y keypoint channel.
2. Pipe them into a Math CHOP and subtract one from the other to get a distance.
3. Use a Logic CHOP to detect whether the value falls within a set of bounds you define.
4. The output will still be multi-sample for however many poses there are, and you'd want a counter of the totals. To get that, use a Shuffle CHOP to swap samples for channels.
5. Use a Math CHOP to add all the channels into a single channel. That's now your counter of matches.
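The same chain can be sketched in plain Python to show the data flow. This is not TouchDesigner code, just a model where each list holds one sample per pose. It assumes y increases upward (flip the comparison if your data runs the other way), and the function name, bounds, and values are hypothetical.

```python
def count_hands_above_head(wrist_y, nose_y, lower=0.0, upper=1.0):
    """Count poses whose wrist is above the nose, one list entry per pose."""
    # Select CHOPs + Math CHOP: subtract nose y from wrist y for every pose.
    differences = [w - n for w, n in zip(wrist_y, nose_y)]
    # Logic CHOP: 1 when the difference falls inside the bounds, else 0.
    flags = [1 if lower < d <= upper else 0 for d in differences]
    # Shuffle CHOP + Math CHOP: add the per-pose flags into a single total.
    return sum(flags)

# Three poses in frame: the first and third have a wrist above the nose.
wrist_y = [0.9, 0.3, 0.8]
nose_y = [0.7, 0.6, 0.75]
raised = count_hands_above_head(wrist_y, nose_y)
print(raised)
```

Nothing in the function depends on how many poses there are, which is the point of staying in the multi-sample domain: the same chain handles 1 pose or 5.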

If you find any CHOPs that are turning multi-sample data into a single channel, make sure Time Slice is turned Off under the Common tab.

I've attached a tox from another project I'm working on that I think will show you what I'm talking about. It's looking for how many people have:

Their elbows above their shoulders
Their wrists above their elbows
Their wrists under a certain distance apart

The tox is for TD 2023 and hopefully gives you a starting point to play with :) Clamp_detect.zip

Silfron commented 4 months ago

Wow thank you for the very thorough explanation!

torinmb / mediapipe-touchdesigner

Multiple poses not outputting #62