sensein / senselab

senselab is a Python package that simplifies building pipelines for biometric (e.g., speech, voice, video) analysis.
http://sensein.group/senselab/
Apache License 2.0

Task [pose estimation]: implement the general pose estimation API and utilities, plus integrate some initial models (e.g., mediapipe, DeepLabCut) #173

Open brukew opened 6 days ago

brukew commented 6 days ago

Senselab Pose Estimation

Goal:
Integrate robust pose estimation workflows within Senselab

ViTPose performs best on infants; MediaPipe and DeepLabCut are lower in accuracy.


Support Plan


Workflow

  1. Upload media
  2. Receive custom output
  3. Perform further analysis using the output

Version Planning


Inputs/Outputs


At each step, proper documentation and tests are expected. Tutorials will also be implemented for each workflow (across models and modalities).

github-actions[bot] commented 6 days ago

👋 Welcome to Senselab!

Thank you for your interest and contribution. Senselab is a comprehensive Python package designed to process behavioral data, including voice and speech patterns, with a focus on reproducibility and robust methodologies. Your issue will be reviewed soon. Stay tuned!

fabiocat93 commented 6 days ago

Hi @brukew, sounds nice. Do you mind sharing some more detailed insights into your plan?

For example,

I would recommend clarifying all these aspects before you start coding. I also recommend thinking about some utility functions for plotting the human pose (alone and overlaid on the original picture), and a common data structure for the human pose, so that you can save and process the results of the different models in the same way.
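A model-agnostic plotting helper along the lines suggested above could look like this minimal sketch (the function name and signature are hypothetical, not existing senselab API; it uses plain NumPy so it works on any backend's pixel-coordinate keypoints):

```python
import numpy as np

def overlay_keypoints(image: np.ndarray, keypoints, color=(255, 0, 0),
                      radius: int = 2) -> np.ndarray:
    """Draw (x, y) keypoints onto an H x W x 3 uint8 image as filled squares.

    `keypoints` is a sequence of (x, y) pixel coordinates; points outside
    the frame are silently skipped. Returns a copy, leaving `image` intact.
    """
    out = image.copy()
    h, w = out.shape[:2]
    for x, y in np.asarray(keypoints, dtype=int):
        if 0 <= x < w and 0 <= y < h:
            y0, y1 = max(y - radius, 0), min(y + radius + 1, h)
            x0, x1 = max(x - radius, 0), min(x + radius + 1, w)
            out[y0:y1, x0:x1] = color
    return out

# Usage: overlay two detected joints on a blank frame
frame = np.zeros((100, 100, 3), dtype=np.uint8)
drawn = overlay_keypoints(frame, [(10, 20), (50, 50)])
```

Because it only consumes (x, y) pairs, the same helper would work for the "overlaid on the original picture" case regardless of which model produced the keypoints.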

brukew commented 5 days ago

thanks @fabiocat93, sounds good! Will update the issue as I make the plan.

fabiocat93 commented 12 hours ago

Senselab Pose Estimation

Goal: Integrate robust pose estimation workflows within Senselab

ViTPose performs best on infants; MediaPipe and DeepLabCut are lower in accuracy.

Support Plan

  • Start with MediaPipe

    • Easy to implement, lightweight (in terms of computation and speed)
  • Follow up with:

    • OpenPose: Better performance than MediaPipe, but more computationally intensive
    • ViTPose: Best performance across humans and animals (depending on the specific model)
    • DeepLabCut: High performance on animal pose tracking

Workflow

  1. Upload media
  2. Receive custom output
  3. Perform further analysis using the output
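The three steps above could translate into an API surface along these lines (a sketch only; `estimate_pose`, `PoseResult`, and the file name are hypothetical placeholders, not an existing senselab interface, and the backend call is stubbed):

```python
from dataclasses import dataclass

@dataclass
class PoseResult:
    """Hypothetical container for step 2's 'custom output'."""
    model: str        # which backend produced the estimate
    keypoints: list   # one [(x, y, confidence), ...] list per person

    def mean_confidence(self) -> float:
        scores = [c for person in self.keypoints for (_, _, c) in person]
        return sum(scores) / len(scores) if scores else 0.0

def estimate_pose(media_path: str, model: str = "mediapipe") -> PoseResult:
    """Steps 1-2: run the chosen backend on the uploaded media (stubbed)."""
    # Real code would dispatch to MediaPipe, OpenPose, etc.; this stub
    # returns one fake person so the downstream analysis step can be shown.
    fake_person = [(0.5, 0.5, 0.9), (0.4, 0.7, 0.8)]
    return PoseResult(model=model, keypoints=[fake_person])

# Step 3: further analysis on the output
result = estimate_pose("infant_video.mp4")
print(result.model, round(result.mean_confidence(), 2))  # mediapipe 0.85
```

Keeping the `model` argument as a plain string would let V1 ship with MediaPipe only and later versions add backends without changing the call site.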

Version Planning

  • V1:

    • Create pose estimations for images and videos

    • Custom datatypes for output

    • Support for MediaPipe

    • Visualization support

  • V2:

    • Expand support to OpenPose
  • V3:

    • Real-time pose detection across supported models (?)

Inputs/Outputs

  • Inputs:

  • Outputs:

    • PoseImage

      • Keypoints representing the pose skeleton
      • Visualization methods

    • PoseVideo

      • Keypoints representing the pose skeleton per video frame
      • Visualization methods

At each step, proper documentation and tests are expected. Tutorials will also be implemented for each workflow (across models and modalities).
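The PoseImage/PoseVideo split in the Outputs list could be sketched like this (class names come from the plan above; the fields and methods are my guesses, not a settled design):

```python
from dataclasses import dataclass

@dataclass
class PoseImage:
    """Pose skeletons for one image: one (x, y) list per detected person."""
    keypoints: list  # e.g. [[(x, y), ...person 0], [(x, y), ...person 1]]

    def num_people(self) -> int:
        return len(self.keypoints)

@dataclass
class PoseVideo:
    """Per-frame pose skeletons: a PoseImage for every video frame."""
    frames: list  # list[PoseImage], index == frame number

    def keypoints_at(self, frame_idx: int) -> list:
        return self.frames[frame_idx].keypoints

# Two frames: one person detected in frame 0, nobody in frame 1
video = PoseVideo(frames=[PoseImage(keypoints=[[(1, 2), (3, 4)]]),
                          PoseImage(keypoints=[])])
print(video.frames[0].num_people())  # 1
```

Composing PoseVideo out of PoseImage objects would let the per-image visualization methods be reused frame by frame.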

Thank you @brukew . This is good. Here are a couple of comments:

As a minor note, feel free to reply in a thread instead of simply editing the original text of the issue. That way, we can keep track of the whole reasoning process (this is mostly helpful to me, since my memory is not that great! Thanks!)

fabiocat93 commented 11 hours ago

Also, here is a good document overviewing pose estimation as a task: https://medium.com/augmented-startups/top-9-pose-estimation-models-of-2022-70d00b11db43

Consider that you are not necessarily required to implement everything yourself. People have been working in the domain for a while and you can use their models and their utility functions. For instance, here is a related project you should look at:

Here are some more models I have tried recently that could be good to add at some point in the future:

satra commented 11 hours ago

There are also SLEAP, DANNCE, and TULIP from here: https://www.tdunnlab.org/

brukew commented 10 hours ago

@fabiocat93 Yes, both of your points make sense. The number of joints differs greatly between models - I assume we want all information retained, though, so the pose skeleton object will just vary in size/content per model. I will look into how best to ensure this - maybe just hardcoding matches between the keypoints of each model and labeling them the same.
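The "hardcoded matches" idea could look like the sketch below: each backend gets an index-to-name map onto one shared joint vocabulary, and everything outside the shared set is dropped. (The MediaPipe indices shown are my recollection of its 33-landmark Pose layout - double-check them against the official docs before relying on them.)

```python
# Map each backend's joint indices onto one shared, COCO-style vocabulary.
MEDIAPIPE_TO_COMMON = {  # MediaPipe Pose 33-landmark model (unverified)
    0: "nose",
    11: "left_shoulder", 12: "right_shoulder",
    13: "left_elbow",    14: "right_elbow",
    15: "left_wrist",    16: "right_wrist",
    23: "left_hip",      24: "right_hip",
    25: "left_knee",     26: "right_knee",
    27: "left_ankle",    28: "right_ankle",
}

def to_common_skeleton(landmarks: list, index_map: dict) -> dict:
    """Keep only the joints every model shares, keyed by the common name.

    `landmarks` is the backend's raw output, one (x, y) pair per index;
    unmapped indices (face points, fingers, ...) are dropped, so the result
    has the same keys regardless of which backend produced it.
    """
    return {name: landmarks[i] for i, name in index_map.items()
            if i < len(landmarks)}

# 33 dummy landmarks reduce to the 13 shared joints
raw = [(i, i) for i in range(33)]
common = to_common_skeleton(raw, MEDIAPIPE_TO_COMMON)
print(len(common), common["nose"])  # 13 (0, 0)
```

If full per-model output should also be retained, the raw `landmarks` list could be stored alongside the common dict rather than replaced by it.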

How should I approach using existing toolkits? If I just want a minor visualization utility function for example, would I need to add the whole toolkit as a requirement and import it? or would it be fine to just take the code from their github?

fabiocat93 commented 10 hours ago

How should I approach using existing toolkits? If I just want a minor visualization utility function for example, would I need to add the whole toolkit as a requirement and import it? or would it be fine to just take the code from their github?

Good question. If you can isolate a specific function that you need, you can simply copy-paste it and credit the source, mentioning their LICENSE. Obviously, this assumes that their license allows doing that.
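For a copied function, a provenance header along these lines keeps the attribution explicit (the project name, URL, and notices file here are placeholders, not a real source; the function itself is just a small illustrative utility):

```python
# Adapted from the `example-pose-toolkit` project
# (https://example.com/example-pose-toolkit), used under its MIT license.
# Original copyright notice retained; see THIRD_PARTY_NOTICES for the
# full license text.
def connect_joints(skeleton: dict, pairs: list) -> list:
    """Return (start, end) coordinate pairs for each named bone.

    `skeleton` maps joint names to (x, y); pairs whose joints are missing
    from the skeleton are skipped instead of raising KeyError.
    """
    return [(skeleton[a], skeleton[b]) for a, b in pairs
            if a in skeleton and b in skeleton]

# The wrist is missing, so only the shoulder-elbow bone is returned
bones = connect_joints({"left_shoulder": (0, 0), "left_elbow": (1, 2)},
                       [("left_shoulder", "left_elbow"),
                        ("left_elbow", "left_wrist")])
print(bones)  # [((0, 0), (1, 2))]
```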