sensein / senselab

senselab is a Python package that simplifies building pipelines for the analysis of biometric data (e.g., speech, voice, video).
http://sensein.group/senselab/
Apache License 2.0

MediaPipe Pose Estimation + Visualization #203

Open brukew opened 1 week ago

brukew commented 1 week ago

Description

Implemented MediaPipe pose estimation. Given an image path, it returns a PoseSkeleton object containing the landmarks of each individual in the image.
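For context, a minimal sketch of what such a PoseSkeleton-style data structure could look like (all names and fields here are illustrative, not the actual senselab API):

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a landmark is a normalized (x, y) point plus a
# confidence score, as most pose estimators (e.g., MediaPipe) report.
@dataclass
class Landmark:
    x: float  # normalized [0, 1] horizontal position
    y: float  # normalized [0, 1] vertical position
    confidence: float = 1.0

# Hypothetical sketch: a PoseSkeleton groups named landmarks per individual
# detected in one image.
@dataclass
class PoseSkeleton:
    image_path: str
    # one dict of named landmarks per detected individual
    individuals: list[dict[str, Landmark]] = field(default_factory=list)

    def num_individuals(self) -> int:
        return len(self.individuals)

    def get_landmark(self, person: int, name: str) -> Landmark:
        # Accessing invalid properties raises, matching the error-path
        # testing described above.
        if person >= len(self.individuals):
            raise IndexError(f"No individual {person} in this image")
        try:
            return self.individuals[person][name]
        except KeyError:
            raise ValueError(f"Unknown landmark: {name}") from None

skeleton = PoseSkeleton(
    image_path="people.jpg",
    individuals=[{"nose": Landmark(0.51, 0.22, 0.97)}],
)
print(skeleton.num_individuals())          # 1
print(skeleton.get_landmark(0, "nose").x)  # 0.51
```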

Related Issue(s)

https://github.com/orgs/sensein/projects/45/views/3?pane=issue&itemId=82951656&issue=sensein%7Csenselab%7C173

Motivation and Context

This is the initial structure for pose estimation, which is a valuable signal for behavior analysis. I will expand it to more models and functionality, and generalize the design as I do.

How Has This Been Tested?

I tested with different kinds of images, as well as attempts to access invalid properties of the PoseSkeleton object. I wrote unit tests for the new functions and also manually verified that the visualization renders properly.

Types of changes

Created PoseSkeleton object that contains pose information for individuals in an image. Currently supports MediaPipe pose estimation + visualization functionality.

fabiocat93 commented 1 week ago

Thank you, @brukew, for the updates. I have a few suggestions and points to address based on our previous discussions:

  1. Reorganize Code Structure
    Please re-organize your code by separating data structures from functionalities (what we refer to as tasks in Senselab).

    • The skeleton should be treated as a data structure. It should define the skeleton and optionally include utilities for visualization. Alternatively, visualization can be handled as a standalone task.
    • Pose estimation should be a task (generating the skeleton as an output). All human pose estimation models (e.g., MediaPipe, AlphaPose) should conform to a consistent data structure.
    • To encourage generalizability, I suggest integrating a second pose estimation tool, such as YOLO due to its simplicity.
  2. Model Inclusion
    The current approach of including a model within the source code (e.g., src/senselab/video/tasks/pose_estimation/models/pose_landmarker.task) makes the package unnecessarily heavy. Instead, please ensure models are downloaded as needed. You can take inspiration from this example.

  3. Documentation
    Please add a dedicated documentation page:

    • Explain human pose estimation as a task, its purpose, and supported models.
    • For instance, you can reference this documentation for MediaPipe.
    • Feel free to draw inspiration from the existing audio task documentation (though some sections are incomplete).
  4. Tutorial
    Create a Jupyter Notebook tutorial to demonstrate:

    • How to use the interface.
    • What functionalities are available.
      Add this under a video folder in the tutorial/ directory.
  5. Failing Tests
    I noticed two tests are failing:

    • test_valid_image_single_person
      • AssertionError: "Input and output image shapes should match."
    • test_visualization_single_person
      • ValueError: "Input image must contain three-channel BGR data."

    Please double-check these tests to ensure they pass.
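The separation suggested in points 1 and 2 could look roughly like this: a shared estimator interface that every backend (MediaPipe, YOLO, ...) implements, emitting one common data structure, with model weights fetched on first use rather than shipped in the package. A minimal sketch; every name and URL here is hypothetical, not the actual senselab API:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical task interface: all pose estimation backends produce the
# same backend-agnostic output (here simplified to a dict).
class PoseEstimator(ABC):
    @abstractmethod
    def estimate(self, image_path: str) -> dict:
        """Return a backend-agnostic skeleton for the given image."""

def cached_model(url: str, cache_dir: str = "~/.cache/senselab") -> Path:
    # Point 2: download model weights on first use instead of bundling
    # them inside the source tree.
    target = Path(cache_dir).expanduser() / Path(url).name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, target)  # fetch once, then reuse the cached file
    return target

class MediaPipeEstimator(PoseEstimator):
    MODEL_URL = "https://example.com/pose_landmarker.task"  # placeholder

    def estimate(self, image_path: str) -> dict:
        # model_path = cached_model(self.MODEL_URL)  # lazy download
        # ... run MediaPipe on image_path, map its results into the
        # shared skeleton format ...
        return {"backend": "mediapipe", "image": image_path, "individuals": []}

estimator: PoseEstimator = MediaPipeEstimator()
print(estimator.estimate("people.jpg")["backend"])  # mediapipe
```

A second backend (e.g., a YOLO-based one) would then be another PoseEstimator subclass, and downstream code would depend only on the interface.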
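On item 5, the "three-channel BGR" error usually means the test image was loaded as grayscale or RGBA. A minimal sketch (using NumPy; this is not the actual test code) of normalizing an array to three channels before handing it to the estimator:

```python
import numpy as np

def ensure_three_channels(image: np.ndarray) -> np.ndarray:
    """Coerce grayscale/RGBA arrays to three-channel (H, W, 3) data."""
    if image.ndim == 2:                           # grayscale: replicate
        return np.stack([image] * 3, axis=-1)
    if image.ndim == 3 and image.shape[-1] == 4:  # RGBA: drop alpha
        return image[..., :3]
    if image.ndim == 3 and image.shape[-1] == 3:  # already three channels
        return image
    raise ValueError(f"Unsupported image shape: {image.shape}")

gray = np.zeros((4, 4), dtype=np.uint8)
print(ensure_three_channels(gray).shape)  # (4, 4, 3)
```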
brukew commented 1 week ago

Nice, thank you for the feedback @fabiocat93. I will address your comments and ask questions as I go.