sensein / senselab

senselab is a Python package that simplifies building pipelines for the analysis of biometric data (e.g., speech, voice, video).
http://sensein.group/senselab/
Apache License 2.0

MediaPipe Pose Estimation + Visualization #203

Open brukew opened 1 week ago

brukew commented 1 week ago

Description

Implemented MediaPipe pose estimation. Given an image path, it returns a PoseSkeleton object containing the landmarks of each individual in the image.
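For context, a minimal sketch of what such a PoseSkeleton-style data structure could look like (all names and fields here are illustrative, not the actual senselab API):

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a landmark is a normalized (x, y) point plus a
# confidence score, as most pose estimators (e.g., MediaPipe) report.
@dataclass
class Landmark:
    x: float  # normalized [0, 1] horizontal position
    y: float  # normalized [0, 1] vertical position
    confidence: float = 1.0

# Hypothetical sketch: a PoseSkeleton groups named landmarks per individual
# detected in one image.
@dataclass
class PoseSkeleton:
    image_path: str
    # one dict of named landmarks per detected individual
    individuals: list[dict[str, Landmark]] = field(default_factory=list)

    def num_individuals(self) -> int:
        return len(self.individuals)

    def get_landmark(self, person: int, name: str) -> Landmark:
        # Accessing invalid properties raises, matching the error-path
        # testing described above.
        if person >= len(self.individuals):
            raise IndexError(f"No individual {person} in this image")
        try:
            return self.individuals[person][name]
        except KeyError:
            raise ValueError(f"Unknown landmark: {name}") from None

skeleton = PoseSkeleton(
    image_path="people.jpg",
    individuals=[{"nose": Landmark(0.51, 0.22, 0.97)}],
)
print(skeleton.num_individuals())          # 1
print(skeleton.get_landmark(0, "nose").x)  # 0.51
```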

Related Issue(s)

https://github.com/orgs/sensein/projects/45/views/3?pane=issue&itemId=82951656&issue=sensein%7Csenselab%7C173

Motivation and Context

This is the initial structure for pose estimation, which is a valuable signal for behavior analysis. I will expand it to more models and functionality, and generalize the design as I do.

How Has This Been Tested?

I tested with different kinds of images, as well as attempts to access invalid properties of the PoseSkeleton object. I wrote unit tests for the new functions and also manually verified that the visualization renders properly.

Types of changes

Created PoseSkeleton object that contains pose information for individuals in an image. Currently supports MediaPipe pose estimation + visualization functionality.

fabiocat93 commented 1 week ago

Thank you, @brukew, for the updates. I have a few suggestions and points to address based on our previous discussions:

  1. Reorganize Code Structure
    Please re-organize your code by separating data structures from functionalities (what we refer to as tasks in Senselab).

    • The skeleton should be treated as a data structure. It should define the skeleton and optionally include utilities for visualization. Alternatively, visualization can be handled as a standalone task.
    • Pose estimation should be a task (generating the skeleton as an output). All human pose estimation models (e.g., MediaPipe, AlphaPose) should conform to a consistent data structure.
    • To encourage generalizability, I suggest integrating a second pose estimation tool, such as YOLO due to its simplicity.
  2. Model Inclusion
    The current approach of including a model within the source code (e.g., src/senselab/video/tasks/pose_estimation/models/pose_landmarker.task) makes the package unnecessarily heavy. Instead, please ensure models are downloaded as needed. You can take inspiration from this example.

  3. Documentation
    Please add a dedicated documentation page:

    • Explain human pose estimation as a task, its purpose, and supported models.
    • For instance, you can reference this documentation for MediaPipe.
    • Feel free to draw inspiration from the existing audio task documentation (though some sections are incomplete).
  4. Tutorial
    Create a Jupyter Notebook tutorial to demonstrate:

    • How to use the interface.
    • What functionalities are available.
      Add this under a video folder in the tutorial/ directory.
  5. Failing Tests
    I noticed two tests are failing:

    • test_valid_image_single_person
      • AssertionError: "Input and output image shapes should match."
    • test_visualization_single_person
      • ValueError: "Input image must contain three-channel BGR data."

    Please double-check these tests to ensure they pass.
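The separation suggested in points 1 and 2 could look roughly like this: a shared estimator interface that every backend (MediaPipe, YOLO, ...) implements, emitting one common data structure, with model weights fetched on first use rather than shipped in the package. A minimal sketch; every name and URL here is hypothetical, not the actual senselab API:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical task interface: all pose estimation backends produce the
# same backend-agnostic output (here simplified to a dict).
class PoseEstimator(ABC):
    @abstractmethod
    def estimate(self, image_path: str) -> dict:
        """Return a backend-agnostic skeleton for the given image."""

def cached_model(url: str, cache_dir: str = "~/.cache/senselab") -> Path:
    # Point 2: download model weights on first use instead of bundling
    # them inside the source tree.
    target = Path(cache_dir).expanduser() / Path(url).name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, target)  # fetch once, then reuse the cached file
    return target

class MediaPipeEstimator(PoseEstimator):
    MODEL_URL = "https://example.com/pose_landmarker.task"  # placeholder

    def estimate(self, image_path: str) -> dict:
        # model_path = cached_model(self.MODEL_URL)  # lazy download
        # ... run MediaPipe on image_path, map its results into the
        # shared skeleton format ...
        return {"backend": "mediapipe", "image": image_path, "individuals": []}

estimator: PoseEstimator = MediaPipeEstimator()
print(estimator.estimate("people.jpg")["backend"])  # mediapipe
```

A second backend (e.g., a YOLO-based one) would then be another PoseEstimator subclass, and downstream code would depend only on the interface.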
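On item 5, the "three-channel BGR" error usually means the test image was loaded as grayscale or RGBA. A minimal sketch (using NumPy; this is not the actual test code) of normalizing an array to three channels before handing it to the estimator:

```python
import numpy as np

def ensure_three_channels(image: np.ndarray) -> np.ndarray:
    """Coerce grayscale/RGBA arrays to three-channel (H, W, 3) data."""
    if image.ndim == 2:                           # grayscale: replicate
        return np.stack([image] * 3, axis=-1)
    if image.ndim == 3 and image.shape[-1] == 4:  # RGBA: drop alpha
        return image[..., :3]
    if image.ndim == 3 and image.shape[-1] == 3:  # already three channels
        return image
    raise ValueError(f"Unsupported image shape: {image.shape}")

gray = np.zeros((4, 4), dtype=np.uint8)
print(ensure_three_channels(gray).shape)  # (4, 4, 3)
```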
brukew commented 1 week ago

Nice, thank you for the feedback @fabiocat93. I will address your comments and ask questions as I go.