sensein / science4all

This repo will provide pointers to different questions/problems where we could use help.

Project Proposal: Analyzing Children's Pose Estimation in Videos Using State-of-the-Art Tools #6

Open fabiocat93 opened 1 year ago

fabiocat93 commented 1 year ago

Background: Advances in computer vision and deep learning have paved the way for automated pose estimation in videos, particularly for adults (for a recent review, see https://www.iieta.org/journals/ts/paper/10.18280/ts.390111). However, it remains largely unknown how well state-of-the-art pose estimation tools perform when analyzing the poses of children in videos. Children's movements differ from those of adults: children have different body proportions, postures, and dynamic behaviors. This project aims to bridge this research gap by investigating the accuracy, reliability, and applicability of various pose estimation tools for analyzing children's poses over time.

Open Research Question: The central research question of this project is: "How do state-of-the-art pose estimation tools perform when analyzing the poses of children in videos, and what are the key factors influencing their accuracy and applicability?"

Proposed Methodological Steps:

  1. IRB Approval and Data Collection: Obtain Institutional Review Board (IRB) approval to ensure ethical considerations are met. Collect a diverse dataset of videos featuring children engaged in various activities. We suggest using the PInSoRo dataset, which includes 45+ hours of hand-coded recordings of social interactions between 45 child-child pairs and 30 child-robot pairs. In addition to annotations of social constructs, the dataset includes fully calibrated video recordings, 3D recordings of the faces, skeletal information, full audio recordings, and game interactions.

  2. Tool Selection and Setup: Select a set of state-of-the-art human pose estimation tools, such as MotionBERT, OpenPose, MoveNet Lightning, MoveNet Thunder, and MediaPipe. Set up the necessary environment and configuration for each tool.

  3. Data Preprocessing: Prepare the collected video dataset by standardizing formats, resolutions, and annotations. Manually annotate 3D pose data to serve as the ground truth for benchmarking.

  4. Automated Pose Estimation: Apply the selected pose estimation tools to the preprocessed video dataset to automatically extract the children's pose information over time (3D where a tool provides it, 2D otherwise).

  5. Benchmarking and Evaluation: Compare the automatically extracted 3D poses against the manually annotated poses. Calculate quantitative metrics such as accuracy, precision, recall, and F1-score to evaluate the performance of each tool.

  6. Agreement Rate Analysis: Compute the agreement rate among different pose estimation models by analyzing the consistency of their results. Identify instances of consensus and divergence in pose estimation outcomes.

  7. Latency Comparison: Measure and compare the latency of each pose estimation model in real-time scenarios. Analyze the impact of latency on the accuracy and reliability of pose estimation.
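
As a rough illustration of steps 5 to 7, here is a minimal Python sketch of how the benchmarking, agreement, and latency measurements could be computed. Everything in it is an assumption for illustration and not part of the proposal: the metrics (mean per-joint position error, MPJPE, and percentage of correct keypoints, PCK, two distance-based measures commonly used for pose estimation), the function names, the tool names `tool_a` and `tool_b`, and the toy data. It also assumes that every tool's output has already been mapped to the same joint order, coordinate frame, and units, which in practice requires a conversion step because the tools emit different keypoint sets.

```python
import math
import time
from itertools import combinations
from statistics import mean, median

# A pose is a list of keypoints (x, y, z); a sequence is a list of poses, one per frame.
# Assumption: all sequences share the same joint order, coordinate frame, and units.


def joint_distances(pred, truth):
    """Euclidean distance for every (frame, joint) pair of two pose sequences."""
    return [
        math.dist(p, t)
        for pred_frame, truth_frame in zip(pred, truth)
        for p, t in zip(pred_frame, truth_frame)
    ]


def mpjpe(pred, truth):
    """Mean per-joint position error: average distance over all frames and joints."""
    return mean(joint_distances(pred, truth))


def pck(pred, truth, threshold):
    """Fraction of keypoints that lie within `threshold` of the reference keypoint."""
    distances = joint_distances(pred, truth)
    return sum(d <= threshold for d in distances) / len(distances)


def pairwise_agreement(outputs, threshold):
    """For each pair of tools, the fraction of keypoints on which they agree."""
    return {
        (a, b): pck(outputs[a], outputs[b], threshold)
        for a, b in combinations(sorted(outputs), 2)
    }


def median_latency(estimator, frames, warmup=1):
    """Median wall-clock seconds per frame, after a few untimed warm-up calls."""
    for frame in frames[:warmup]:
        estimator(frame)
    timings = []
    for frame in frames:
        start = time.perf_counter()
        estimator(frame)
        timings.append(time.perf_counter() - start)
    return median(timings)


# Toy data: one frame with two joints (hypothetical values).
truth = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
outputs = {
    "tool_a": [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]],  # matches the annotation
    "tool_b": [[(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)]],  # first joint is off by 5 units
}

for name, pred in outputs.items():
    print(name, "MPJPE:", mpjpe(pred, truth), "PCK@1.0:", pck(pred, truth, threshold=1.0))
print("agreement:", pairwise_agreement(outputs, threshold=1.0))

# A stand-in estimator; a real run would call each tool's inference function here.
print("median latency (s):", median_latency(lambda frame: frame, frames=[None] * 5))
```

With the toy data, `tool_b` gets an MPJPE of 2.5 and a PCK of 0.5 at threshold 1.0. The same `pck` function is reused for the agreement rate by treating one tool's output as the reference for another, so "agreement" here means "keypoints within the threshold of each other". Other definitions are possible, and the threshold would need to be chosen relative to body size, which matters for children.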

Conclusion: This project presents a valuable opportunity to contribute to the field of computer vision by investigating the effectiveness of state-of-the-art pose estimation tools when analyzing children's poses in videos. By addressing this research gap, we aim to enhance our understanding of the unique challenges posed by children's movements and provide insights into the suitability of existing tools for this specific application. Researchers interested in computer vision, child development, and pose estimation are encouraged to apply and contribute to this important endeavor.

Thank you for your interest and help! Please feel free to post any questions or comments on this issue, and I will be more than happy to assist you further.