johndpope / MIMO-hack

using sonnet / chatgpt o1-preview to recreate MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling (is this going to work?? - idk 🤷)
https://arxiv.org/pdf/2409.16160
MIT License

detectron2 vs sapiens for MIMO #1

johndpope closed this issue 1 month ago

johndpope commented 1 month ago

Sapiens:
- Specialized in human-centric tasks: Sapiens excels at human pose estimation, body-part segmentation, and depth/normal estimation, making it ideal for complex human motion and interaction tasks in 3D-aware video synthesis, which is the core of MIMO.
- High-fidelity human representation: Sapiens is built to handle high-resolution, detailed human imagery and motion, which aligns well with MIMO’s need for accurate and lifelike character videos.
- Scalability: With models scaling up to 2 billion parameters, Sapiens provides the precision and generalization required for complex scenes and articulated human motions.

Detectron2:
- General-purpose object detection: While powerful for object detection, segmentation, and keypoint detection, Detectron2 is not as specialized for tasks involving high-fidelity human understanding or 3D motion, which are critical for MIMO.
- Broader applications but less specific: Detectron2 is more versatile for general object detection but does not offer the same human-specific capabilities and high-resolution support as Sapiens.

johndpope commented 1 month ago

I added both: https://github.com/johndpope/MIMO-hack/blob/main/utils.py
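For anyone wondering how the two could sit side by side, here is a rough sketch of routing each task to whichever backend covers it, using the capabilities listed in the comparison above. This is not the code in utils.py: the `Backend` class, the `BACKENDS` table, the task names, and `pick_backend` are all hypothetical, and no real model is loaded.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Backend:
    """Hypothetical record of which tasks a backend is used for."""
    name: str
    tasks: FrozenSet[str]


# Task names are made up for this sketch; the capability split follows
# the comparison in this thread, not a benchmark.
BACKENDS: Dict[str, Backend] = {
    "sapiens": Backend(
        "sapiens",
        frozenset({"pose", "part_segmentation", "depth", "normal"}),
    ),
    "detectron2": Backend(
        "detectron2",
        frozenset({"detection", "instance_segmentation", "keypoints"}),
    ),
}


def pick_backend(task: str, prefer: str = "sapiens") -> str:
    """Return the preferred backend if it covers `task`, else any that does."""
    if task in BACKENDS[prefer].tasks:
        return prefer
    for name, backend in BACKENDS.items():
        if task in backend.tasks:
            return name
    raise ValueError(f"no backend supports task {task!r}")


if __name__ == "__main__":
    for task in ("depth", "detection", "pose"):
        print(task, "->", pick_backend(task))
```

Defaulting to Sapiens for the human-centric tasks and falling back to Detectron2 for generic detection keeps both available without committing the whole pipeline to one of them.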