Using Sonnet / ChatGPT o1-preview to recreate MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling (is this going to work?? idk 🤷)
Sapiens:
Specialized in human-centric tasks: Sapiens covers 2D human pose estimation, body-part segmentation, depth estimation, and surface normal estimation. These are the per-frame human signals that MIMO's 3D-aware video synthesis is built on, which makes Sapiens a strong fit for complex human motion and interaction.
High-fidelity human representation: Sapiens is built for high-resolution, detailed human imagery (the models natively run at 1024-pixel resolution). This matches MIMO's need for accurate, lifelike character videos.
Scalability: Sapiens models range from 0.3 to 2 billion parameters, so you can trade speed for accuracy. The larger variants are reported to generalize better to complex scenes and articulated human poses.
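To make the Sapiens-to-MIMO connection more concrete, here is a minimal NumPy sketch of the kind of depth-based layering MIMO describes: splitting a frame into main human, floating occlusion, and underlying scene. It assumes a per-pixel depth map where smaller values are closer. Sign conventions differ between depth models, so check yours. It also assumes a boolean human mask, for example the union of Sapiens' body-part classes. `decompose_layers`, the median-depth reference, and the `margin` threshold are hypothetical choices for illustration. They are not MIMO's actual implementation and not a Sapiens API.

```python
import numpy as np


def decompose_layers(depth, human_mask, margin=0.0):
    """Split one frame into (human, occlusion, scene) boolean masks by depth.

    Hypothetical helper illustrating MIMO-style spatial decomposition.

    depth:      (H, W) per-pixel depth; ASSUMED smaller = closer to camera.
    human_mask: (H, W) bool mask of the main human (e.g. union of body parts).
    margin:     how far in front of the human a pixel must be to count as an
                occluder (assumed tuning knob, same units as depth).
    """
    depth = np.asarray(depth, dtype=np.float32)
    human = np.asarray(human_mask, dtype=bool)
    if depth.shape != human.shape:
        raise ValueError("depth and human_mask must have the same shape")

    if not human.any():
        # No person in the frame: every pixel belongs to the scene layer.
        none = np.zeros(human.shape, dtype=bool)
        return none, none.copy(), np.ones(human.shape, dtype=bool)

    # One reference depth for the person: the median over human pixels.
    human_depth = np.median(depth[human])

    # Non-human pixels clearly in front of the person -> floating occlusion.
    occlusion = ~human & (depth < human_depth - margin)
    # All remaining non-human pixels -> underlying scene.
    scene = ~human & ~occlusion
    return human, occlusion, scene


# Toy 2x3 frame: the human occupies the middle column.
depth = np.array([[1.0, 5.0, 5.0],
                  [1.0, 3.0, 9.0]])
human = np.array([[False, True, False],
                  [False, True, False]])
h, occ, scene = decompose_layers(depth, human, margin=0.5)
print(occ.astype(int))    # left column is in front of the person
print(scene.astype(int))  # right column is behind the person
```

Using a single median depth per person is the simplest option. It would likely misclassify pixels when a limb reaches toward the camera, and a per-pixel comparison against the human's local depth would be more robust in that case.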
Detectron2:
General-purpose object detection: Detectron2 is powerful for object detection, segmentation, and keypoint detection. However, it is not specialized for high-fidelity human understanding or 3D motion, and both are critical for MIMO.
Broader applications but less specific: Detectron2 is more versatile for general object detection but does not offer the same human-specific capabilities and high-resolution support as Sapiens.