MIMO-hack

Using Claude Sonnet / ChatGPT o1-preview to recreate MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling (is this going to work? idk 🤷)
https://arxiv.org/pdf/2409.16160
License: MIT

Inpaint Anything (Segment Anything + inpainting): https://github.com/Uminosachi/inpaint-anything

test_motion.py

Prebuilt PyTorch3D wheels: https://huggingface.co/lilpotat/pytorch3d/tree/main

Dataset

From the MIMO paper: We create a human video dataset called HUD-7K to train our model. This dataset consists of 5K real character videos and 2K synthetic character animations. The former require no annotations and can be automatically decomposed into various spatial attributes via our scheme. To enlarge the range of the real dataset, we also synthesize 2K videos by rendering character animations with complex motions under multiple camera views, utilizing En3D [21]. These synthetic videos come with accurate annotations thanks to the fully controlled production.
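
The decomposition splits each frame into three spatial components (main human, underlying scene, floating occlusion) using depth ordering. Below is a minimal sketch of that split, assuming a per-frame human mask and a monocular depth map are already available; all names and the depth convention are assumptions, not MIMO's actual code.

```python
import numpy as np

def decompose_frame(frame, human_mask, depth, occlusion_thresh=0.05):
    """Split a frame into human / scene / occlusion layers.

    frame: (H, W, 3) uint8; human_mask: (H, W) bool;
    depth: (H, W) float, larger = closer (assumption).
    Anything closer than the human that isn't the human itself
    is treated as a floating occlusion.
    """
    human_depth = np.median(depth[human_mask]) if human_mask.any() else 0.0
    occlusion_mask = (~human_mask) & (depth > human_depth + occlusion_thresh)
    scene_mask = ~(human_mask | occlusion_mask)

    human = frame * human_mask[..., None]
    occlusion = frame * occlusion_mask[..., None]
    scene = frame * scene_mask[..., None]  # holes here get inpainted (see LaMa below)
    return human, scene, occlusion, scene_mask
```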

https://github.com/menyifang/En3D

Synthetic training data

https://openxlab.org.cn/datasets/OpenXDLab/SynBody

pip install openxlab       # install
pip install -U openxlab    # upgrade

openxlab login             # log in with your AK/SK (find them in the user center)

openxlab dataset info --dataset-repo OpenXDLab/SynBody    # view dataset info and file list

openxlab dataset get --dataset-repo OpenXDLab/SynBody     # download the full dataset

openxlab dataset download --dataset-repo OpenXDLab/SynBody --source-path /README.md --target-path /path/to/local/folder    # download a single file

Julia Models - neutral

Download the SMPL-X models (the neutral model included) from https://smpl-x.is.tue.mpg.de/download.php
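
A minimal sketch of loading the neutral SMPL-X model with the smplx package; the folder layout under model_path is an assumption about how you unpack the download.

```python
import torch
import smplx

# model_path is the folder containing the downloaded SMPL-X files
# (assumption: models/smplx/SMPLX_NEUTRAL.npz lives under it)
model = smplx.create(
    model_path="models",
    model_type="smplx",
    gender="neutral",
    use_pca=False,  # full hand pose instead of PCA components
)

# a forward pass with zero pose/shape gives the canonical body
output = model(betas=torch.zeros(1, model.num_betas), return_verts=True)
vertices = output.vertices  # (1, 10475, 3) for SMPL-X
faces = model.faces         # (F, 3) numpy array
```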

Sapiens - given an image, produce depth / normals / pose

https://github.com/facebookresearch/sapiens

python pose_vis.py '/home/oem/Desktop/image_1.png' test.png output.json
python normal_vis.py '/home/oem/Desktop/image_1.png' test.png
python depth_estimation.py input_image.png output_depth_image.png output_depth_map.npy --depth_model 1b --seg_model fg-bg-1b
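
A small sketch of pulling those outputs back into Python as conditioning maps; the file names match the commands above, but the JSON layout written by pose_vis.py is an assumption.

```python
import json
import numpy as np

# Load the outputs written by the Sapiens scripts above.
depth = np.load("output_depth_map.npy")  # (H, W) float depth map
with open("output.json") as f:
    pose = json.load(f)                  # keypoints from pose_vis.py (layout assumed)

# Normalize depth to [0, 1] so it can be stacked as a conditioning channel.
d_min, d_max = depth.min(), depth.max()
depth_norm = (depth - d_min) / max(d_max - d_min, 1e-8)

cond = depth_norm[None]                  # (1, H, W) conditioning map
```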

LaMa - SOTA inpainting

https://github.com/advimman/lama
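
LaMa is run via its repo's own prediction script; as a quick stand-in while wiring up the pipeline, OpenCV's classical inpainting takes the same image + mask and returns a filled image (far lower quality, but the same interface shape).

```python
import cv2
import numpy as np

def inpaint_scene(scene_bgr, hole_mask):
    """Fill the region the human occupied so the scene layer is complete.

    scene_bgr: (H, W, 3) uint8 BGR frame with the human removed.
    hole_mask: (H, W) bool, True where pixels need to be synthesized.
    Placeholder: cv2.inpaint instead of LaMa, same inputs/outputs.
    """
    mask_u8 = hole_mask.astype(np.uint8) * 255
    return cv2.inpaint(scene_bgr, mask_u8, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
```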

Todo Components

1. Setup and Dependencies

2. Define Model Components

2.1 Temporal Attention Layer
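
A sketch of 2.1 as self-attention over the frame axis, applied independently at each spatial location (a common AnimateDiff-style pattern, not necessarily MIMO's exact layer):

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention over the frame axis at every spatial location."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, T, C, H, W) -> attend over T independently per pixel
        b, t, c, h, w = x.shape
        x = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        y, _ = self.attn(self.norm(x), self.norm(x), self.norm(x))
        x = x + y  # residual connection
        return x.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
```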

2.2 Differentiable Rasterizer
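
For 2.2, PyTorch3D's mesh rasterizer (hence the prebuilt wheels above) is a natural fit; a sketch with illustrative settings:

```python
import torch
from pytorch3d.renderer import (
    FoVPerspectiveCameras, MeshRasterizer, RasterizationSettings,
)
from pytorch3d.structures import Meshes

def make_rasterizer(device, image_size=256):
    cameras = FoVPerspectiveCameras(device=device)
    settings = RasterizationSettings(
        image_size=image_size,
        blur_radius=0.0,
        faces_per_pixel=1,
    )
    return MeshRasterizer(cameras=cameras, raster_settings=settings)

# usage: rasterize the SMPL-X mesh from the snippet above
# meshes = Meshes(verts=[vertices[0]],
#                 faces=[torch.as_tensor(faces.astype("int64"))])
# fragments = make_rasterizer(vertices.device)(meshes)
# fragments.zbuf / fragments.pix_to_face give per-pixel depth and face ids
```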

2.3 Structured Motion Encoder
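
MIMO's structured motion encoder attaches latent codes to the posed body and projects them into 2D feature maps via the rasterizer. A heavily simplified sketch for a batch of one mesh; gathering via pix_to_face and taking the first vertex of each visible face is a shortcut, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class StructuredMotionEncoder(nn.Module):
    """Learnable per-vertex latent codes splatted into a 2D feature map."""

    def __init__(self, num_verts=10475, code_dim=16):  # 10475 = SMPL-X vertices
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_verts, code_dim) * 0.01)

    def forward(self, fragments, faces):
        # fragments.pix_to_face: (B, H, W, 1), face index per pixel (-1 = empty);
        # indexes packed faces in PyTorch3D, so this sketch assumes batch size 1.
        pix_to_face = fragments.pix_to_face[..., 0]
        valid = pix_to_face >= 0
        # cheap stand-in: take the first vertex of each visible face
        vert_idx = faces[pix_to_face.clamp(min=0)][..., 0]
        feat = self.codes[vert_idx] * valid[..., None]  # (B, H, W, code_dim)
        return feat.permute(0, 3, 1, 2)                 # (B, code_dim, H, W)
```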

2.4 Canonical Identity Encoder

2.5 Scene and Occlusion Encoder

2.6 Diffusion Decoder

2.7 MIMO Model

3. Dataset Handling

3.1 Dataset Class

3.2 Data Preprocessing

3.3 Mask Computation

4. Training Procedure

4.1 Forward Diffusion Sampling
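
4.1 is the standard DDPM forward process, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I); a minimal implementation:

```python
import torch

def make_schedule(timesteps=1000, beta_start=1e-4, beta_end=0.02):
    betas = torch.linspace(beta_start, beta_end, timesteps)
    return torch.cumprod(1.0 - betas, dim=0)  # alpha-bar per timestep

def q_sample(x0, t, alphas_cumprod, noise=None):
    """Sample x_t ~ q(x_t | x_0) for a batch of timesteps t."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise, noise
```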

4.2 Training Loop
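
4.2 then reduces to ε-prediction with an MSE loss, reusing q_sample from 4.1; the model(x_t, t, cond) signature and batch keys are placeholders for whatever the decoder ends up taking.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, batch, alphas_cumprod, timesteps=1000):
    """One epsilon-prediction step; `batch` supplies latents + conditioning."""
    x0, cond = batch["latents"], batch["cond"]  # key names are placeholders
    t = torch.randint(0, timesteps, (x0.shape[0],), device=x0.device)
    x_t, noise = q_sample(x0, t, alphas_cumprod.to(x0.device))
    pred = model(x_t, t, cond)                  # assumed signature
    loss = F.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```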

5. Inference Pipeline

6. Hyperparameters and Configuration

7. Additional Components

8. Saving and Loading the Model
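
For 8, the usual PyTorch checkpoint pattern:

```python
import torch

def save_checkpoint(model, optimizer, step, path="mimo_ckpt.pt"):
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "step": step},
        path,
    )

def load_checkpoint(model, optimizer, path="mimo_ckpt.pt", device="cpu"):
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```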