
Mini-Dust3r

A miniature version of dust3r, for inference only. This makes it much easier to use, without needing the training/data/eval code. Tested on Linux, Apple Silicon Macs, and Windows (thanks @Vincentqyw).

example output

Installation

Easily installable via pip:

pip install mini-dust3r

Demo

A hosted demo is available on Hugging Face.

Alternatively, run the demo from source using Pixi:

git clone https://github.com/pablovela5620/mini-dust3r.git
pixi run app

You can also run the Rerun demo directly with

pixi run rerun-demo

Minimal Example

The following minimal example uses Rerun to visualize the outputs:

import rerun as rr
from pathlib import Path
from argparse import ArgumentParser
import torch

from mini_dust3r.api import OptimizedResult, inferece_dust3r, log_optimized_result
from mini_dust3r.model import AsymmetricCroCo3DStereo

def main(image_dir: Path):
    # Pick the best available device: Apple Silicon GPU (MPS), CUDA, or CPU fallback
    if torch.backends.mps.is_available():
        device = "mps"
    elif torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"

    # Load the pretrained DUSt3R weights from the Hugging Face Hub
    model = AsymmetricCroCo3DStereo.from_pretrained(
        "nielsr/DUSt3R_ViTLarge_BaseDecoder_512_dpt"
    ).to(device)

    # Run inference and global alignment on all images in the directory
    optimized_results: OptimizedResult = inferece_dust3r(
        image_dir_or_list=image_dir,
        model=model,
        device=device,
        batch_size=1,
    )
    # Visualize the result in Rerun under the "world" entity path
    log_optimized_result(optimized_results, Path("world"))

if __name__ == "__main__":
    parser = ArgumentParser("mini-dust3r rerun demo script")
    parser.add_argument(
        "--image-dir",
        type=Path,
        help="Directory containing images to process",
        required=True,
    )
    rr.script_add_args(parser)
    args = parser.parse_args()
    rr.script_setup(args, "mini-dust3r")
    main(args.image_dir)
    rr.script_teardown(args)
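
Assuming the snippet above is saved as minimal_example.py (the filename is just an example), it can be run with

python minimal_example.py --image-dir /path/to/images

The rr.script_add_args / rr.script_setup helpers also wire up Rerun's standard CLI flags (for example --connect, --save, and --serve) to control how the visualization is streamed or saved.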

Inputs and Outputs

Inference Function

def inferece_dust3r(
    image_dir_or_list: Path | list[Path],
    model: AsymmetricCroCo3DStereo,
    device: Literal["cpu", "cuda", "mps"],
    batch_size: int = 1,
    image_size: Literal[224, 512] = 512,
    niter: int = 100,
    schedule: Literal["linear", "cosine"] = "linear",
    min_conf_thr: float = 10,
) -> OptimizedResult:

Consists of:

image_dir_or_list - path to a directory of images, or a list of image paths
model - the AsymmetricCroCo3DStereo model used for inference
device - device to run on ("cpu", "cuda", or "mps")
batch_size - inference batch size (default 1)
image_size - the size images are resized to, 224 or 512 (default 512)
niter - number of iterations for the global alignment optimization (default 100)
schedule - learning-rate schedule for the global alignment, "linear" or "cosine" (default "linear")
min_conf_thr - minimum confidence threshold used to filter low-confidence predictions (default 10)

and returns an OptimizedResult, described below.
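
As a rough usage sketch (not from the upstream README), the function can also be called directly with a list of image paths; the paths and parameter overrides below are purely illustrative:

from pathlib import Path

from mini_dust3r.api import inferece_dust3r
from mini_dust3r.model import AsymmetricCroCo3DStereo

device = "cuda"  # or "mps" / "cpu", depending on what is available

# Load the pretrained DUSt3R weights from the Hugging Face Hub
model = AsymmetricCroCo3DStereo.from_pretrained(
    "nielsr/DUSt3R_ViTLarge_BaseDecoder_512_dpt"
).to(device)

# Hypothetical image paths -- replace with your own
image_paths = [Path("imgs/0001.png"), Path("imgs/0002.png"), Path("imgs/0003.png")]

result = inferece_dust3r(
    image_dir_or_list=image_paths,
    model=model,
    device=device,
    batch_size=1,
    niter=300,         # run more global-alignment iterations than the default 100
    min_conf_thr=3.0,  # a lower confidence cutoff than the default of 10
)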

Output from OptimizedResult

@dataclass
class OptimizedResult:
    K_b33: Float32[np.ndarray, "b 3 3"]
    world_T_cam_b44: Float32[np.ndarray, "b 4 4"]
    rgb_hw3_list: list[Float32[np.ndarray, "h w 3"]]
    depth_hw_list: list[Float32[np.ndarray, "h w"]]
    conf_hw_list: list[Float32[np.ndarray, "h w"]]
    masks_list: Bool[np.ndarray, "h w"]
    point_cloud: trimesh.PointCloud
    mesh: trimesh.Trimesh

Consists of:

K_b33 - camera intrinsics, one 3x3 matrix per image
world_T_cam_b44 - camera-to-world poses, one 4x4 matrix per image
rgb_hw3_list - list of (resized) RGB images
depth_hw_list - list of per-image depth maps
conf_hw_list - list of per-image confidence maps
masks_list - boolean masks of confident pixels (derived from min_conf_thr)
point_cloud - the fused point cloud as a trimesh.PointCloud
mesh - the reconstructed scene mesh as a trimesh.Trimesh
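
A short sketch of consuming the result, assuming optimized_results is the OptimizedResult returned by inferece_dust3r in the minimal example above (the output filenames are arbitrary):

# Per-image camera intrinsics and camera-to-world poses
print(optimized_results.K_b33.shape)            # (num_images, 3, 3)
print(optimized_results.world_T_cam_b44.shape)  # (num_images, 4, 4)

# Export the fused geometry with trimesh
optimized_results.point_cloud.export("scene_point_cloud.ply")
optimized_results.mesh.export("scene_mesh.glb")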

References

Full credit goes to Naver for their great work on DUSt3R and CroCo.