
Inference on other datasets #2

Open tobyperrett opened 4 months ago

tobyperrett commented 4 months ago

Hi. I’ve read the paper (really well written, by the way!) and I’d like to try it in inference mode on a different dataset. Ideally I’d start with an RGB image and end up with the aligned object model to visualise. Currently I’m not able to, as I can’t find the following:

  1. A script that takes an RGB image and extracts a hand mesh compatible with the provided inference script.
  2. A script to query Genie to obtain an object model.
  3. A script to align the object model with the output of MCC-HO.

Have I missed something (very likely), or if not, do you plan to release these? I think they'd be really useful. Thanks for your help!

janehwu commented 4 months ago

Hi, thanks for your interest!

  1. We use HaMeR to extract a hand mesh (compatible with the inference script): https://geopavlakos.github.io/hamer/.

Once you have the hand mesh, you also need to set the PyTorch3D camera intrinsics as in this file: https://github.com/janehwu/mcc-ho/blob/main/demo/camera_intrinsics_mow.json

If it's helpful, this is how I converted the pyrender camera intrinsics (HaMeR uses pyrender with a focal length of 1000 instead of the default 5000) to PyTorch3D:

        # image_height is the height of the input RGB image in pixels.
        pyrender_focal_length = 1000
        scale = image_height / 2.0
        # Get the PyTorch3D (NDC-space) focal length and principal point.
        focal_pytorch3d = pyrender_focal_length / scale

        # Intrinsics
        focal_length = (focal_pytorch3d, focal_pytorch3d)
        principal_point = (0., 0.)
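
For completeness, here is a minimal sketch of plugging those intrinsics into a PyTorch3D camera; `image_height = 512` and the CPU device are assumptions for illustration, so substitute your actual image size:

        import torch
        from pytorch3d.renderer import PerspectiveCameras

        image_height = 512  # assumption: height of your input RGB image in pixels
        focal_pytorch3d = 1000.0 / (image_height / 2.0)

        # Default PerspectiveCameras are defined in NDC space (in_ndc=True),
        # which is why the pixel focal length is divided by half the image height.
        cameras = PerspectiveCameras(
            focal_length=((focal_pytorch3d, focal_pytorch3d),),
            principal_point=((0.0, 0.0),),
            device=torch.device("cpu"),
        )
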
  2. The Genie API isn't free, but you can access their Discord server to query the text-to-3D model: https://lumalabs.ai/genie?view=create
  3. We used ICP to align the object model with the output of MCC-HO; a rough sketch follows below. I used this implementation: https://github.com/ClayFlannigan/icp
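
For item 3, a rough sketch of what the alignment step could look like with that implementation. `obj_points` and `mcc_points` are hypothetical stand-ins for points sampled from the Genie mesh and the MCC-HO output, and the `icp` signature is from that repo's `icp.py` as I recall, so treat this as a sketch rather than a definitive recipe:

        import numpy as np
        from icp import icp  # icp.py from github.com/ClayFlannigan/icp

        # Hypothetical stand-ins: equal-sized (N, 3) point sets sampled from
        # the Genie object mesh and from the MCC-HO output point cloud. This
        # ICP implementation expects both sets to have the same shape.
        rng = np.random.default_rng(0)
        obj_points = rng.standard_normal((1024, 3))
        mcc_points = obj_points + np.array([0.1, 0.0, 0.0])

        T, _, _ = icp(obj_points, mcc_points, max_iterations=50)

        # T is a 4x4 homogeneous transform mapping obj_points onto mcc_points.
        homog = np.hstack([obj_points, np.ones((len(obj_points), 1))])
        obj_aligned = (T @ homog.T).T[:, :3]
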

Let me know if you have further questions.

tobyperrett commented 4 months ago

Thanks. I've managed to install everything and can run the demo. But when I provide my hand .obj from HaMeR, the associated hand/object masks, the RGB image, and the intrinsics, it fails with the following error: `writing failed max(): Expected reduction dim 0 to have non-zero size`. It looks as if it's not making any predictions at all.

I think the problem is coming from the hand mesh, as when I use the demo mesh with the rest of my inputs, it at least produces an output (albeit not a very good one). I've also noticed that `seen_xyz` still has `-inf` values in it for my mesh, but not for the demo one.
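
For anyone hitting the same issue, a quick sanity check along these lines can count the invalid points; this assumes `seen_xyz` is the (H, W, 3) unprojected point map used by the demo script:

        import torch

        def count_invalid_points(seen_xyz: torch.Tensor) -> None:
            """Count points in an (H, W, 3) point map with non-finite coords."""
            invalid = ~torch.isfinite(seen_xyz).all(dim=-1)
            print(f"invalid points: {invalid.sum().item()} / {invalid.numel()}")
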

Would you mind if I sent you these files so you could take a quick look? We'd like to use this for an ongoing project if it works. Thanks again for your help.

janehwu commented 4 months ago

Sure! Feel free to send me the files at janehwu@berkeley.edu.

fujenchu commented 4 weeks ago

Hi Jane, thanks for the great work!

I am also trying to run MCC-HO on images from other datasets.

I noticed that the hand meshes I get from the vanilla HaMeR model are approximately half the size of yours (the hand mesh in the demo). The coordinate system is also different (x and z are flipped to -x and -z).

Would you mind showing us how you modified HaMeR so we can get the same mesh as yours? Thanks in advance!
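
(For concreteness, if the difference really is just a uniform scale plus axis flips, a purely illustrative conversion might look like the sketch below; the exact factors are assumptions based on the observations above, not a confirmed recipe.)

        import numpy as np

        def match_demo_convention(verts: np.ndarray) -> np.ndarray:
            """Illustrative only: assumes the demo mesh differs from vanilla
            HaMeR output by a 2x scale and flipped x/z axes."""
            fixed = verts.copy()
            fixed[:, 0] *= -1.0  # flip x
            fixed[:, 2] *= -1.0  # flip z
            return fixed * 2.0   # compensate for the ~0.5x scale difference
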