
Inference on other datasets #2

Open tobyperrett opened 4 months ago

tobyperrett commented 4 months ago

Hi. I’ve read the paper (really well written, by the way!) and I’d like to try it in inference mode on a different dataset. Ideally I’d start with an RGB image and end up with the aligned object model to visualise. Currently I’m not able to, as I can’t find the following:

  1. A script that takes an RGB image and extracts a hand mesh compatible with the provided inference script.
  2. A script to query Genie to obtain an object model.
  3. A script to align the object model with the output of MCC-HO.

Have I missed something (very likely), or if not, do you plan to release these? I think they'd be really useful. Thanks for your help!

janehwu commented 4 months ago

Hi, thanks for your interest!

  1. We use HaMeR to extract a hand mesh (compatible with the inference script): https://geopavlakos.github.io/hamer/.

Once you have the hand mesh, you also need to set the PyTorch3D camera intrinsics as in this file: https://github.com/janehwu/mcc-ho/blob/main/demo/camera_intrinsics_mow.json

If it's helpful, this is how I converted the pyrender camera intrinsics (HaMeR uses pyrender with a focal length of 1000 instead of the default 5000) to PyTorch3D:

        # image_height is the height of the input RGB image in pixels.
        pyrender_focal_length = 1000
        scale = image_height / 2.0
        # Get the PyTorch3D (NDC-space) focal length and principal point.
        focal_pytorch3d = pyrender_focal_length / scale

        # Intrinsics
        focal_length = (focal_pytorch3d, focal_pytorch3d)
        principal_point = (0., 0.)
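
For completeness, here is a minimal sketch of plugging those intrinsics into a PyTorch3D camera; `image_height = 512` and the CPU device are assumptions for illustration, so substitute your actual image size:

        import torch
        from pytorch3d.renderer import PerspectiveCameras

        image_height = 512  # assumption: height of your input RGB image in pixels
        focal_pytorch3d = 1000.0 / (image_height / 2.0)

        # Default PerspectiveCameras are defined in NDC space (in_ndc=True),
        # which is why the pixel focal length is divided by half the image height.
        cameras = PerspectiveCameras(
            focal_length=((focal_pytorch3d, focal_pytorch3d),),
            principal_point=((0.0, 0.0),),
            device=torch.device("cpu"),
        )
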
  2. The Genie API isn't free, but you can access their Discord server to query the text-to-3D model: https://lumalabs.ai/genie?view=create
  3. We used ICP to align the object model with the output of MCC-HO; a rough sketch follows below. I used this implementation: https://github.com/ClayFlannigan/icp
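
For item 3, a rough sketch of what the alignment step could look like with that implementation. `obj_points` and `mcc_points` are hypothetical stand-ins for points sampled from the Genie mesh and the MCC-HO output, and the `icp` signature is from that repo's `icp.py` as I recall, so treat this as a sketch rather than a definitive recipe:

        import numpy as np
        from icp import icp  # icp.py from github.com/ClayFlannigan/icp

        # Hypothetical stand-ins: equal-sized (N, 3) point sets sampled from
        # the Genie object mesh and from the MCC-HO output point cloud. This
        # ICP implementation expects both sets to have the same shape.
        rng = np.random.default_rng(0)
        obj_points = rng.standard_normal((1024, 3))
        mcc_points = obj_points + np.array([0.1, 0.0, 0.0])

        T, _, _ = icp(obj_points, mcc_points, max_iterations=50)

        # T is a 4x4 homogeneous transform mapping obj_points onto mcc_points.
        homog = np.hstack([obj_points, np.ones((len(obj_points), 1))])
        obj_aligned = (T @ homog.T).T[:, :3]
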

Let me know if you have further questions.

tobyperrett commented 4 months ago

Thanks. I've managed to install everything and can run the demo. But when I provide my hand .obj from HaMeR, the associated hand/object masks, the RGB image, and the intrinsics, it fails with the following error: `writing failed max(): Expected reduction dim 0 to have non-zero size`. It looks as if it's not making any predictions at all.

I think the problem is coming from the hand mesh, as when I use the demo mesh with the rest of my inputs, it at least produces an output (albeit not a very good one). I've also noticed that `seen_xyz` still has `-inf` values in it for my mesh, but not for the demo one.
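
For anyone hitting the same issue, a quick sanity check along these lines can count the invalid points; this assumes `seen_xyz` is the (H, W, 3) unprojected point map used by the demo script:

        import torch

        def count_invalid_points(seen_xyz: torch.Tensor) -> None:
            """Count points in an (H, W, 3) point map with non-finite coords."""
            invalid = ~torch.isfinite(seen_xyz).all(dim=-1)
            print(f"invalid points: {invalid.sum().item()} / {invalid.numel()}")
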

Would you mind if I sent you these files so you could take a quick look? We'd like to use this for an ongoing project if it works. Thanks again for your help.

janehwu commented 4 months ago

Sure! Feel free to send me the files at janehwu@berkeley.edu.

fujenchu commented 4 weeks ago

Hi Jane, thanks for the great work!

I am also trying to run MCC-HO on images from other datasets.

I noticed that the hand meshes I get from the vanilla HaMeR model are approximately half the size of yours (the hand mesh in the demo). The coordinate system is also different (x and z are flipped to -x and -z).

Would you mind showing us how you modified HaMeR so we can get the same mesh as yours? Thanks in advance!
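
(For concreteness, if the difference really is just a uniform scale plus axis flips, a purely illustrative conversion might look like the sketch below; the exact factors are assumptions based on the observations above, not a confirmed recipe.)

        import numpy as np

        def match_demo_convention(verts: np.ndarray) -> np.ndarray:
            """Illustrative only: assumes the demo mesh differs from vanilla
            HaMeR output by a 2x scale and flipped x/z axes."""
            fixed = verts.copy()
            fixed[:, 0] *= -1.0  # flip x
            fixed[:, 2] *= -1.0  # flip z
            return fixed * 2.0   # compensate for the ~0.5x scale difference
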