wpeebles / gangealing

Official PyTorch Implementation of "GAN-Supervised Dense Visual Alignment" (CVPR 2022 Oral, Best Paper Finalist)
https://www.wpeebles.com/gangealing
BSD 2-Clause "Simplified" License

get dense correspondence / tracking coordinates #25

Closed petercmh01 closed 2 years ago

petercmh01 commented 2 years ago

Hello! Thanks for the awesome work and for consistently helping with the issues!

I'm running visualization on a model that I trained on my own dataset. I was able to reproduce the dense-correspondence mixed reality on video and the propagation to each individual frame image with a mask that I created myself.

I wonder, is there any way I can get the mask position / coordinates in the image from the model after I apply the dense tracking?

thanks in advance!

wpeebles commented 2 years ago

Hi @petercmh01. Here's an updated version of mixed_reality.py which supports this. You can call it with the same arguments as usual, but now you can also add --save_correspondences to your command to save dense_correspondences.pt in the output folder. You can open it with torch.load("dense_correspondences.pt"). The tensor has shape (num_frames, num_points, 2) with continuous (sub-pixel) values usually* in range [0, frame_resolution - 1] that represent the location of each point. The points are arranged in (x, y) format.

I haven't had time to test this updated script particularly thoroughly (but it at least runs :), so let me know if you encounter any issues. It should work on both single-GPU and distributed.

*Actually, the range of values is technically unbounded since some correspondences may lie beyond the boundary of the image. So, e.g., you may sometimes find negative numbers in that tensor.
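For example, here's a minimal sketch of loading and quantizing the saved tensor (the `frame_resolution` value and the clamping are placeholders for illustration, not part of the script itself):

```python
import torch

# Load the saved correspondences: shape (num_frames, num_points, 2), (x, y) order.
points = torch.load("dense_correspondences.pt")
num_frames, num_points, _ = points.shape

# Quantize frame 0's sub-pixel coordinates to integer pixel indices.
# Correspondences can fall outside the image, so clamp before indexing.
frame_resolution = 512  # placeholder: set this to your video's resolution
xy = points[0].round().long().clamp(0, frame_resolution - 1)
x, y = xy[:, 0], xy[:, 1]  # x indexes columns, y indexes rows
```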

UPDATE June 29, 2022: This feature has been merged into the main codebase, so I have removed the code snippet in this comment to make this thread more readable.

petercmh01 commented 2 years ago

Hello @wpeebles, I tested the code and was able to get some coordinates, but it seems like the coordinates for each frame are in the wrong order. I was able to get the full dense tracking mask, but when I map it, the mask looks like it belongs to other frames.

Can you check if there's an ordering issue when collecting the propagated points?

By the way, the coordinates are all floats. Can I just cast them to int, or is there anything else (e.g., interpolation) that I need to be aware of? Can you roughly explain how the splatting was done?

Thank you so much again for your consistent help.

petercmh01 commented 2 years ago

Edit: sorry, you're right, I was iterating over x and y in the wrong order. The coordinates were correct.

wpeebles commented 2 years ago

Hi @petercmh01. Hmm, I'm pretty sure that the format should be in (x,y). Could it be that the points you're propagating are in (y,x) format, and so converting dense_correspondences.pt to (y,x) makes it consistent? Also, could you let me know if you ran the script with one versus multiple GPUs?
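If that's the case, a one-line flip of the last axis should make your points consistent with the saved tensor (a toy sketch; `points_yx` is a made-up example tensor):

```python
import torch

points_yx = torch.tensor([[10.0, 3.0], [7.5, 22.0]])  # toy points in (y, x) order
points_xy = points_yx.flip(-1)  # reorder the last axis: (y, x) -> (x, y)
```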

Regarding the splatting, it should usually be fine to round the predicted points to int if you don't need sub-pixel-accurate correspondences. But our splatting implementation takes advantage of sub-pixel information, so you'll generally get higher-quality results if you use our splat_points function instead of quantizing to int. You can find documentation for splat_points in utils/vis_tools/helpers.py, but feel free to reach out if you have any trouble. Note that it currently only supports GPU (we didn't write a CPU implementation). Under the hood, it places a Gaussian at each (x, y) location, so the final color assigned to a given quantized pixel is a weighted combination of the point colors, where each point's weight falls off with the pixel's distance to that point's Gaussian center.
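To make the idea concrete, here's a rough standalone sketch of Gaussian splatting (this is not the actual splat_points implementation; the function name, the sigma parameter, and the brute-force per-pixel computation are illustrative assumptions):

```python
import torch

def gaussian_splat(points, colors, resolution, sigma=1.0):
    """Toy Gaussian splatting sketch (not the repo's splat_points).

    points: (N, 2) float tensor of sub-pixel (x, y) locations.
    colors: (N, C) float tensor of per-point colors.
    Returns a (C, resolution, resolution) image where each pixel's color is
    a weighted average of the point colors, with weights given by a Gaussian
    that falls off with distance to each point.
    """
    coords = torch.arange(resolution, dtype=torch.float32)
    ys, xs = torch.meshgrid(coords, coords, indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)  # (H, W, 2), in (x, y) order

    # Squared distance from every pixel center to every point: (H, W, N).
    # This brute-force version is O(H * W * N) in memory; a real
    # implementation would be more careful, but the idea is the same.
    d2 = ((grid[:, :, None, :] - points[None, None]) ** 2).sum(-1)
    weights = torch.exp(-d2 / (2 * sigma ** 2))  # Gaussian falloff with distance
    # Normalize so each pixel's color is a convex combination of point colors.
    # (A real splatter would also handle background/alpha for far-away pixels.)
    weights = weights / weights.sum(-1, keepdim=True).clamp(min=1e-8)

    image = torch.einsum("hwn,nc->hwc", weights, colors)
    return image.permute(2, 0, 1)  # (C, H, W)

# Usage: splat a red point and a blue point onto a 32x32 canvas.
points = torch.tensor([[3.2, 5.7], [12.9, 8.1]])
colors = torch.tensor([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
img = gaussian_splat(points, colors, resolution=32)
```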

wpeebles commented 2 years ago

The save_correspondences feature for mixed_reality.py has now been merged into the main codebase. Thanks for testing it!