open-forest-observatory / geograypher

Multiview Semantic Reasoning with Geospatial Data
BSD 3-Clause "New" or "Revised" License

Make pix2face method explicit and add pyvista pix2face #115

Closed russelldj closed 1 month ago

russelldj commented 1 month ago

This is a major refactor. The main premise is that the pix2face computation should be made more explicit because it is used in multiple tasks. From there, we can implement multiple backends to actually compute these correspondences. One of these, which is new in this PR, is the PyVista backend, which uses some of the recommendations from @Ruprabhu25 to render flat textures. This can be used to make pytorch3d an optional dependency, since it is very challenging to install. Also, @asidhu0 implemented caching of the pix2face correspondences for unique camera-mesh pairs. This will allow dramatic runtime improvements on subsequent operations for a given scene.
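To make the intended structure concrete, here is a minimal sketch of the idea, not the code in this PR. All names (`Pix2FaceBackend`, `ConstantBackend`, `CachedPix2Face`) are hypothetical, the backend is a stand-in that returns a fixed grid, and the cache is a plain in-memory dictionary rather than the on-disk cache used here.

```python
from abc import ABC, abstractmethod


class Pix2FaceBackend(ABC):
    """Hypothetical interface: one explicit pix2face operation, many backends."""

    @abstractmethod
    def pix2face(self, mesh_id, camera_id):
        """Return a per-pixel grid of face indices (-1 is assumed to mean no face)."""


class ConstantBackend(Pix2FaceBackend):
    """Stand-in backend for illustration; a real one would rasterize the mesh."""

    def __init__(self):
        self.calls = 0

    def pix2face(self, mesh_id, camera_id):
        self.calls += 1
        return [[0, 0], [-1, 1]]


class CachedPix2Face:
    """Memoize results per unique (mesh, camera) pair, in memory only."""

    def __init__(self, backend):
        self.backend = backend
        self._cache = {}

    def __call__(self, mesh_id, camera_id):
        key = (mesh_id, camera_id)
        if key not in self._cache:
            # Only the first request for a pair reaches the backend.
            self._cache[key] = self.backend.pix2face(mesh_id, camera_id)
        return self._cache[key]
```

With this shape, rendering and aggregation would both call the cached wrapper, and swapping PyVista for pytorch3d would only mean passing a different backend instance.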

Currently this is a work in progress for two reasons. The first is that the pytorch3d backend has been entirely removed and should be re-added as a derived class. The second is that a memory leak is causing out-of-memory (OOM) errors in the rendering and aggregation methods. This is likely a simple issue caused by improper use of generators.

russelldj commented 1 month ago

The OOM error was actually caused by opening too many PyVista plotters, which did not properly release their memory. This issue was addressed by making the plotter an attribute of the mesh object and reusing it.
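The pattern is roughly the following. This is a sketch under assumptions, not the actual fix: `FakePlotter` is a hypothetical stand-in so the example runs without a display, and the `Mesh` class and its method names are made up for illustration. With PyVista, the stand-in would presumably be a `pyvista.Plotter(off_screen=True)`.

```python
class FakePlotter:
    """Stand-in for an off-screen renderer; counts how many get created."""

    created = 0

    def __init__(self):
        FakePlotter.created += 1
        self.actors = []

    def add_mesh(self, mesh):
        self.actors.append(mesh)

    def clear(self):
        self.actors = []

    def render(self):
        return len(self.actors)


class Mesh:
    """Hypothetical mesh wrapper that owns a single, reusable plotter."""

    def __init__(self, name):
        self.name = name
        self._plotter = None  # created lazily on first use

    @property
    def plotter(self):
        if self._plotter is None:
            self._plotter = FakePlotter()
        return self._plotter

    def render_view(self, camera):
        plotter = self.plotter
        plotter.clear()  # drop whatever the previous render left behind
        plotter.add_mesh((self.name, camera))
        return plotter.render()
```

However many cameras are rendered, only one plotter is ever constructed per mesh, so renderer memory no longer grows with the number of views.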

russelldj commented 1 month ago

This PR should make sure to reintroduce the concept of chunking up a mesh and computing pix2face correspondences per-chunk, to avoid extra compute for faces outside the visible region.
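One way the per-chunk approach could look is sketched below. It is only an illustration of the bookkeeping: the `visible` and `rasterize` callables are hypothetical placeholders for a frustum check and a real backend, and it assumes each chunk returns a local face index plus a depth per pixel, with -1 meaning no face.

```python
import math


def chunked_pix2face(chunks, visible, rasterize, n_pixels):
    """Merge per-chunk pix2face results into global face indices.

    chunks: list of lists, each holding the global face IDs in that chunk.
    visible(chunk_idx): True if the chunk can appear in the image.
    rasterize(chunk_idx): list of (local_face, depth) tuples, one per pixel.
    """
    pix2face = [-1] * n_pixels
    best_depth = [math.inf] * n_pixels
    for chunk_idx, face_ids in enumerate(chunks):
        if not visible(chunk_idx):
            continue  # skip all compute for chunks outside the view
        for pixel, (local_face, depth) in enumerate(rasterize(chunk_idx)):
            # Keep the nearest hit across chunks and map it to a global ID.
            if local_face >= 0 and depth < best_depth[pixel]:
                best_depth[pixel] = depth
                pix2face[pixel] = face_ids[local_face]
    return pix2face
```

The depth comparison matters because two visible chunks can project onto the same pixel, and only the nearer face should be kept.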

russelldj commented 1 month ago

Currently, the cache takes hundreds of GB of space, which makes it impractical for many applications. It would be useful to decrease its size. One option would be checking that the datatype is only as precise as needed. Another option would be seeing whether there are better compression approaches available. The caching library we use, ubelt, may provide some other options. Alternatively, we could try to use image-based compression, since many neighboring pixels are likely to have the same value. Given that the values in these arrays can be high, there might not be image libraries that support them. Finally, we could try a more general approach like run-length encoding.
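Two of these options are easy to sketch. The snippet below is illustrative only: it uses plain Python lists rather than the arrays actually cached, the function names are hypothetical, and it assumes -1 marks pixels with no face, which is why a signed type is chosen.

```python
from itertools import groupby


def smallest_signed_bits(n_faces):
    """Smallest signed integer width that can hold every value in -1..n_faces-1."""
    for bits in (8, 16, 32, 64):
        if n_faces - 1 <= 2 ** (bits - 1) - 1:
            return bits
    raise ValueError("face count does not fit in 64 bits")


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Invert rle_encode."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# One flattened image row: background, then two faces spanning several pixels.
row = [-1, -1, -1, 7, 7, 42]
encoded = rle_encode(row)
print(encoded)  # [(-1, 3), (7, 2), (42, 1)]
```

Run-length encoding only pays off when faces cover many pixels each. For a dense mesh where most faces are about a pixel wide, the encoded form could end up larger than the raw array.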

russelldj commented 1 month ago

This PR removed some underdeveloped multiview detection code. I'm not sure exactly where it should go. It could probably live in a derived mesh class, or it could be ported to the camera class.