open-forest-observatory / geograypher

Multiview Semantic Reasoning with Geospatial Data
BSD 3-Clause "New" or "Revised" License

Allow Caching for pix2face #114

Closed asidhu0 closed 1 month ago

asidhu0 commented 1 month ago

This pull request introduces disk-based caching for the pix2face computation.

cacher = ub.Cacher('pix2face', depends=[mesh_hash, camera_hash, render_img_scale])

ub.Cacher provides on-disk caching so that the cache files persist across multiple Python instances.

The cache generates a unique key for each combination of the dependencies (mesh_hash, camera_hash, render_img_scale) and writes a separate cache file to disk for each key. So if we run with several different combinations of mesh_hash, camera_hash, and render_img_scale, each combination gets its own cache file on disk.

pix2face = cacher.tryload(on_error='clear')

tryload attempts to load the cached data. If loading fails, for example because the data is corrupted, the cache clears its contents, which is what on_error='clear' specifies.
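For context, the overall cache-or-compute flow with ub.Cacher typically looks like the sketch below. compute_pix2face and the hash/scale variables are placeholders for whatever the actual implementation uses, not the exact names in this PR:

import ubelt as ub

# depends determines the cache key; different values produce different cache files
cacher = ub.Cacher('pix2face', depends=[mesh_hash, camera_hash, render_img_scale])
pix2face = cacher.tryload(on_error='clear')  # returns None on a cache miss
if pix2face is None:
    # placeholder for the real rendering call that produces the pix2face correspondences
    pix2face = compute_pix2face(mesh, cameras, render_img_scale)
    cacher.save(pix2face)  # persist the result to disk for future runs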

To test whether the cache files last between Python instances, I ran the concept_figures code and then ran it again without restarting the kernel. On the second run there were no cache misses, only cache hits.

russelldj commented 1 month ago

One thing I didn't catch earlier: you use ubelt but didn't run poetry add ubelt to tell our dependency manager to track it. If you had, you would see the poetry.lock and pyproject.toml files get updated. You would then commit both of those files, and another person could run poetry install to get the updated set of dependencies.

russelldj commented 1 month ago

As best I can tell, caching only works within one run and everything is recomputed on the next. The cached data is being successfully written to /home/{HOME}/.cache/ubelt, but there are more files than I have cameras. Perhaps floating point inaccuracy is causing the hash to differ each time a new object is instantiated? Or do you have other ideas?
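If floating point jitter in the camera data really is the cause, one possible direction (just a sketch, not the project's actual hashing code; the extrinsics/intrinsics arrays and the decimals value are hypothetical) would be to quantize the float data before it enters the cache key:

import numpy as np
import ubelt as ub

def stable_camera_hash(extrinsics, intrinsics, decimals=6):
    # Round the float arrays so tiny numerical differences between otherwise
    # identical camera objects do not change the resulting cache key.
    values = np.concatenate([np.ravel(extrinsics), np.ravel(intrinsics)])
    rounded = np.round(values, decimals)
    return ub.hash_data(rounded.tobytes())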

russelldj commented 1 month ago

Also, can we decrease the verbosity of the cacher? I don't personally like the cache miss/cache saving printouts.
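For reference, ub.Cacher accepts a verbose argument, so the hit/miss messages can likely be silenced by constructing the cacher with verbose=0 (a sketch, not verified against this branch's code):

cacher = ub.Cacher('pix2face', depends=[mesh_hash, camera_hash, render_img_scale], verbose=0)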

russelldj commented 1 month ago

Because this caches to the home directory and creates multiple copies of the cached data, it can run out of storage on the user partition pretty quickly. We should think of ways to address this.
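One option worth considering: ub.Cacher accepts a dpath argument, so the cache could be pointed at a larger partition instead of the default under ~/.cache/ubelt. A sketch below; the path is purely illustrative and would need to come from project configuration:

cacher = ub.Cacher(
    'pix2face',
    depends=[mesh_hash, camera_hash, render_img_scale],
    dpath='/scratch/geograypher_cache',  # hypothetical location on a larger partition
    verbose=0,
)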