Closed falkben closed 1 year ago
Patch coverage: 100.00%
and no project coverage change.
Comparison is base (
fe56d85
) 94.45% compared to head (b265250
) 94.46%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Now that I have this PR sitting here, I'm actually leaning more towards doing the cache clearing inside tesscut and not in astrocut.
I worry about clearing a user's cache unnecessarily, if they were calling cube_cut()
directly. They would only experience an "issue" if they made repeated calls to cube_cut()
with the same cutout params, so it doesn't seem especially important.
Since tesscut knows the filesystem it's using (s3fs.core.S3FileSystem
) we would also avoid the issue of having to reach into "protected" member functions within astropy fits.
Closing in favor of just handling this in tesscut instead
The code:
Effectively is calling
s3fs.core.S3FileSystem.clear_instance_cache
. This clears the cache on thes3fs.core.S3FileSystem
class. If this is not done, the cache continues to grow, and repeated calls to cube_cut results in effectively a memory leak.This problem was made obvious with TESSCut, where the service performs cutouts over the lifetime of the server process. Even with cutouts of the exact same location, the cache continued to grow and would grow to multiple gigabytes.
We are not testing with other types of cloud storage, but this should work for any fsspec compatible file system, since they all inherit from the base class:
https://github.com/fsspec/filesystem_spec/blob/45a6aec7da1407243f9767c6ab0cff40efee72eb/fsspec/spec.py#L1412-L1425
When making repeated cutouts, to the same location, the cache is useful to have, at it seems to retrieve data from the cache first before making the external requests.
It's not clear to me what extent cutouts made by an individual user would benefit from this cache, though, since most cutouts are unique.
Within tesscut, with multiple users, it's possible that cutouts could be made to the same locations, but probably not very likely. Additionally, the cutouts would also have to be made inside the same process.
We could consider making the call to
s3fs.core.S3FileSystem.clear_instance_cache()
from within TESSCut itself instead of in thecube_cut()
function, if we didn't believe that it would useful for individual users to clear their cache. If we did this, then TESSCut could choose how often to clear it's own cache.