beckynevin opened this issue 1 year ago
@beckynevin, are you referring to section 7.5? If so, it's unclear to me whether it refers to in-memory objects or to filesystem objects (files). The phrase "memory usage" and the reference to Python's `del` keyword, which removes names (variables) as well as dict keys and other container items, make me think it means in-memory objects.
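Since the ambiguity hinges on what `del` actually does, here is a small plain-Python illustration (the function name `demo_del` is hypothetical, nothing here is Rubin-specific). `del` removes a dict key or unbinds a name, but in CPython the object itself is only reclaimed once no other references to it remain. Either way, `del` concerns in-memory objects, not files on disk.

```python
def demo_del():
    """Show what `del` does to dict keys, plain names, and shared objects."""
    # `del` on a dict item removes that key from the dict.
    config = {"ra": 10.0, "dec": -30.0}
    del config["ra"]

    # `del` on a plain name removes only that binding in the current scope...
    big_list = [0] * 1_000_000
    alias = big_list          # a second reference to the same list object
    del big_list              # the name `big_list` is gone
    # ...but the list object survives because `alias` still refers to it.
    still_alive = len(alias)

    # Using a deleted local name raises UnboundLocalError (a NameError subclass).
    try:
        big_list  # noqa: F821
    except NameError:
        name_gone = True
    else:
        name_gone = False
    return config, still_alive, name_gone
```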
If RTN-045 refers to files in the filesystem, I'm curious whether it distinguishes between space used in the RSP Notebook user's home directory and space used in a shared directory.
I'm also referring to section 3.4. Having looked through some of the other tutorial notebooks, I see they use `del` like so:
```python
import matplotlib.pyplot as plt

# calexp_corners_ra, calexp_corners_dec, registry, butler, dataId and
# get_corners_radec are defined in earlier cells of that tutorial notebook.
fig = plt.figure(figsize=(6, 6))
xvals = [calexp_corners_ra[0], calexp_corners_ra[1], calexp_corners_ra[2],
         calexp_corners_ra[3], calexp_corners_ra[0]]
yvals = [calexp_corners_dec[0], calexp_corners_dec[1], calexp_corners_dec[2],
         calexp_corners_dec[3], calexp_corners_dec[0]]
plt.plot(xvals, yvals, ls='solid', color='grey', label='visit detector')
del xvals, yvals  # drop the temporary corner lists

for r, ref in enumerate(set(registry.queryDatasets("deepCoadd", dataId=dataId))):
    deepCoadd_dataId = ref.dataId
    str_tract_patch = '(' + str(ref.dataId['tract']) + ', ' + str(ref.dataId['patch']) + ')'
    deepCoadd_wcs = butler.get('deepCoadd.wcs', dataId=deepCoadd_dataId)
    deepCoadd_bbox = butler.get('deepCoadd.bbox', dataId=deepCoadd_dataId)
    deepCoadd_corners_ra, deepCoadd_corners_dec = get_corners_radec(deepCoadd_wcs, deepCoadd_bbox)
    xvals = [deepCoadd_corners_ra[0], deepCoadd_corners_ra[1], deepCoadd_corners_ra[2],
             deepCoadd_corners_ra[3], deepCoadd_corners_ra[0]]
    yvals = [deepCoadd_corners_dec[0], deepCoadd_corners_dec[1], deepCoadd_corners_dec[2],
             deepCoadd_corners_dec[3], deepCoadd_corners_dec[0]]
    plt.plot(xvals, yvals, ls='solid', lw=1, label=str_tract_patch)
    # Drop the per-patch references before the next iteration.
    del xvals, yvals
    del deepCoadd_dataId, deepCoadd_wcs, deepCoadd_bbox
    del deepCoadd_corners_ra, deepCoadd_corners_dec

plt.xlabel('RA')
plt.ylabel('Dec')
plt.legend(loc='upper left', ncol=3)
plt.show()
```
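As an aside, the repeated `xvals`/`yvals` blocks (and the `del` after each) could be folded into a small helper. Because the lists are built inside the call and passed straight to `plt.plot`, no notebook-level names are left behind to delete. A sketch, where the helper name `closed_polygon` is hypothetical and not part of the tutorial code:

```python
from typing import List, Sequence, Tuple


def closed_polygon(ra: Sequence[float], dec: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return RA/Dec vertex lists with the first corner repeated at the end,
    so that plt.plot() draws a closed outline of the footprint."""
    if len(ra) == 0 or len(ra) != len(dec):
        raise ValueError("ra and dec must be non-empty and the same length")
    xvals = list(ra) + [ra[0]]
    yvals = list(dec) + [dec[0]]
    return xvals, yvals

# Usage in the notebook (assumes the tutorial's own variables and plt):
# plt.plot(*closed_polygon(calexp_corners_ra, calexp_corners_dec),
#          ls='solid', color='grey', label='visit detector')
```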
I guess what I'm proposing might be a separate thing entirely because I'm proposing something within the notebook that will delete the .zip files in the main directory.
We should probably be mindful of both the disk space taken up by files created by the citSci notebooks and the memory used within the notebook itself:
- Notebook memory: the project is charged on a CPU-usage basis, so we should be mindful of the costs associated with errant memory usage.
- Filesystem space usage: Data Management has a strong preference that Notebook users not use their home directory as long-term storage.
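To get a rough sense of how much space a user's files occupy, a notebook could total the file sizes under a directory. A minimal sketch using only the standard library (the name `dir_size_bytes` is hypothetical); it reports apparent file sizes, which can differ from what `du` shows for block usage:

```python
import os


def dir_size_bytes(root) -> int:
    """Total apparent size in bytes of regular files under `root`.

    Symlinks are skipped, and os.walk does not descend into symlinked
    directories by default, so linked shared data is not counted.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # don't count linked files
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total

# Usage in a notebook cell:
# from pathlib import Path
# print(f"{dir_size_bytes(Path.home()) / 1e9:.2f} GB in home directory")
```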
However, that latter point conflicts with the idea we've discussed of curating a large amount of data once and sending multiple batches of it to Zooniverse over time. I think DM would be amenable to an exception allowing citSci users to store data in their home directories for long periods, should we decide to pursue that strategy.
Okay, so to your last point that the DM team might be okay with making an exception for citSci users: this makes sense to me for the cutout/ folder, which will have a bunch of cutouts. But what about creating a utility that deletes all the extra nonsense .zip files after the data has been sent? Is this standard operation for Rubin, or do we just rely on users to delete all the random zip files themselves? Here's a screenshot of what I'm talking about: [screenshot]
> But what about creating a utility that deletes all the extra nonsense .zip files after the data has been sent? Is this standard operation for Rubin, or do we just rely on users to delete all the random zip files themselves?
It's certainly possible to have a utility function look for `.zip` files in the user's home directory and delete them. I'd say it's a fairly small level of effort, but as to whether it's standard operation to do so programmatically, I honestly don't know.
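A minimal sketch of such a utility, assuming the zip files sit directly in the given directory and that every `.zip` found there is safe to remove (which may not hold for users who keep their own archives). The function name and its `recursive`/`dry_run` parameters are hypothetical, not an existing Rubin or RSP utility. Defaulting to a dry run lets the notebook show what would be deleted before anything is removed:

```python
from pathlib import Path
from typing import List


def remove_zip_files(root, recursive: bool = False, dry_run: bool = True) -> List[Path]:
    """Find *.zip files under `root` and delete them unless `dry_run` is True.

    Returns the sorted list of matching paths (deleted, or merely found in a dry run).
    """
    root = Path(root).expanduser()
    pattern = "**/*.zip" if recursive else "*.zip"
    # Only regular files; symlinks are left alone.
    matches = sorted(p for p in root.glob(pattern) if p.is_file() and not p.is_symlink())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches

# Usage: preview first, then delete.
# print(remove_zip_files("~"))
# remove_zip_files("~", dry_run=False)
```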
One of the tenets of the Rubin notebooks (https://rtn-045.lsst.io/) is to delete files after you create them. I wonder if we need to incorporate a cell into this notebook that finds and deletes these zip files. @jsv1206 and I recommend adding a 'cleanup' cell that calls an external function to find and clean up these .zip files after we send the data (right before the retrieve cell).
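One way to keep such a cleanup cell from touching archives the user made themselves is to remove only zip files modified since the notebook session began. A sketch, assuming an early cell records a start timestamp (the names `cleanup_session_zips` and `SESSION_START` are hypothetical) and that the send step writes its zips directly into `root`. Filtering by modification time is a heuristic; tracking the exact paths the send step creates would be stricter, if that step exposes them:

```python
from pathlib import Path
from typing import List


def cleanup_session_zips(root, since: float, dry_run: bool = False) -> List[Path]:
    """Delete *.zip files directly in `root` whose modification time is >= `since`.

    `since` is a Unix timestamp, so archives last modified before it are left alone.
    Returns the sorted list of matching paths.
    """
    root = Path(root).expanduser()
    new_zips = sorted(
        p for p in root.glob("*.zip")
        if p.is_file() and not p.is_symlink() and p.stat().st_mtime >= since
    )
    if not dry_run:
        for path in new_zips:
            path.unlink()
    return new_zips

# Usage (hypothetical cells):
#   top of notebook:      import time; SESSION_START = time.time()
#   before retrieve cell: cleanup_session_zips("~", since=SESSION_START)
```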
After many runs of this notebook, I've noticed quite a few .zip files hanging around and cluttering things up. If many users run this notebook multiple times, these files could take up unnecessary space on the cloud.
Let us know if you have strong preferences about how to deal with this @bnord @ericdrosas87 @clareh