lsst-epo / citizen-science-notebooks

A collection of Jupyter notebooks that can be used to associate Rubin Science Platform data with a Zooniverse citizen science project.

Test sending 10k random images through the citSci pipeline to ensure that the post-`butler` pipeline processes the data in a performant way #90

Closed ericdrosas87 closed 5 months ago

ericdrosas87 commented 5 months ago

User story

As the developer of the citSci pipeline, and in parallel with the internal scientists enabling the butler bulk query mechanism, I need to test sending 10k random images to the Zooniverse platform via the citSci pipeline to ensure that the post-butler functionality is performant.

Definition of done

I have programmatically copied the same image 10k times, transferred the copies to Zooniverse, and documented the performance.
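
One way the copying step could be scripted is sketched below. This is only an illustration using the Python standard library: the helper name `duplicate_image`, the numbered file-naming scheme, and the placeholder file `cutout.png` are assumptions, not part of the citSci pipeline.

```python
import shutil
import tempfile
from pathlib import Path


def duplicate_image(source: Path, dest_dir: Path, copies: int) -> list[Path]:
    """Copy `source` into `dest_dir` `copies` times under numbered names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    width = len(str(copies - 1))  # zero-pad the index so names sort in order
    created = []
    for i in range(copies):
        target = dest_dir / f"{source.stem}_{i:0{width}d}{source.suffix}"
        shutil.copyfile(source, target)
        created.append(target)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical placeholder standing in for a real image cutout.
        src = Path(tmp) / "cutout.png"
        src.write_bytes(b"placeholder image bytes")
        files = duplicate_image(src, Path(tmp) / "batch", 10_000)
        print(len(files))
```

The resulting directory can then be handed to whatever step of the pipeline normally receives the images.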

ericdrosas87 commented 5 months ago

Transferring 10k images currently takes approximately 4 minutes 30 seconds. Approximately 80 seconds of that is spent downloading the zip file from GCS, and another 80 seconds uploading the images to GCS. I can reduce the 80-second download by using chunked concurrent downloading, and can likely reduce the upload time as well. Separating the upload out of the main send_image_data() should also shave 30 to 45 seconds off the processing time.
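
For reference, the general idea behind chunked concurrent downloading can be sketched as follows. This is a minimal, library-independent illustration and not the pipeline's actual code: `fetch_range` is a hypothetical callable standing in for whatever ranged read the storage client provides, and the chunk size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def byte_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte ranges."""
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def download_chunked(
    fetch_range: Callable[[int, int], bytes],
    total_size: int,
    chunk_size: int = 8 * 1024 * 1024,
    max_workers: int = 8,
) -> bytes:
    """Fetch all byte ranges concurrently and reassemble them in order."""
    ranges = byte_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the chunks can be
        # joined directly even if they finish out of order.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)


if __name__ == "__main__":
    blob = bytes(range(256)) * 1000  # in-memory stand-in for the zip file

    def fake_fetch(start: int, end: int) -> bytes:
        # Hypothetical ranged read; a real one would issue a network request.
        return blob[start:end + 1]

    assert download_chunked(fake_fetch, len(blob), chunk_size=4096) == blob
```

Threads suit this case because each ranged read is I/O-bound. How much of the 80 seconds this recovers would depend on network bandwidth and the chosen chunk size, so it would need to be measured.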