How do we run butler in parallel to get 10k sources?

lsst-epo / citizen-science-notebooks

A collection of Jupyter notebooks that can be used to associate Rubin Science Platform data with a Zooniverse citizen science project.


How do we run butler in parallel to get 10k sources? #89

Open beckynevin opened 5 months ago

beckynevin commented 5 months ago

Not really a bug, more of a feature update.

We currently call butler.get once per source, for each of the 10k sources. We'd like to follow the example in DP02_04b, which asks the registry for all matching dataset references in a single query:

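# Assumes `registry` comes from an existing Butler (registry = butler.registry).
datasetType = 'calexp'
dataId = {'visit': 192350}

# One query returns a dataset reference for every calexp in this visit.
datasetRefs = set(registry.queryDatasets(datasetType, dataId=dataId))

# Print the first few data IDs as a sanity check.
for i, ref in enumerate(datasetRefs):
    print(ref.dataId)
    if i > 2:
        print('...')
        break

print(f"Found {len(datasetRefs)} detectors")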
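Once the references are in hand, one possible way to overlap the individual retrievals is a thread pool. The sketch below is only an illustration of that idea: `get_many` and `fake_get` are hypothetical names, and `fake_get` stands in for `butler.get` so the snippet runs without the LSST stack. Whether a single Butler instance can be shared safely across threads is an assumption that would need checking before using this on the real 10k.

```python
from concurrent.futures import ThreadPoolExecutor


def get_many(get_fn, refs, max_workers=8):
    """Call get_fn(ref) for every ref in a thread pool.

    Results are returned in the same order as `refs`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(get_fn, refs))


def fake_get(ref):
    # Hypothetical stand-in for butler.get(ref); returns a dummy payload.
    return f"image-for-{ref}"


refs = [f"ref-{i}" for i in range(10)]
images = get_many(fake_get, refs, max_workers=4)
print(f"Fetched {len(images)} images")
```

In the notebook, `get_fn` would presumably be `butler.get` and `refs` the `datasetRefs` from the query above. Threads mainly help when each call spends its time waiting on I/O, so the benefit should be measured rather than assumed.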

beckynevin commented 5 months ago

@ericdrosas87 can you drop that awesome performance table for the 1k in here?

ericdrosas87 commented 5 months ago

ericdrosas87 commented 5 months ago

In the screenshot above, the container size corresponds to the "Server size" option on the notebook selection page.

The tests were simple: we timed the butler.get() calls inside the make_manifest_with_images() function in utils.py.
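For anyone who wants to repeat this kind of measurement, here is a minimal sketch of timing repeated calls with `time.perf_counter`. It is not the code from utils.py: the `timed` helper and the `timings` dictionary are hypothetical, and `time.sleep` stands in for the real `butler.get()` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Append the elapsed wall-clock time of the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)


timings = {}
for source_id in range(3):
    with timed("butler.get", timings):
        time.sleep(0.01)  # stand-in for butler.get(...)

calls = timings["butler.get"]
print(f"{len(calls)} calls, {sum(calls):.3f} s total, {sum(calls) / len(calls):.3f} s mean")
```

Collecting per-call times rather than one overall total makes it easier to see whether a few slow retrievals dominate the run.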

ericdrosas87 commented 5 months ago

For reference