lsst-epo / citizen-science-notebooks

A collection of Jupyter notebooks that can be used to associate Rubin Science Platform data with a Zooniverse citizen science project.

Variable stars #95

Closed beckynevin closed 2 months ago

beckynevin commented 5 months ago

Changes include:

@clareh I'll need some help with clarity and pedagogy for notebook 02.

beckynevin commented 4 months ago

@ericdrosas87 makes the very excellent point that there's a lot of public facing manifest file creating stuff shown in notebook 02 that we had previously obscured in notebook 01, specifically this cell:

    batch_dir = './variable_stars_output/'
    figout_data = {"diaobjectId": diaobjectid}
    cutouts = []

    for i, idx in enumerate(idx_select):
        star_ra = sorted_sources['ra'][idx]
        star_dec = sorted_sources['decl'][idx]
        star_visitid = sorted_sources['visitId'][idx]
        star_detector = sorted_sources['detector'][idx]
        star_id = sorted_sources['diaObjectId'][idx]
        star_ccdid = sorted_sources['ccdVisitId'][idx]
        calexp_image = utils.cutout_calexp(butler, star_ra, star_dec,
                                           star_visitid, star_detector, 50)
        figout = utils.make_calexp_fig(calexp_image,
                                       batch_dir + str(star_id) + "_" +
                                       str(star_ccdid) + ".png")
        plt.show()
        del figout
        figout_data['location:image_'+str(i)] = str(star_id) + \
            "_" + str(star_ccdid) + ".png"
        figout_data['diaobjectId:image_'+str(i)] = str(star_id)
        figout_data['filename'] = str(star_id) + "_" + str(star_ccdid) + ".png"

    df_manifest = pd.DataFrame(data=figout_data, index=[0])
    outfile = batch_dir + "manifest.csv"
    df_manifest.to_csv(outfile, index=False, sep=',')

ericdrosas87 commented 4 months ago

The following code in cell 31 produces an error if there are any folders in the batch_dir/image_dir:

    for j, file in enumerate(os.listdir(image_dir)):
        if (str.split(file, '.')[1] == 'png' and
                str.split(file, '_')[0] == str(id_star)):
            img_id = int(str.split(str.split(file, '_')[1], '.')[0])
            star_name.append(str(id_star) + '_' +
                             str(img_id) + '.png')

The str.split() call also runs on folder names if any exist in the directory (as was the case in the folder structure that Sreeni was using; I was getting an error until I tried a brand-new output folder). This is easy to guard against, though, by checking with os.path.isfile() or os.path.isdir() before attempting the split. This may also be related to the issue that @clareh was seeing. Otherwise the notebook ran fine for me.
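For illustration, here is a minimal sketch of the kind of guard suggested above. It is not the notebook's actual fix: the function name `list_star_images` is hypothetical, and it assumes the filenames follow the `<id_star>_<img_id>.png` pattern seen in cell 31.

```python
import os


def list_star_images(image_dir, id_star):
    """Return the PNG names in image_dir that belong to id_star.

    Hypothetical helper: sub-folders and oddly named files are skipped
    instead of raising an error during the split.
    """
    star_name = []
    for file in sorted(os.listdir(image_dir)):
        # Skip sub-folders (and anything else that is not a regular file).
        if not os.path.isfile(os.path.join(image_dir, file)):
            continue
        stem, ext = os.path.splitext(file)
        if ext != ".png" or "_" not in stem:
            continue
        star_part, img_part = stem.split("_", 1)
        if star_part != str(id_star):
            continue
        try:
            img_id = int(img_part)
        except ValueError:
            # The part after the underscore is not an integer image ID.
            continue
        star_name.append(f"{id_star}_{img_id}.png")
    return star_name
```

Using `os.path.splitext` rather than indexing the result of `str.split(file, '.')` also avoids an IndexError on names that contain no dot at all, which is what a typical folder name looks like.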

ericdrosas87 commented 4 months ago

Also, something else I just noticed: it looks like the value for objectId/diaObjectId is the same for all objectId:image_/diaObjectId:image_ columns. It doesn't break the pipeline, but it will mean that we're saving inaccurate data in the citSci database.

beckynevin commented 4 months ago

> Also, something else I just noticed: it looks like the value for objectId/diaObjectId is the same for all objectId:image_/diaObjectId:image_ columns. It doesn't break the pipeline, but it will mean that we're saving inaccurate data in the citSci database.

That's because there is no objectId for these variable stars by definition. We could potentially do some sort of a nearest object match to the objectId catalog to pull an objectId but there's no guarantee it's the same source - essentially these variable stars are identified using their diaObjectId, which is from the difference images. Does that make sense? That's why I was asking if we could rename this first column to diaObjectId because it is incorrectly labelled.
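To make the caveat about a nearest-object match concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the catalog is a list of dicts with hypothetical keys `objectId`, `ra`, and `dec` (in degrees), and the 1 arcsecond cut is arbitrary. Even a match inside the cut only says that something is close on the sky, not that it is the same source.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (inputs in degrees, haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1 to guard against floating-point overshoot before asin.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a)))) * 3600


def nearest_object(ra, dec, catalog, max_sep_arcsec=1.0):
    """Return (objectId, separation) for the closest entry, or None if too far.

    catalog: list of dicts with hypothetical keys 'objectId', 'ra', 'dec'.
    """
    best_id, best_sep = None, float("inf")
    for row in catalog:
        sep = angular_sep_arcsec(ra, dec, row["ra"], row["dec"])
        if sep < best_sep:
            best_id, best_sep = row["objectId"], sep
    if best_id is None or best_sep > max_sep_arcsec:
        return None
    return best_id, best_sep
```

Returning the separation alongside the ID keeps the uncertainty visible to whoever consumes the match, rather than silently recording an objectId that may belong to a neighbouring source.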

> The following code in cell 31 produces an error if there are any folders in the batch_dir/image_dir:
>
>     for j, file in enumerate(os.listdir(image_dir)):
>         if (str.split(file, '.')[1] == 'png' and
>                 str.split(file, '_')[0] == str(id_star)):
>             img_id = int(str.split(str.split(file, '_')[1], '.')[0])
>             star_name.append(str(id_star) + '_' +
>                              str(img_id) + '.png')
>
> The str.split() call also runs on folder names if any exist in the directory (as was the case in the folder structure that Sreeni was using; I was getting an error until I tried a brand-new output folder). This is easy to guard against, though, by checking with os.path.isfile() or os.path.isdir() before attempting the split. This may also be related to the issue that @clareh was seeing. Otherwise the notebook ran fine for me.

Awesome, I added an os.path.isfile() in here and tested with a fake folder to make sure it doesn't error out.

beckynevin commented 4 months ago

> Also, something else I just noticed: it looks like the value for objectId/diaObjectId is the same for all objectId:image_/diaObjectId:image_ columns. It doesn't break the pipeline, but it will mean that we're saving inaccurate data in the citSci database.

Now, both are diaObjectId!

beckynevin commented 3 months ago

> @ericdrosas87 makes the very excellent point that there's a lot of public facing manifest file creating stuff shown in notebook 02 that we had previously obscured in notebook 01, specifically this cell:
>
>     batch_dir = './variable_stars_output/'
>     figout_data = {"diaobjectId": diaobjectid}
>     cutouts = []
>
>     for i, idx in enumerate(idx_select):
>         star_ra = sorted_sources['ra'][idx]
>         star_dec = sorted_sources['decl'][idx]
>         star_visitid = sorted_sources['visitId'][idx]
>         star_detector = sorted_sources['detector'][idx]
>         star_id = sorted_sources['diaObjectId'][idx]
>         star_ccdid = sorted_sources['ccdVisitId'][idx]
>         calexp_image = utils.cutout_calexp(butler, star_ra, star_dec,
>                                            star_visitid, star_detector, 50)
>         figout = utils.make_calexp_fig(calexp_image,
>                                        batch_dir + str(star_id) + "_" +
>                                        str(star_ccdid) + ".png")
>         plt.show()
>         del figout
>         figout_data['location:image_'+str(i)] = str(star_id) + "_" + str(star_ccdid) + ".png"
>         figout_data['diaobjectId:image_'+str(i)] = str(star_id)
>         figout_data['filename'] = str(star_id) + "_" + str(star_ccdid) + ".png"
>
>     df_manifest = pd.DataFrame(data=figout_data, index=[0])
>     outfile = batch_dir + "manifest.csv"
>     df_manifest.to_csv(outfile, index=False, sep=',')

Fix made: the manifest creation is now obscured, as in notebook 01, with citsci_pipeline. utils is updated, along with the names make_manifest_with_deepcoadd_images and make_manifest_with_calexp_images.
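As a rough illustration of what hiding the manifest creation behind a helper can look like, here is a sketch using only the standard library. The name `write_flipbook_manifest` and its signature are hypothetical; this is not the actual utils code, and it does not reproduce make_manifest_with_calexp_images. The column names follow the cell quoted above.

```python
import csv
import os


def write_flipbook_manifest(batch_dir, dia_object_id, image_files):
    """Write a one-row manifest.csv for a flipbook of cutouts of one star.

    Hypothetical helper for illustration only. image_files is the ordered
    list of cutout filenames already saved in batch_dir.
    """
    if not image_files:
        raise ValueError("image_files must contain at least one filename")
    row = {"diaObjectId": str(dia_object_id)}
    for i, fname in enumerate(image_files):
        row[f"location:image_{i}"] = fname
        row[f"diaObjectId:image_{i}"] = str(dia_object_id)
    # As in the quoted cell, 'filename' ends up holding the last image name.
    row["filename"] = image_files[-1]
    outfile = os.path.join(batch_dir, "manifest.csv")
    with open(outfile, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)
    return outfile
```

With a helper along these lines, the notebook cell shrinks to a single call such as `write_flipbook_manifest(batch_dir, star_id, image_files)`, so the manifest details stay out of the public-facing notebook.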

beckynevin commented 2 months ago

  • Section 2 - add a link to the tutorial notebook about the butler?
    Response: Added a reference to the butler tutorial in the introduction.
  • 2.3 fsodo sources - can we combine these two functions and just jump to showing the unique ones?
    Response: Deleted the line where I print out the results, but kept this in two separate cells since these are two separate ideas.
  • General comment: can we make it clearer which parts are "generating test data" and which parts are "how to make a flip book"? I think it's not as clear in this notebook which part you might want to use if you are making a flip book for a different science case. I know this is a vague comment - we can talk about it in the tech meeting if it helps!
    Response: Added some section headings to clarify the difference between the first part of the notebook, which generates data, and the second half, which deals with creating the manifest and sending the data to Zooniverse.
  • Does section 2.6 need to show the images, as section 2.7 does?

I addressed all of these comments in the latest push for notebook 02.

Can you have another look @clareh and see if there's anything else to fix?