lsst-epo / citizen-science-notebooks

A collection of Jupyter notebooks that can be used to associate Rubin Science Platform data with a Zooniverse citizen science project.

Modify notebook(s) and/or RSP Data Exporter service to allow for the sending of separate CSV files that represent Zooniverse "flipbook" GIFs #37

Closed ericdrosas87 closed 1 year ago

ericdrosas87 commented 1 year ago

User Stories:

As the integrator of the citizen science data pipeline, I need to research the requirements of the Zooniverse platform that pertain to a flipbook functionality so I can design/have a plan for coding for the flipbook functionality in the notebooks/RSP Data Exporter service.

Stretch goal:

As the integrator of the citizen science data pipeline, I need to modify the cSci notebook(s) and/or RSP Data Exporter service to allow for flipbook functionality data transfer to a project on the Zooniverse platform.


Acceptance Criteria:

Given a PI has curated their own image data in the RSP Notebook Aspect, when they attempt to use a citizen science notebook to send flipbook data as a CSV then the data is transferred and rendered as GIF/flipbook on the Zooniverse platform.

jsv1206 commented 1 year ago

The following format of metadata.csv (when uploaded manually) creates a flipbook in Zooniverse:

[image: flipbook]

I tried to send the following format from RSP to Zooniverse, but this gives an error and doesn't send the data:

[image: metadata_test]

ericdrosas87 commented 1 year ago

@jsv1206 can you copy/paste the output from the send_data() cell with the output from the failed data transfer so I can troubleshoot?

jsv1206 commented 1 year ago

`'1. Checking batch status'

' ** Warning! - The Zooniverse client is throwing an error about a missing subject set, this can likely safely be ignored.'

'2. Zipping up all the astro cutouts - this can take a few minutes with large data sets, but unlikely more than 10 minutes.'

'3. Uploading the citizen science data'

'4. Creating a new Zooniverse subject set'

'5. Notifying the Rubin EPO Data Center of the new data, which will finish processing of the data and notify Zooniverse'

'6. Cleaning up unused subject set on the Zooniverse platform, vendor_batch_id : 112798'

root ERROR: One or more errors occurred during the last step

root ERROR: ['You currently have an active batch of data on the Zooniverse platform and cannot create a new batch until the current batch has been completed.']

root ERROR: Email address: jsv1206@gmail.com

root ERROR: Timestamp: 2023-04-18 11:05:37.501600-07:00`

jsv1206 commented 1 year ago

I tested with the standard metadata.csv format, and it sends images from RSP to Zooniverse:

[image: metadata]

ericdrosas87 commented 1 year ago

Okay, so it seems that as-is, the manifest file creation will not function with a flip book. Thank you for checking that Sree!

This brings to mind an interesting scenario I need clarification on:

I just tested uploading a new manifest file/images with two subjects:

  1. A "flipbook" of two images
  2. A static image

Zooniverse successfully created the subject set and handled the two correctly (one GIF/flipbook, and one static image).

I assume we would like the flipbook functionality to be as flexible as possible such that either static images or multi-image flipbook subjects can be uploaded via the same manifest file?

@clareh and co. what are your thoughts on this?

clareh commented 1 year ago

@ericdrosas87 A project would either be flip books or static images, not a mix of both. This is based on the fact that you'd have a different workflow for a flip book vs single images. As we restrict users to just one subject set, then you could just have one workflow, so then you'd be stuck with one type (flip book or images).

ericdrosas87 commented 1 year ago

Okay @clareh , new hurdle:

As far as I can tell in my testing, I cannot programmatically create a flipbook subject set via the Panoptes API.

For context: On the Zooniverse website, I can manually create a new subject set and select the image files and manifest CSV file from my local filesystem on my Macbook and that works fine. The images that are referenced as filenames in the manifest CSV file are uploaded directly to the Zooniverse platform at the same time and the flipbook subject set gets created just fine.

When I attempt to do this programmatically, the main difference is that the images are uploaded to the EDC object store and the flipbook manifest CSV references these images as URLs rather than just filenames. Referencing the multiple images as URLs rather than filenames is what the Zooniverse platform doesn't seem to like. I suspect that a code change on the Zooniverse platform is required to allow for URL references instead of uploading the images directly.

My manifest file created with the notebooks

How this renders on the Zooniverse platform when I try to view the subject:

[image: Screen Shot 2023-04-19 at 3 42 36 PM]

I know that the manifest file is valid though, because I can manually download the images from the EDC and take the above linked manifest file and upload them manually on the website and the flipbook is rendered fine.

Can you run this by Chris Lintott and see what he thinks? Mainly: Is it a bug? Or is it a use case they have never encountered, so that they would need to add this functionality to their API?

ericdrosas87 commented 1 year ago

@jsv1206 @beckynevin @bnord

Exciting update!

So Chris from Zooniverse has been out sick, but I was able to get a reply from Cliff Johnson on how to create a flipbook programmatically via the Panoptes API. Luckily, only a small change was needed: the RSP Data Exporter service that runs in the EDC has to prefix each image column with the same location: prefix that is required when sending a single image:

[image: Screen Shot 2023-05-31 at 3 59 47 PM]
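For readers who cannot see the screenshot, here is a minimal sketch of what prefixing each image column with location: could look like when a manifest is written. The column names (image_0, image_1, objectId), the example URLs, and the helper name build_flipbook_manifest are hypothetical and are not taken from the RSP Data Exporter code; only the location: prefix itself comes from the comment above.

```python
import csv
import io

# Prefix described in the comment above; everything else here is illustrative.
LOCATION_PREFIX = "location:"


def build_flipbook_manifest(subjects, image_columns, metadata_columns):
    """Return manifest CSV text in which every image column is prefixed.

    `subjects` is a list of dicts keyed by the unprefixed column names.
    """
    header = [LOCATION_PREFIX + col for col in image_columns]
    header += list(metadata_columns)

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for subject in subjects:
        row = [subject[col] for col in image_columns]
        row += [subject[col] for col in metadata_columns]
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical subject: two frames of one flipbook plus one metadata field.
subjects = [
    {
        "image_0": "https://example.org/cutouts/a_0.png",
        "image_1": "https://example.org/cutouts/a_1.png",
        "objectId": "a",
    },
]

manifest = build_flipbook_manifest(
    subjects, ["image_0", "image_1"], ["objectId"]
)
print(manifest)
```

The point of the sketch is only that all image columns in a row receive the prefix, not just the first one.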

I haven't pushed out this change to the variable_stars branch just yet, nor have I deployed the updated RSP Data Exporter service. If you'd like to test you can:

  1. Make the same prefix change to the variable_stars notebook as you see in the screenshot above, and...
  2. Change the domain part of the URL in the alert_edc_of_new_citsci_data() function of the SDK notebook from https://rsp-data-exporter-dot-skyviewer.uw.r.appspot.com to https://rsp-data-exporter-dev-dot-skyviewer.uw.r.appspot.com

Or just hang tight and I should get this change out by the end of the week or early next week at the latest!
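The domain swap in step 2 can be sketched as follows. This is an illustration only: the helper name use_dev_exporter and the example path and query string are hypothetical, and the actual body of alert_edc_of_new_citsci_data() is not shown in this thread. Only the two host names come from the comment above.

```python
from urllib.parse import urlsplit, urlunsplit

# Host names quoted in the comment above.
PROD_HOST = "rsp-data-exporter-dot-skyviewer.uw.r.appspot.com"
DEV_HOST = "rsp-data-exporter-dev-dot-skyviewer.uw.r.appspot.com"


def use_dev_exporter(url):
    """Point a production exporter URL at the dev deployment.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != PROD_HOST:
        return url
    return urlunsplit(parts._replace(netloc=DEV_HOST))


# Hypothetical path and query string, used only to show that they survive.
example = "https://" + PROD_HOST + "/hypothetical-endpoint?batch=1"
print(use_dev_exporter(example))
```

Replacing only the host keeps the path and query parameters of the original request intact, which makes it easy to switch back once the change is deployed.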

jsv1206 commented 1 year ago

I made the changes, but I get the following error. I think this occurs when the metadata.csv is not in the correct format. I pushed the code with your changes to the variable_stars branch on GitHub.

[image: flipbook_error]

ericdrosas87 commented 1 year ago

@jsv1206 Can you check whether you have an active subject set on the Zooniverse platform? If so, please delete it if possible, try again, and post what the notebook output says.

ericdrosas87 commented 1 year ago

@jsv1206 Just confirmed that with a fresh pull from the variable_stars repo and the flake8 magic commands commented out, the flipbook gets created on the Zooniverse platform.

Also, apologies about missing your question previously: Are you asking if two subject sets can be created? Or if the data can be organized in such a way where the flipbook images are in one CSV and the rest of the metadata is in another CSV?

ericdrosas87 commented 1 year ago

I think we can close this out with the latest changes deployed to the RSP Data Exporter service and the variable_stars notebook.

https://github.com/lsst-epo/rsp-data-exporter/commit/58bf28e3517016e0b79ec9b4687a0d43150ce7f8