ericdrosas87 commented 1 year ago

Analysis and Design

Changelog

N/A

What this enhancement will do:

Expose the logic for creating the manifest CSV file in code, and separate out the step that saves the manifest CSV file to the RSP Notebook Aspect's filesystem so that the file may be edited before being sent to Zooniverse. This will also allow the PI to send data using their own previously created manifest CSV file.

Why this enhancement is needed:

Currently, a functional version of the notebook cell that creates the manifest file exists in the Citizen_Science_Testing notebook, but it is rigidly aligned to a specific TAP query and Butler image data. It has been requested that this functionality exist in the other non-SDK notebooks, in a form flexible enough to allow any columns to be appended to the manifest file, enabling a broad range of Zooniverse project functionality.

How this enhancement will work:

The following code snippets show an example of creating the manifest CSV file from a TAP query and Butler image data.

Creating the manifest file in memory:

import os

# In-memory manifest file as an array of dicts
manifest = []

# Specify the directory that the cutouts will be output to:
batch_dir = "./cutouts/"

# Create the directory (and any missing parents) if it does not already exist
os.makedirs(batch_dir, exist_ok=True)

# Loop over results_table, or any other iterable provided by the PI:
for index, row in results_table.iterrows():
    # Use the Butler to get data based on the data within the iterable
    deepCoadd = butler.get('deepCoadd', dataId=row['dataId'])
    filename = "cutout"+str(row['objectId'])+".png"
    figout = utils.make_figure(deepCoadd, batch_dir + filename)

    # Create the CSV-file-row-as-dict 
    csv_row = {
        "filename": filename, # required column, do not change the column name
        "objectId": row.objectId, # required column, do not change the column name
        # Add your desired columns:
        "coord_ra": row.coord_ra,
        "coord_dec": row.coord_dec,
        "g_cModelFlux": row.g_cModelFlux,
        "r_cModelFlux": row.r_cModelFlux,
        "r_extendedness": row.r_extendedness,
        "r_inputCount": row.r_inputCount
    }
    manifest.append(csv_row)
    utils.remove_figure(figout)

Saving the manifest file to the filesystem:

manifest_path = write_metadata_file(manifest, batch_dir)

print("The manifest CSV file can be found at the following relative path:")
print(manifest_path)
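
The body of write_metadata_file is not shown in this issue. A minimal sketch of what it might look like, assuming it only needs to serialize the list of dicts with the standard csv module (the "manifest.csv" default name and the first-seen column ordering are assumptions, not the notebooks' actual behavior):

```python
import csv
import os


def write_metadata_file(manifest, batch_dir, filename="manifest.csv"):
    """Write a list of row dicts to a CSV file and return the file's path.

    Hypothetical sketch: the default file name is an assumption.
    """
    os.makedirs(batch_dir, exist_ok=True)
    manifest_path = os.path.join(batch_dir, filename)

    # Collect column names in first-seen order so that rows carrying
    # extra columns still fit under a single header.
    fieldnames = []
    for row in manifest:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(manifest_path, "w", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        # Keys missing from a row are written as empty cells.
        writer.writerows(manifest)

    return manifest_path
```

Keeping this step separate from the loop is what lets the PI open and edit the file at manifest_path before anything is sent to Zooniverse.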

Requirements

1. Have a cell in each non-SDK notebook that creates the manifest CSV file from an iterable object that the PI provides (TAP query results array, etc.)

User Story:

As a PI using the citizen science notebooks to send data to Zooniverse, I want the code that creates the manifest CSV file exposed in each of the non-SDK notebooks so that I may edit the columns and values of the manifest file programmatically.

Acceptance Criteria:

Given that a PI has an iterable object (such as a pandas DataFrame, TAP query result array, etc.), when they modify the manifest CSV file creation cell to reference their iterable, then a manifest file is created in memory.
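
To illustrate the "any iterable" part of this criterion, here is a rough sketch with the Butler and figure steps left out. build_manifest and its extra_columns parameter are hypothetical names, and the records are made-up values used only to show the shape of the data:

```python
def build_manifest(records, extra_columns=()):
    """Build the in-memory manifest from any iterable of dict-like records.

    Hypothetical helper: only the "filename" and "objectId" columns are
    taken from the creation cell; everything else is chosen by the PI.
    """
    manifest = []
    for record in records:
        csv_row = {
            # Required columns, same names as in the creation cell
            "filename": "cutout" + str(record["objectId"]) + ".png",
            "objectId": record["objectId"],
        }
        # Columns the PI chooses to append
        for column in extra_columns:
            csv_row[column] = record[column]
        manifest.append(csv_row)
    return manifest


# Works with a plain list of dicts (illustrative values only)...
records = [
    {"objectId": 101, "coord_ra": 62.1},
    {"objectId": 102, "coord_ra": 62.3},
]
manifest = build_manifest(records, extra_columns=["coord_ra"])

# ...or with a pandas DataFrame via results_table.to_dict("records")
```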

2. The path to the manifest CSV file will be output from a cell so that the PI can manually edit it if need be, but manually editing the CSV file is not necessary

User Story:

As a PI using the citizen science notebooks to send data to Zooniverse, I want the ability to manually edit the CSV file so that I can make ad hoc adjustments when necessary.

Acceptance Criteria:

Given that the PI has created a manifest file in memory with a valid iterable object, when they need to make manual edits to the manifest file, then they can do so by running a cell to find the path of the file before data is sent to Zooniverse.

3. The PI will have the option to skip the step if they have a pre-prepared manifest CSV that they would like to use

User Story:

As a PI using the citizen science notebooks to send data to Zooniverse, I want the ability to provide a manifest file that has previously been created so that I am not bound to using the notebooks' manifest CSV file creation logic.

Acceptance Criteria:

Given that the previously created manifest CSV file either contains the new project data or is placed in a directory alongside the new project data, when the path of the uploaded manifest CSV file is specified by the PI, then the notebook sends the data to Zooniverse successfully.
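
One way the notebook could accept a pre-prepared manifest is sketched below, assuming the data files sit in the same directory as the CSV. load_existing_manifest is a hypothetical helper, and the required column names are taken from the example creation cell above:

```python
import csv
import os

# Assumption: the same required columns as in the example creation cell
REQUIRED_COLUMNS = ("filename", "objectId")


def load_existing_manifest(manifest_path, required_columns=REQUIRED_COLUMNS):
    """Read a PI-supplied manifest CSV and report files it lists but that
    are not present next to it. Hypothetical sketch, not the notebooks' code.
    """
    with open(manifest_path, newline="") as csv_file:
        reader = csv.DictReader(csv_file)
        header = reader.fieldnames or []
        missing = [col for col in required_columns if col not in header]
        if missing:
            raise ValueError("Manifest is missing required columns: "
                             + ", ".join(missing))
        manifest = list(reader)

    # The data is expected alongside the manifest file
    data_dir = os.path.dirname(os.path.abspath(manifest_path))
    absent_files = [
        row["filename"]
        for row in manifest
        if not os.path.isfile(os.path.join(data_dir, row["filename"]))
    ]
    return manifest, absent_files
```

Returning the list of absent files, rather than raising, leaves it to the PI to decide whether a partial batch should still be sent.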

4. The required fields for each notebook's use case ("filename" for astro cutouts, for example) will be indicated in the manifest CSV file creation cell

User Story:

As a PI using the citizen science notebooks to send data to Zooniverse, I want the notebook cells that pertain to the manifest file creation to be self documenting so that I can easily understand what, if any, required columns are expected.

Acceptance Criteria:

Given that the PI has read through the markdown cells, code comments, and code itself, when they need to edit the cells that pertain to the manifest CSV file creation, then they understand how to do so.

5. The edc_ver_id will be appended behind the scenes so that the PI does not need to worry about it

User Story:

As a PI using the citizen science notebooks to send data to Zooniverse, I do not want to see required columns or values that should not be edited by me so that the notebooks are less error-prone.

Acceptance Criteria:

Given that the PI has read through the markdown cells, code comments, and code itself, when they attempt to send data to Zooniverse, then the required columns and values will automatically be appended to the manifest CSV file.
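
A sketch of how the edc_ver_id column might be appended just before sending, outside the cells the PI edits. How the value is actually generated is not stated in this issue; the millisecond timestamp below is purely an assumption for illustration, and add_edc_ver_id is a hypothetical helper:

```python
import time


def add_edc_ver_id(manifest, edc_ver_id=None):
    """Return a copy of the manifest with an edc_ver_id column on every row.

    Hypothetical helper: the default value (a millisecond timestamp) is an
    assumption, not the notebooks' documented behavior.
    """
    if edc_ver_id is None:
        edc_ver_id = round(time.time() * 1000)
    # Build new dicts so the PI's in-memory manifest is left untouched
    return [{**row, "edc_ver_id": edc_ver_id} for row in manifest]
```

Because the function returns new rows instead of mutating the input, the manifest the PI sees and edits never shows the extra column.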

Notes

The onus of abiding by data rights is on the scientist, not the notebook(s).
The PI will need to provide a valid iterable, such as an array of query results, to the manifest CSV file creation cells in order for them to function properly.

Desired resolution/approval date

March 28th 2023

ericdrosas87 commented 1 year ago

This will replace the #30 A&D

beckynevin commented 1 year ago

@ericdrosas87 , what's the status on this issue? does the version of the notebook in main provide the full functionality requested here?

ericdrosas87 commented 1 year ago

@beckynevin Yep this can be closed out

lsst-epo / citizen-science-notebooks

Expose manifest file creation and manifest file to the PI before sending data to Zooniverse #34
