lsst-epo / citizen-science-notebooks

A collection of Jupyter notebooks that can be used to associate Rubin Science Platform data with a Zooniverse citizen science project.
Converting Citizen_Science_SDK.ipynb to a script #47

Closed beckynevin closed 1 year ago

beckynevin commented 1 year ago

@ericdrosas87 will convert the SDK notebook into a Python script. It will need to be called from Citizen_Science_Testing.ipynb because it has to run in order to establish a connection with Zooniverse. We'll also need to check whether we're ingesting arguments properly and/or whether we need the option to pass it arguments.

Also, let's rename it to something like 'citizen_science_pipeline' or 'zooniverse_connect' so that users are discouraged from editing it. Another option is to add instructions at the top of the script that discourage people from editing it. Or is there a way to lock it from editing?

@bnord @clareh @jsv1206 @ericdrosas87 feel free to comment if you have other thoughts about converting SDK into a script and/or naming ideas.

ericdrosas87 commented 1 year ago

I've started working on this and I'm hoping to have something ready for testing before the end of the week. I think both renaming the file and adding comments to discourage editing are a good call. We may be able to edit user permissions on the file itself so that it's read-only, but that would be a DM SQuaRE question.
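
A minimal sketch of the read-only idea, assuming plain POSIX-style file permissions are what is available (whether the notebook environment permits this is the open DM SQuaRE question). The helper names `make_read_only` and `is_read_only` are hypothetical and do not come from the branch.

```python
import os
import stat

# All three write bits: owner, group, and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path):
    """Clear every write bit on `path`, leaving the other permission bits unchanged."""
    current_mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current_mode & ~WRITE_BITS)


def is_read_only(path):
    """Return True when no write bit is set on `path`."""
    current_mode = stat.S_IMODE(os.stat(path).st_mode)
    return (current_mode & WRITE_BITS) == 0
```

The owner of a file can restore write permission at any time, so this is a deterrent against accidental edits rather than a lock.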

ericdrosas87 commented 1 year ago

Okay, I have a branch created (https://github.com/lsst-epo/citizen-science-notebooks/tree/EPO-8274) which has the following changes:

There is an issue that needs to be addressed regarding the installation of the Zooniverse Panoptes client and the Google Cloud Storage client. I think we can ask DM to provision these packages by default in the notebook environment so they do not need to be installed manually. Alternatively, the packages can be added as dependencies of the PyPI package once it's published, so that they are installed automatically with a pip install. Both should accomplish the same result.

These install commands cannot be run from the .py script I created in the same way they can be run from notebooks (.ipynb), hence the complication. For now, the Install notebook is there to handle this in the short term.
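
One way a script could fail early with a clear message when the two clients are absent is sketched below. It is only an illustration: the import names (`panoptes_client`, `google.cloud.storage`), the pip names, and the helper functions are assumptions, not code from the branch.

```python
import importlib.util

# Assumed mapping of import name -> pip package name for the two clients.
REQUIRED_MODULES = {
    "panoptes_client": "panoptes-client",
    "google.cloud.storage": "google-cloud-storage",
}


def is_importable(module_name):
    """Return True if `module_name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. "google") is missing.
        return False


def missing_packages(required=REQUIRED_MODULES):
    """List the pip names of every required module that cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if not is_importable(module)]


def require_dependencies(required=REQUIRED_MODULES):
    """Raise ImportError naming the missing packages, if there are any."""
    missing = missing_packages(required)
    if missing:
        raise ImportError(
            "Missing packages: " + ", ".join(missing)
            + ". Run the Install notebook first."
        )
```

Calling `require_dependencies()` at the top of the script would turn a later, harder-to-read import failure into a pointer back to the Install notebook.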

You'll notice that the script contains a class that needs to be instantiated and then referenced with each method call:

# import CitSciPipeline class from script
from rubin_citsci_core_pipeline import CitSciPipeline

# Create a new instance of the class
cit_sci_pipeline = CitSciPipeline()

# Call the "login_to_zooniverse()" method on the class instance
cit_sci_pipeline.login_to_zooniverse(slug_name, email)
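
As a rough illustration of why the instance is created once and then reused, the toy class below keeps login state on the object so that later method calls can rely on it. It is a hypothetical stand-in: `ToyPipeline`, `login`, and `describe_session` are invented for this sketch and are not the real `CitSciPipeline` API.

```python
class ToyPipeline:
    """Hypothetical stand-in for illustration; not the real CitSciPipeline."""

    def __init__(self):
        # State shared by every later method call on this instance.
        self.slug_name = None
        self.email = None
        self.logged_in = False

    def login(self, slug_name, email):
        # A real login would contact Zooniverse; this one only records state.
        self.slug_name = slug_name
        self.email = email
        self.logged_in = True

    def describe_session(self):
        # Later calls depend on the state that login() stored.
        if not self.logged_in:
            raise RuntimeError("Call login() before using the pipeline.")
        return f"{self.email} working on {self.slug_name}"
```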

I took the liberty of modifying the Citizen_Science_Testing notebook to make use of the new script/class workflow.

@clareh @bnord @jsv1206 and @beckynevin this branch is ready for testing.

And of course, feel free to ask any questions!

bnord commented 1 year ago

Could the pip install be done via subprocess within the .py file?

ericdrosas87 commented 1 year ago

> Could the pip install be done via subprocess within the .py file?

It possibly could be, though there would likely be unintended side effects, and it is generally not advised from a best-practices/stability standpoint. By the time we package up the code, the dependency issue will have to be addressed anyway, most likely by bundling a requirements.txt in the package that installs these dependencies. My point in calling this out is just to explain why there is still a "backend notebook" floating around in the branch, albeit with a different name (Install) for the time being.
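
For reference, the subprocess approach being asked about would look roughly like the sketch below. It is shown only to make the question concrete, since the reply above advises against it. The function names and the `dry_run` flag are hypothetical; `sys.executable -m pip` is used so that the install targets the interpreter running the script.

```python
import subprocess
import sys


def pip_install_command(packages):
    """Build the pip command for the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install(packages, dry_run=True):
    """Return the pip command; run it only when dry_run is False."""
    command = pip_install_command(packages)
    if not dry_run:
        # Modifies the running environment as a side effect.
        subprocess.check_call(command)
    return command
```

With the default `dry_run=True`, nothing is installed and the command is simply returned for inspection.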

bnord commented 1 year ago

gotcha. Thanks for clarifying.

beckynevin commented 1 year ago

I'm going to close this issue since it's been merged with main :)