lsst-epo / citizen-science-notebooks

A collection Jupyter notebooks that can be used to associate Rubin Science Platform data to a Zooniverse citizen science project.
3 stars 1 forks source link

Bring the example notebooks up to rtn-045 standards #64

Closed beckynevin closed 11 months ago

beckynevin commented 11 months ago

https://rtn-045.lsst.io/

beckynevin commented 11 months ago

@ericdrosas87 what do you think about this? - each Rubin notebook is required to have a header section that lists out packages we are teaching - the citsci_notebook_core_pipeline one is a little clunky - Packages: citsci_notebook_core_pipeline.rubin_citsci_core_pipeline.CitSciPipeline(), utils (citsci plotting and display utilities) - should I list out the whole thing or shorten? For context, other example notebooks put stuff like lsst.daf.butler here.

ericdrosas87 commented 11 months ago

@beckynevin I think there's room to play around with the import namespace used for the citSci backend package. It's probably worth first diving a little into the details here for context:

header information must include packages used

The term "packages" gets used a lot in Python development, or really just development in general, but note that "lsst.daf.butler" is not the name of the Butler PyPI package. The name of the package is actually lsst-daf-butler. To use your example of the citSci package:

citsci_notebook_core_pipeline.rubin_citsci_core_pipeline.CitSciPipeline()

Would be the Butler equivalent of saying that...:

lsst.daf.butler.Butler()

...is the name of the package, which is not accurate.

As you can see here on Github, the lsst.daf.butler is representative of the folder structure in the code repo itself.

Interestingly, I did have a chance to discuss this package with the DM SQRE team about a couple months back and I was asked if I would use the same namespace with the leading lsst as they were curious as to whether we would be using a rubin namespace, like rubin.epo.citsci. I'll follow-up up with them to see if they were asking for a specific reason (collaboration or Rubin "branding", for example).

ericdrosas87 commented 11 months ago

Also, regarding utils.py, that's not exactly a "package" either, more of a "helper script". But again, the term "package" is used quite loosely by various stakeholders within Rubin and otherwise, so I suggest checking with someone (maybe Melissa Graham?) on if the utils.py is considered a "package" in the context of RTN-045 standards, or if it can be ommitted.

ericdrosas87 commented 11 months ago

@beckynevin I think there's room to play around with the import namespace used for the citSci backend package.

But, tl;tr:

Yes the name of the import namespace can be changed so it's cleaner and I am open to suggestions.

beckynevin commented 11 months ago

@beckynevin I think there's room to play around with the import namespace used for the citSci backend package. It's probably worth first diving a little into the details here for context:

header information must include packages used

The term "packages" gets used a lot in Python development, or really just development in general, but note that "lsst.daf.butler" is not the name of the Butler PyPI package. The name of the package is actually lsst-daf-butler. To use your example of the citSci package:

citsci_notebook_core_pipeline.rubin_citsci_core_pipeline.CitSciPipeline()

Would be the Butler equivalent of saying that...:

lsst.daf.butler.Butler()

...is the name of the package, which is not accurate.

As you can see here on Github, the lsst.daf.butler is representative of the folder structure in the code repo itself.

Interestingly, I did have a chance to discuss this package with the DM SQRE team about a couple months back and I was asked if I would use the same namespace with the leading lsst as they were curious as to whether we would be using a rubin namespace, like rubin.epo.citsci. I'll follow-up up with them to see if they were asking for a specific reason (collaboration or Rubin "branding", for example).

Two thoughts - are you saying that they should be calling the package lsst-daf-butler in this header section instead of lsst.daf.butler? Also, I agree - it would be great to have either lsst or rubin as the leading part of our package name. Right now it seems somewhat disconnected because it doesn't have that. This is a subtle thing though, so maybe not immediately important?

beckynevin commented 11 months ago

Also, regarding utils.py, that's not exactly a "package" either, more of a "helper script". But again, the term "package" is used quite loosely by various stakeholders within Rubin and otherwise, so I suggest checking with someone (maybe Melissa Graham?) on if the utils.py is considered a "package" in the context of RTN-045 standards, or if it can be ommitted.

Yeah, I was also going to ask who I should reach out to if I have questions about the standards. Melissa is the go to on this?

beckynevin commented 11 months ago

citsci_notebook_core_pipeline.rubin_citsci_core_pipeline

I think we probably don't need 'core_pipeline' in the name? Also it might be confusing why citsci_notebook_core_pipeline is different than rubin_citsci_core_pipeline because they have such similar names?

ericdrosas87 commented 11 months ago

are you saying that they should be calling the package lsst-daf-butler in this header section instead of lsst.daf.butler?

Not necessarily, I was just providing context. I suggest checking how other notebooks reference the Butler package and do likewise.

Also, I agree - it would be great to have either lsst or rubin as the leading part of our package name. Right now it seems somewhat disconnected because it doesn't have that. This is a subtle thing though, so maybe not immediately important?

Okay, that sounds good to me. I'll see what the SQRE team has to say about the package namespace.

ericdrosas87 commented 11 months ago

Yeah, I was also going to ask who I should reach out to if I have questions about the standards. Melissa is the go to on this?

I would check with @clareh or @bnord, but my guess is she would be the one to ask.

ericdrosas87 commented 11 months ago

I think we probably don't need 'core_pipeline' in the name? Also it might be confusing why citsci_notebook_core_pipeline is different than rubin_citsci_core_pipeline because they have such similar names?

Yeah I think the core_pipeline part can be dropped - I didn't realize the package name would have this level of visibility when I initially named it.

ericdrosas87 commented 11 months ago

@beckynevin I chatted a bit with Jonathan Sick and Frossie from the SQuaRE team just now and it seems that this discussion was serendipitously timed. Apparently just yesterday there was a SQuaRE discussion around using the rubin. root namespace for Python packages installable via pip. Also, they advised that using a shortened import namespace such as rubin.citsci is now preferable to something like rubin.epo.citsci.

In practice, this would look something like this:

from rubin.citsci import CitSciPipeline

Which seems much cleaner - thoughts?

beckynevin commented 11 months ago

are you saying that they should be calling the package lsst-daf-butler in this header section instead of lsst.daf.butler?

Not necessarily, I was just providing context. I suggest checking how other notebooks reference the Butler package and do likewise.

Strangely, the example notebooks do reference it as lsst.daf.butler in the "packages" section.

Also, I agree - it would be great to have either lsst or rubin as the leading part of our package name. Right now it seems somewhat disconnected because it doesn't have that. This is a subtle thing though, so maybe not immediately important?

Okay, that sounds good to me. I'll see what the SQRE team has to say about the package namespace.

ericdrosas87 commented 11 months ago

I had to create a wholly new PyPI project to "rename" it, but I did so and also updated the Testing notebook with #65

ericdrosas87 commented 11 months ago

The second cell of the Testing notebook now looks like:

email = "" # Email associated with Zooniverse account 
slug_name = "" # Do not include the leading forward-slash, see above 

from rubin.citsci import pipeline
print("Loading and running utilities to establish a link with Zooniverse")
print("Enter your Zooniverse username followed by password below")
cit_sci_pipeline = pipeline.CitSciPipeline()
cit_sci_pipeline.login_to_zooniverse(slug_name, email)
ericdrosas87 commented 11 months ago

That is to say that the header information of the notebooks can reference rubin.citsci as a package dependency now.

beckynevin commented 11 months ago

@ericdrosas87 and @jsv1206 I'm running the citsci notebook with flake8 enabled and I think I'm getting some errors related to flake8 from the pipeline. Do we need access to rubin.citsci in order to correct these errors?

image
beckynevin commented 11 months ago

Bump on the above message @ericdrosas87 , mostly wondering if flake8 is triggering on the pipeline.CitSciPipeline() call.

ericdrosas87 commented 11 months ago

Bump on the above message @ericdrosas87 , mostly wondering if flake8 is triggering on the pipeline.CitSciPipeline() call.

It looks to me like it's referring to that cell specifically and not the citSci PyPI package.

The format for the numbers is: row:column or line:character

For example, the first INFO message is stating that - according to the Flake8 standards - inline comments should be separated by two spaces, and the first line has an inline comment separated by a single space that starts right at character/column 31.

beckynevin commented 11 months ago

flake8 just totally freaked out and now won't load so I've just deleted that cell for now. I've gone ahead and merged the branch, so for now the only outstanding issues are I didn't do a final run through of flake8 because it was seizing up; I figure this can be its own issue down the line.