pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/
91 stars 36 forks source link

BioCypher submission #110

Closed slobentanzer closed 8 months ago

slobentanzer commented 1 year ago

Submitting Author: Sebastian Lobentanzer (@slobentanzer) All current maintainers: (@slobentanzer) Package Name: biocypher One-Line Description of Package: framework for creating biomedical knowledge graphs Repository Link: https://github.com/biocypher/biocypher Version submitted: 0.5.14 Editor: @arianesasso
Reviewer 1: @Zethson Reviewer 2: @Midnighter Reviewer 3: Sinna Archive: DOI Version accepted: v0.5.33 JOSS DOI: N/A Date accepted (month/day/year): 11/14/2023


Code of Conduct & Commitment to Maintain Package

Description

Scope

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an existing community please check below:

[^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication Options

JOSS Checks - [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Confirm each of the following by checking the box.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

lwasser commented 1 year ago

hi @slobentanzer thank you for this submission and welcome to pyOpenSci! I just wanted to say hello 👋 and to let you know that we see this submission. We will get back to you with the start of our initial checks next week! In the meantime, have a wonderful weekend. we will be in touch. ✨

NickleDave commented 1 year ago

Welcome to pyOpenSci @slobentanzer and thank you for your patience.

I'm happy to report that biocypher passes almost all of the initial editor checks.

There is one thing we need you to take care of before we start the review:

  • [ ] README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.

Please add the following to the README:

Please let me know once you have addressed this requirement.

The filled-out template with checks follows.

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements below.



Editor comments

None of these are required but are just things I noted during the checks:

slobentanzer commented 1 year ago

Hi @NickleDave, thanks for the feedback, I added the requested details to the README.

Regarding your comments, here's my opinion:

NickleDave commented 1 year ago

Thank you @slobentanzer for adding those details to the README. Looks great!

What is the benefit of having it in the README, if users can just read the docs?

I hear you. I also do not like the idea of duplicating content in the README that can go out of sync.
One way around this is to have the single source of truth be your README, and then get that into your docs by using an include directive in RST or in MyST

What is the benefit of having it in the README, if users can just read the docs?

There is an argument to be made that developers hyper-fixate on "Awesome" READMEs, mainly to impress other developers. Still, you don't want to lose a user because they found your GitHub repo first, and they couldn't figure out what your project really does, so they gave up without bothering to read the docs. What is the killer feature and what does it look like in code? If you can express that in one sentence and a single snippet with a few lines of code, at the top of both your README and on the index page of your docs, then you know you're making the best case for your package to new users. Of course that's not possible for every package, but seems like a good general goal to aim for.

I always use pip inside conda envs and did not know this was unusual. Happy for suggestions as to how to make that section clearer or more aligned with good practise.

I am not a conda expert but my understanding is that it's good practice to prefer conda install over pip install, to be able to reproduce environments. Basically because pip does things that conda doesn't do, or that conda doesn't know about. Maybe @jjhelmus can let us know if the advice in this blog post is still up-to-date:
https://www.anaconda.com/blog/using-pip-in-a-conda-environment

I am new to Python and find logging quite confusing; happy for suggestions (or a pointer to a best practise resource) here as well.

I am less new to Python but I still find logging confusing 😭

I'd suggest a quote from Hitchhiker's guide to Python is relevant here:

the user, not the library, should dictate what happens when a logging event occurs

https://docs.python-guide.org/writing/logging/#logging-in-a-library

For that reason I would avoid forcing a logging file and messages on your users.

I have looked several times but have never found a guide to logging that I felt answered my questions.

You probably know these already but:

The main question I had as a library developer is how to use the logger in my modules, and how should I configure it. Through trial and error I have found it's "good enough" practice to make a new logger instance in each module, assigning it __name__ as its name (as you will read in most guides). I have also found that if I do want to configure those logger instances, it's easiest to achieve this through a function I call that checks if there are already existing Handlers attached, so that I don't stomp on a user's existing configuration. See here: https://github.com/vocalpy/vak/pull/535

In my experience is that most scientific Python libraries do not include logging. There are exceptions I can think of:

You might find reading the code of those libraries helpful

NickleDave commented 1 year ago

Hope I did not just overwhelm you with advice.
I will find an editor and we will proceed with the review.
I know I got off to a bit of a slow start--thank you for your patience.

NickleDave commented 1 year ago

Hi again @slobentanzer, happy to inform you that @arianesasso will be editor for this review.
We are seeking appropriate reviewers now and she will tag them in once we have found them.

arianesasso commented 1 year ago

Hello @slobentanzer 👋🏼 . Thanks for the intro @NickleDave :). It is my pleasure to review BioCypher! We are now in the process of finding reviewers for the package. Hopefully, depending on their response, this should take one to two weeks. I will keep you posted on this thread, and if you have any questions, feel free to reach out!

arianesasso commented 1 year ago

Dear @slobentanzer, finding reviewers is taking a bit longer than expected. I hope that is ok. We might have someone already but he would only be able to start the review in a few weeks. Would that be acceptable?

slobentanzer commented 1 year ago

Hi @arianesasso, sure, I have no immediate time pressures.

arianesasso commented 1 year ago

@slobentanzer Great! Sorry for the delay, but now we have reviewers for BioCypher 🥳 .

@Zethson will be our first reviewer and should be starting his review soon.

@Zethson Thank you so much for your contribution! Please, check our reviewer's guide, the reviewer template, and the Python packaging standards detailed in the packaging guide. Also, please reach out in case you have any questions. Happy reviewing!

arianesasso commented 1 year ago

Dear @slobentanzer, we now have our second reviewers 🥳 !

@Midnighter will be the second reviewer, together with his colleague Sinna. She will cover the domain knowledge while he will look more on the Python side. We ask you for a bit of patience since now people are mainly on Summer vacation, so the review might take a bit longer.

@Midnighter Thank you so much to both of you for agreeing to review BioCypher! Please, check our reviewer's guide, the reviewer template, and the Python packaging standards detailed in the packaging guide. Also, please don't hesitate to reach out if you have any questions. Happy reviewing!

arianesasso commented 1 year ago

@Zethson @Midnighter (and Sinna), before starting to review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit here as a comment once your review is complete.

Please get in touch with any questions or concerns! Your review is due: 31 August 2023.


Reviewers: @Zethson and @Midnighter (and Sinna) Due date: 31 August 2023

Zethson commented 1 year ago

Completed the survey.

slobentanzer commented 1 year ago

Thanks @Zethson and @Midnighter / Sinna for reviewing! Looking forward to your comments. :)

Midnighter commented 1 year ago

FYI I will be on vacation until Aug 8. So I won't be active until then.

Zethson commented 1 year ago

Lukas (zethson) review

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing:


Review Comments

Hey @slobentanzer !

Really amazing work with Biocypher and it really shows that you're trying hard and succeeding with creating a package that is maintainable, well documented and works. I have a few random comments that you can consider addressing:

  1. https://biocypher.org/tutorial.html suggests to install biocypher from source with poetry. I think that this is "noise" here and that all tutorials should always work with the latest version on pypi.
  2. Nit: On the tutorial site terms like "DBMS instance" are not introduced.
  3. Am I wrong or are the Pandas versions of the tutorials not rendered?
  4. API documentation is somewhat "hidden" under "submodule documentation". "API documentation" or something of that sort is probably more common. I also feel like it mostly shows private functions (based on underscores) but not really public ones?
  5. I'm having trouble finding a CI that runs all tests. Am I looking at the wrong places?

Thank you very much!

slobentanzer commented 1 year ago

Hi @Zethson, huge thanks for taking the time to review the package! It is accurate. I will try to address the missing points and comments as time allows. Here are some explanations, feel free to suggest ways to improve the current state:

  1. The reason we recommend to clone and install the package locally is not because of code dependence, but rather because the tutorial files are included in the repo, but not the package. Do you (or another person reading this thread) have another solution for making the tutorial code available to the user in the simplest way?
  2. Will go over the docs and explain all abbreviations soon.
  3. For me it is. (The first couple functions for setup do not have much of an output.) Could be something to do with how the colab works (each user has their own version, maybe)? I don't usually work with colab, so feel free to suggest improvements.
  4. I am using the Sphinx autodoc module for rendering the function documentation. There is already an issue for making it work for all modules, but I have not had time to delve deep into Sphinx functionality to figure this out. We do only have a fairly small amount of user-facing functions in the first place, which are explained in the tutorials. I could maybe add a manual explanation of the most important functions in quickstart, but that would not be automatic any more.
  5. You are right, there is no action for CI. I have been doing it manually so far due to lack of time to establish a workflow, but I agree that this takes high priority. One issue is that some tests depend on DBMSs running on the test machine, so this is not a straightforward setup. It is on the list.
Zethson commented 1 year ago

Hi @Zethson, huge thanks for taking the time to review the package! It is accurate. I will try to address the missing points and comments as time allows. Here are some explanations, feel free to suggest ways to improve the current state:

1. The reason we recommend to clone and install the package locally is not because of code dependence, but rather because the tutorial files are included in the repo, but not the package. Do you (or another person reading this thread) have another solution for making the tutorial code available to the user in the simplest way?

I'd move all tutorials into notebooks and add them to a biocypher-tutorials repository. The tutorials themselves can then download data as needed. Ideally, you could also offer colab/binder instances to run these notebooks then. This may require custom containers and will generally require substantial effort. You could also start by adding "download notebook" buttons to the tutorials.

3. For me it is. (The first couple functions for setup do not have much of an output.) Could be something to do with how the colab works (each user has their own version, maybe)? I don't usually work with colab, so feel free to suggest improvements.

I'll look again.

4. I am using the Sphinx autodoc module for rendering the function documentation. There is already an [issue](https://github.com/biocypher/biocypher/issues/183) for making it work for all modules, but I have not had time to delve deep into Sphinx functionality to figure this out. We do only have a fairly small amount of user-facing functions in the first place, which are explained in the tutorials. I could maybe add a manual explanation of the most important functions in [quickstart](https://github.com/biocypher/biocypher/issues/219), but that would not be automatic any more.

Yeah, you're explaining them in the tutorials, but API documentation for all public functions is imperative IMO.

5. You are right, there is no action for CI. I have been doing it manually so far due to lack of time to establish a workflow, but I agree that this takes high priority. One issue is that some tests depend on DBMSs running on the test machine, so this is not a straightforward setup. It is on [the list](https://github.com/biocypher/biocypher/issues/218).

I can totally see how this is a lot of work, but I really feel like it will be time well spent. This will heavily improve maintainability and the ability for other people to contribute.

Thanks!

slobentanzer commented 1 year ago

Thanks for the feedback, will address and update here. :)

arianesasso commented 1 year ago

Thank you for your thoughtful review @Zethson and your fast replies @slobentanzer! Let me know in case there is anything you need support with.

@Midnighter No worries, when you are available, just start your review, and don't hesitate to contact me as well!

slobentanzer commented 1 year ago

@arianesasso there seems to be a mistake in the initial issue comment; I am listed as reviewer 1 instead of @Zethson

slobentanzer commented 1 year ago

I have added a separate milestone to track and address the review feedback: https://github.com/biocypher/biocypher/milestone/6

arianesasso commented 1 year ago

@arianesasso there seems to be a mistake in the initial issue comment; I am listed as reviewer 1 instead of @Zethson

Thanks for letting me know, my mistake! I will update the issue comment :).

arianesasso commented 12 months ago

@slobentanzer 👋🏼 How is it going with the revisions? Is there anything we can do to support you further?

slobentanzer commented 12 months ago

Hi @arianesasso, as mentioned, you can see the progress of the revision in the milestone (most things from @Zethson are addressed already). I am waiting for the second review before doing more.

arianesasso commented 12 months ago

@slobentanzer Sounds great :). The second review should start this week when @Midnighter comes back from vacation. If you need anything in the meantime, please let me know.

Midnighter commented 11 months ago

Hi everyone,

Just to update you, that we started going through the tutorials and looking at the codebase, but won't be able to complete the review today. Our plan is to do so on Friday.

arianesasso commented 11 months ago

Thank you for the update @Midnighter ! I will change the deadline then :). Please let me know in case you have any questions. Also, please answer our survey in case you haven’t.

Midnighter commented 11 months ago

Moritz & Sinna's Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

Final approval (post-review)

Estimated hours spent reviewing:


Review Comments

Major

We see the major value added by biocypher in providing a tool to manage knowledge graphs and hosting a repository of adapters to quickly import data from a range of well-known sources. In general, the documentation is well written and makes the topic of knowledge graphs and ontologies accessible. Even though it is beyond the scope of the biocypher tool itself, we feel that it would be beneficial if the documentation also provided simple examples of how a knowledge graph can then be exploited in different applications, and not only examples of its main purpose to ingest data sources.

Minor

We have inserted a few minor comments in italic inbetween the checklists above where they fit the context tightly. Below, we list some more that are not directly related to the checklists.

  1. Installation instructions in the README are short, but longer instructions are available in the full documentation, which is the way we personally prefer it, but it clashes with the pyOpenSci guidelines, which prefer a more extensive README.
  2. Neither the online documentation, the CONTRIBUTING.md, nor the DEVELOPER.md guide mention that poetry installs optional development dependencies. This will be important to know if contributors want to add further dependencies (they should add them to the correct group).
  3. The DEVELOPER.md mentions that the project uses napoleon as the documentation format. That doesn't make sense. Napoleon is a tool to parse docstrings written in the Google or numpy style, so that should be changed.
  4. We recommend formatting the badges as a table. There are already quite many.
  5. The basic tutorial suggests to run poetry run python tutorial/01__basic_import.py. This cannot currently work as the line

    from tutorial.data_generator import Protein

    will fail. The tutorial directory will require an __init__.py module and it would need to be on the PYTHONPATH. Alternatively, users could be required to change the working directory into tutorial and the scripts are adjusted to import the data_generator directly.

  6. The basic tutorial mentions a method bc.log_missing_bl_types() # show input unaccounted for in the schema. This method no longer exists on the BioCypher class. We recommend to execute code in documentation in order to discover such problems quickly.
  7. We feel that it would improve the examples, if the data generator created description fields with actual text (like lorem ipsum), rather than just random letters.
  8. While the code formatting is fine style-wise, in particular the tests make ample use of single letter variables. This is against the pep8 guidelines and current best practices in general. We recommend using descriptive variable names everywhere.
  9. The tests directory contains an __init__.py module. Pytest doesn't need those and it should be removed.
  10. Some of the tests declare module level globals. It would be better to use fixtures instead and set the appropriate scope for them.
  11. The CI workflow definition creates a postgres container with the latest tag. Are you sure that you don't want to depend on a specific version as you do for neo4j?
  12. The CI workflow checkout and Python setup actions are outdated.
  13. You could publish the package to PyPI via a workflow as well. See our taxpasta workflow for an example of how to do so.
  14. Running the tests on postgres requires not only a running database, but also the psql command line tool. I did not see this in the documentation. In general, the section https://biocypher.org/installation.html#for-developers has detailed instructions for neo4j, but not for postgres.
  15. Having the contribution guidelines as part of the main docs in addition to the DEVELOPER.md file would be very helpful, too!
arianesasso commented 11 months ago

Thank you @Midnighter and Sinna for the great review! 😄

@slobentanzer would two weeks be a good timeline for you to address the reviews, or is it too tight? Also, let us know in case you have any general or specific questions about them :).

slobentanzer commented 11 months ago

@Midnighter and Sinna, thanks a lot for your time and thoughtful comments. I agree with your review, and have added corresponding issues to the review milestone.

@arianesasso the coming weeks are pretty full of deadlines, I am not sure I will be able to address all points in 2 weeks. End of September seems more realistic, but I will update here in any case (plus any questions that may arise).

arianesasso commented 11 months ago

@slobentanzer thank you for the update and no worries at all :). If you need anything in the meantime, just let me know.

arianesasso commented 10 months ago

@slobentanzer Just wanted to check-in to see how things are going and if you need any help :).

arianesasso commented 10 months ago

Heeey @slobentanzer, just checking in to see how things are going. Is there any way we can support you moving forward?

slobentanzer commented 10 months ago

Hi @arianesasso, thanks for checking in. Not sure what you mean by support, can you elaborate?

Other than that, you can still track the progress of the review implementations in the milestone I mentioned above. I have onboarded @nilskre, who is helping with the proceedings while getting familiar with the framework.

arianesasso commented 10 months ago

Thanks for the updates!

I was just wondering if you had any open questions you would like to ask me and/or the reviewers :). Also, about the deadline, does the end of September still work for you?

slobentanzer commented 10 months ago

It depends a bit on our speed, could go into October. We are dedicating most of our available development time in BioCypher to this review.

slobentanzer commented 9 months ago

Hi @arianesasso, we have now completed all of the issues I created in response to the reviewer comments. For an overview, you can refer to this milestone, which also functions as a response of sorts.

How do we proceed from here?

arianesasso commented 9 months ago

Hey @slobentanzer, I just checked the link and it sounds like you addressed all the reviewers' comments! I changed the status of your submission and now we need @Midnighter and @Zethson to go over their checklists again and confirm that all the points were addressed. You are almost there 🥳 . Thank you for all the hard work!

Zethson commented 9 months ago

Great! I confirm that all points were addressed and recommend approving this package

slobentanzer commented 9 months ago

Thanks for the great input, @Zethson!

Midnighter commented 9 months ago

Can confirm that all our comments regarding the code project have been addressed. Still think it'd be motivating for potential users unfamiliar with or new to knowledge graphs to see some usage scenarios, but we can understand that you consider it too much right now.

We recommend approval.

slobentanzer commented 9 months ago

@Midnighter and Sinna, thanks a lot for your feedback! We agree that use cases are very important, and we will definitely be updating the docs regularly with more applied scenarios, as time permits. :)

NickleDave commented 8 months ago

Hi all :wave: I'm here to help our excellent editor @arianesasso who had a family emergency.

Thank you @Midnighter and @Zethson for confirming your comments were addressed, and letting us know you recommend approval.

@slobentanzer I am happy to report that 🎉 BioCypher has been approved by pyOpenSci! Thank you for submitting! 🎉

Author Wrap Up Tasks

There are a few things left to do to wrap up this submission:

Editor Final Checks

Please complete the final steps to wrap up this review. Editor, please do the following:


If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the

slobentanzer commented 8 months ago

Thanks a lot, @NickleDave! I have activated Zenodo, added the bade, and filled out the form. Thanks for everything to everybody who helped in this review! :)

NickleDave commented 8 months ago

Thank you @slobentanzer and congratulations again! 🎉

I have announced it on our Discourse too.

If you are interested, we'd love to help you spread the word about BioCypher with a blog post on pyOpenSci's site.

@Midnighter and @Zethson huge thank you again for your work as reviewers. Please if you have a few minutes it would also help us if you could fill out the post-review survey.

And thanks of course to @arianesasso for being the editor for this one. I'll go ahead and close this issue.

slobentanzer commented 8 months ago

Hi @NickleDave, thanks again (also to all), and happy to help with a blog post if it should be required.

NickleDave commented 8 months ago

Definitely not required @slobentanzer, we just want to give you a venue to share BioCypher. Sometimes a blog post can be a way to talk about the package that's not as clear from docs and papers. I'll invite you to our Slack team too if you'd like to join there and ask questions.

slobentanzer commented 8 months ago

Sure, sounds great. Thanks!