eriknw commented 1 year ago

Submitting Author: Erik Welch (@eriknw) All current maintainers: (@eriknw, @jim22k, @SultanOrazbayev) Package Name: Python-graphblas One-Line Description of Package: Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics Repository Link: https://github.com/python-graphblas/python-graphblas Version submitted: 2023.1.0 Editor: @tomalrussell Reviewer 1: @sneakers-the-rat Reviewer 2: @szhorvat Archive:
JOSS DOI: N/A Version accepted: 2023.7.0 Date accepted (month/day/year): 07/14/2023

Description

Python-graphblas is like a faster, more capable scipy.sparse that can implement NetworkX. It is a Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics. Python-graphblas mimics the math notation, making it the most natural way to learn, use, and think about GraphBLAS. In contrast to other high level GraphBLAS bindings, Python-graphblas can fully and cleanly support any implementation of the GraphBLAS C API specification thereby allowing us to be vendor-agnostic.

Scope

Please indicate which category or categories this package falls under:
- [ ] Data retrieval
- [ ] Data extraction
- [X] Data munging
- [ ] Data deposition
- [ ] Reproducibility
- [ ] Geospatial
- [ ] Education
- [ ] Data visualization*
- [X] Scientific software wrappers (added from here)

Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
- Who is the target audience and what are scientific applications of this package?

Audience: anybody who works with sparse data or graphs. We are also implementing a backend to NetworkX (which supports dispatching in version 3.0) written in Python-graphblas called graphblas-algorithms, so we are quite literally targeting NetworkX users!

Python-graphblas provides a faster, easier, more flexible, and more scalable way to operate on sparse data, including for graph algorithms. There are too many scientific applications to list ranging from neuroscience, genomics, biology, etc. It may be useful wherever scipy.sparse or NetworkX are used. Although GraphBLAS was designed to build graph algorithms, it is flexible enough to be used in other applications. Anecdotally, most of our current users that I know about are from research groups in universities and laboratories.

We are also targeting applications that need very large distributed graphs. We have experimented with Dask-ifying python-graphblas here, and we get regular interest from people who want e.g. distributed PageRank or connected components.

Are there other Python packages that accomplish the same thing? If so, how does yours differ?

pygraphblas, which hasn't been updated in more than 16 months. There are many differences in syntax, functionality, philosophy, architecture, and (I would argue) robustness and maturity. python-graphblas syntax targets the math syntax, whereas pygraphblas is much closer to C. python-graphblas handles dtypes much more robustly, has efficient conversions to/from numpy and other formats, is architected to handle additional GraphBLAS implementations (more are on the way!), has exceptional error messages, has many more tests and functionality, supports Windows, and much, much more. We have also been growing our team, because sustainability is very important to us.

Although we have/had irreconcilable differences (which is why we decided to create python-graphblas), the authors have always been cordial. We all believe strongly in the ethos of open source, and I would describe our relationship as having "radical generosity". For example, we have an outstanding agreement that each library is welcome to "borrow" from the other (with credit). We may "borrow" some of their documentation :)

We also worked together to create and maintain the C binding to SuiteSparse:GraphBLAS: https://github.com/GraphBLAS/python-suitesparse-graphblas/ We could use help automatically generating wheels for this library on major platforms via cibuildwheel.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Limited prior discussion in this issue: https://github.com/pyOpenSci/python-package-guide/issues/21#issuecomment-1368046000

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

[X] does not violate the Terms of Service of any service it interacts with.
[X] has an OSI approved license.
[X] contains a README with instructions for installing the development version.
[ ] includes documentation with examples for all functions.
- We're working on this! The C API and SuiteSparse:GraphBLAS C library are both well documented. We have a very large API surface area to cover, so "documentation with examples for all functions" is a really, really high bar, but one I hope we achieve someday :)
[X] contains a vignette with examples of its essential functions and uses.
[X] has a test suite.
[X] has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

[ ] Do you wish to automatically submit to the Journal of Open Source Software? If so:
- Undecided. We don't have a paper, but in principle I would like for us to submit a paper to JOSS someday.

JOSS Checks

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

[x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

[X] I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Other comments (manually added)

Given a product mindset, we believe that Python-graphblas is a great product, but I think our go-to-market strategy has been lacking. We have been very engineering-heavy, and even our goal of targeting NetworkX users is engineering-heavy via creating graphblas-algorithms. I hope this peer-review process can help us prioritize our efforts (such as a plan to improve documentation) as well as a place to write a blog post or two.

Please fill out our survey

[x] Last but not least please fill out our pre-review survey. This helps us track submission and improve our peer review process. We will also ask our reviewers and editors to fill this out.

P.S. *Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

NickleDave commented 1 year ago

Hi @eriknw!

We're very glad to see you all have gone ahead with a full submission, after discussion in https://github.com/pyOpenSci/python-package-guide/issues/21#issuecomment-1368046000 as you linked above.

I just want to welcome you and let you know we are working on this.

I will finish the initial editor checks by the end of this week. @lwasser is traveling and we will need to co-ordinate about editors, but I expect to get back to you about that by the middle of next week at the latest.

Thank you for providing all the detail in the submission. The context you've provided will be helpful for the review (and will definitely help me with editor checks!). It sounds like you've anticipated some points @lwasser brought up when we discussed in Slack. Looking forward to helping you improve the docs and giving you some blog post material! :grin:

NickleDave commented 1 year ago

Editor in Chief checks

These are the basic checks that the package needs to pass to begin review. Please check our Python packaging guide for more information on the elements below.

[x] Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
- [x] The package imports properly into a standard Python environment import package-name.
- was able to do both pip install python-graphbals in a python 3.10 venv on Linux
- and install with conda on Mac OS, python 3.10
[x] Fit The package meets criteria for fit and overlap.
[x] Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
- [x] User-facing documentation that overviews how to install and start using the package.
- [x] Short tutorials that help a user understand how to use the package and what it can do for them.
- but see comments below
- [x] API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format. We suggest using the Numpy docstring format.
[x] Core GitHub repository Files
- [x] README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
- but see comments below
- [x] Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
- [x] Code of Conduct The package has a Code of Conduct file.
- [x] License The package has an OSI approved license.
- Apache 2.0 NOTE: We prefer that you have development instructions in your documentation too.
[x] Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
[x] Automated tests Package has a testing suite and is tested via GitHub actions or another Continuous Integration service.
[x] Repository The repository link resolves correctly.
[x] Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
[ ] Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
[ ] Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

[ ] Initial onboarding survey was filled out We appreciate each maintainer of the package filling out this survey individually. :raised_hands: Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. :raised_hands:

Editor comments

This package passes all checks; we can begin review.

It's obvious the developers and maintainers have done a ton of work.
Our goal here should be to help make sure everyone can appreciate how much work they have done, and how much functionality is packed into python-graphblas.

Along those lines, some notes for the review (none of these need to be addressed before we start):

[ ] the description section of the README needs a high level, non-technical description so that the use of the package would be clear to anyone with a general scientific background
- Specifically there should be language that defines sparse data - just a simple definition like "data with lots of 0's "and then link to a good description of sparse data from scipy or some other resource
- Language the submitting authors use above is very close to a good high level description: "python-graphblas ... is a Python library for ... high-performance sparse linear algebra for scalable graph analytics. [Think of it like a faster, more capable scipy.sparse that can implement NetworkX.] Python-graphblas mimics the math notation, making it the most natural way to learn, use, and think about GraphBLAS. In contrast to other high level GraphBLAS bindings, Python-graphblas can fully and cleanly support any implementation of the GraphBLAS C API specification thereby allowing us to be vendor-agnostic." This could be the first thing someone reads both in the README description and on the index of the docs
[ ] the description section of the README should also clarify why would someone use python-graphblas in place of, or in conjunction, with the scipy sparse module and/or networkx, but in very easy to understand language
[ ] ideally the README itself should have a short, beginning to end tutorial to get someone started, that really demonstrates the power of the package
- for example this snippet appears to be almost stand-alone, and the set-up lines will be understandable by most scientific Python users: https://python-graphblas.readthedocs.io/en/latest/getting_started/primer.html#sssp-in-python-graphblas
- as opposed to the examples that are currently in the README that are meant to give an impression of the package but are not stand-alone runnable snippets; these examples may be useful in API documentation. But the README as currently written may be missing the opportunity to recruit users without a quick friendly walkthrough
[ ] as noted by submitting author, there is an existing tutorial, the python-graphblas primer, but there could be additional vignettes;
- what are typical use cases, perhaps analyses from papers or packages that already use python-graphblas could be added in a how-tos section of the docs? Submitting authors state there may be examples in pygraphblas documentation

NickleDave commented 1 year ago

@eriknw @jim22k, @SultanOrazbayev the tl;dr is that python-graphblas passed editor checks 🙂 🎉

Like I said above, I'll need to co-ordinate with @lwasser who is traveling about an editor for this review, but would expect us to reply back here by middle of next week at the latest

eriknw commented 1 year ago

Hooray! 🎉

Thanks @NickleDave. We appreciate the attention y'all are giving us, and thanks for telling us what (and when) to expect next. We're in no particular rush--it's more important to give the right people the right amount of time to do things right :)

NickleDave commented 1 year ago

it's more important to give the right people the right amount of time to do things right

🙌

finding the right people now! 🙂

NickleDave commented 1 year ago

Hi again @eriknw, @jim22k, @SultanOrazbayev -- just letting you know that I did have a chance to talk with @lwasser now that she has returned, and we are in the process of finding an editor

When you have a chance could you please (all) fill out the pre-review survey?
It's here: https://forms.gle/F9mou7S3jhe8DMJ16

We appreciate each maintainer of the package filling out this survey individually. 🙌 Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌

(I know it's easy to miss in the template)

NickleDave commented 1 year ago

Hi @eriknw, @jim22k, @SultanOrazbayev, brief update:
very happy to inform you that @tomalrussell will be guest editor for this review! 🎉 NetworkX contributor, developer of spatial & network tools like snkit.
I will let @tomalrussell take it from here!

tomalrussell commented 1 year ago

Hi @eriknw, @jim22k, @SultanOrazbayev, and thanks to @NickleDave for the introduction.

I've reached out to potential reviewers, and incidentally look forward to taking a closer look at python-graphblas myself. I'll update here as soon as, definitely within a week.

tomalrussell commented 1 year ago

:wave: Hi @sneakers-the-rat and @szhorvat! Thank you for volunteering to review for pyOpenSci!

The following resources will help you complete your review:

Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
Here is the review template that you will need to fill out and submit here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due in three weeks: 29th March 2023

szhorvat commented 1 year ago

Hello everyone 👋

I'm excited to do this review and learn more about the GraphBLAS approach in general.

I plan do the review gradually, and through continuous communication with the authors. I will make it clear when I consider the review to be completed. Feel free to respond to anything I might bring up before then. The same applies to the review checklist: I will post it below today, and will check off boxes gradually.

Any issues I open for python-graphblas will have a title prefixed with [pyos] and a link back here. You'll probably see me in the discussion forum as well, as I will likely need a bit of help while trying to solve a few toy problems with the library.

Expect comments mostly on mathematical aspects, correctness, docs, and usability from me. Hopefully other reviewers will cover the more technical aspects of Python.

I will aim to complete the review by April 2nd. Since the authors will have the opportunity to address concerns before then, I hope this little delay over the 3 week deadline will be fine. I will not be available during the week of the 20th.

Let me know if you'd like any changes to this arrangement—I can be flexible.

For transparency, I should note that am involved with the igraph project (https://igraph.org/). igraph is not a competitor to python-graphblas, but it does have similar aims to NetworkX and in extension to graphblas-algorithms. I think this review will be a good opportunity for us to learn form each other.

szhorvat commented 1 year ago

Review is now complete.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
- Improvements needed, see main review. Pure Python users vs C users already familiar with GraphBLAS? Explain difference from graph theory / network analysis packages.
[x] Installation instructions: for the development version of the package and any non-standard dependencies in README.
[ ] Vignette(s) demonstrating major functionality that runs successfully locally.
- There are some notebooks, but it is difficult to learn from them. They are sparsely commented, several draw parallels with C code, requiring both C and GraphBLAS knowledge. Discoverability: they are not linked from the docs.
[ ] Function Documentation: for all user-facing functions.
- Not all functions appear in docs, e.g. visualization functions are missing, details of GraphBLAS API is not well covered.
[ ] Examples for all user-facing functions.
- IMO it makes no sense to take this requirement literally, as functions are not independent from each other. Multiple useful examples are present in the user guide, which is more than what some other packages provide. That said, improvements are definitely possible (compare with typical R or Mathematica docs). In particular, more examples showing the implementations of simple well-known algorithms would be welcome.
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements The package meets the readme requirements below:

[x] Package has a README.md file in the root directory.

The README should include, from top to bottom:

[x] The package name
[ ] Badges for:
- [x] Continuous integration and test coverage,
- [x] Docs building (if you have a documentation website),
- [ ] A repostatus.org badge,
- [x] Python versions supported,
- [x] Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

[x] Short description of package goals.
[x] Package installation instructions
[x] Any additional setup required to use the package (authentication tokens, etc.)
[ ] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- [ ] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  - Links to notebooks are not present. It would be useful to have a thoroughly commented walkthrough for implementing a couple of well-known algorithms.
[x] Link to your documentation website.
[x] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
[x] Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

[ ] Package documentation is clear and easy to find and use.
- Docs are good, but sometimes a bit sparse. There are cases when improvements are necessary, such as having a list of available operators, semirings, or documenting behaviour in more details (e.g. https://github.com/python-graphblas/python-graphblas/issues/475)
[x] The need for the package is clear
[ ] All functions have documentation and associated examples for use
[x] The package is easy to install

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
[x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
- [x] Package supports modern versions of Python and not End of life versions.
- [?] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

[ ] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software
[ ] Authors: A list of authors with their affiliations
[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
[ ] References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

[x] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Review Comments

szhorvat commented 1 year ago

@tomalrussell Could you please clarify if the authors are expected to follow all the above checklist items to the letter? (E.g., all mentioned badges mandatory, links to all "vignettes" from the README, all functions have examples of use, etc.)

@eriknw :

includes documentation with examples for all functions.

We're working on this! The C API and SuiteSparse:GraphBLAS C library are both well documented. We have a very large API surface area to cover, so "documentation with examples for all functions" is a really, really high bar, but one I hope we achieve someday :)

Indeed, the documentation of most Python projects won't give usage examples for every single function, and doing so would definitely be a lot of work. But just to show that it is possible, and often tremendously useful to users, I wanted to point to Mathematica's documentation where each function has not one, but many examples. See e.g. LinearSolve. I tried to follow the same with my IGraph/M package for Mathematica, but it's still a work in progress. R packages also often have at least one example for each function.

sneakers-the-rat commented 1 year ago

Indeed, the documentation of most Python projects won't give usage examples for every single function, and doing so would definitely be a lot of work.

From what I recall we had a conversation at some point about allowing the author to define what is intended as the public interface of the package and what isn't? but ya for packages that wrap another library it seems like a lot of extra work if, eg. there are examples from the main library that are trivially different (ie. could be inferred) from the wrapper's API.

sneakers-the-rat commented 1 year ago

I'll also be doing this JOSS-style, leaving this here and editing/raising issues as I go. I like @szhorvat 's idea of using an issue tag, so i'll also prefix mine with [pyos].

I'm happy to focus on more of the python implementation side of things, glad to have someone who's more adept with the math :).

I don't have any conflicts to declare, except that I'm going to be writing some triplet store code soon, but that doesn't really relate or create a conflict imo.

I don't have an expected completion date, but i have this in my calendar as a daily todo item and welcome being relentessly pinged if i am the one holding us up :)

Review status

23-03-20: approaching from the docs as a naïve user first, then will dive into the source after that first pass. raising some issues for docs clarity and organization. Pausing midway through the README requirements for the day.
23-04-09: making progress working my way through the package and docs, trying to understand the general architecture so that I can evaluate high level claims. definitely a lot of work here and so I'm not trying to "fine tooth comb" things as much as get to a point where I can reasonably explain how the package works for the sake of a review and also to help out with docs if I can. Stopped at the "Operators" section of the user guide after raising issue 429.
23-04-17: I have completed the checklist and gotten enough of a sense of the package where I feel comfortable writing my final review, but will wait to do so until the authors can see some of the issues I raised today and we close some of them (not all are time sensitive or important to me to consider the review completed, i'll communicate with the authors on the issues)
23-06-21: review completed! thanks for your patients <3

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
- This is in a few places:
  - Differentiation from prior packages: https://python-graphblas.readthedocs.io/en/stable/getting_started/faq.html#what-is-the-relationship-between-python-graphblas-and-pygraphblas
  - What graphblas is in general: https://python-graphblas.readthedocs.io/en/stable/getting_started/primer.html
  - More elaboration on the particular API design here: https://python-graphblas.readthedocs.io/en/stable/user_guide/fundamentals.html#update-notation
  - Some description on the need for the approach in the README, but not as its own section.
[x] Installation instructions: for the development version of the package and any non-standard dependencies in README.
- PyPI install seems to work! And the examples in the notebooks run, so i assume the installation is successful
[x] Vignette(s) demonstrating major functionality that runs successfully locally.
- In notebooks folder: https://github.com/python-graphblas/python-graphblas/tree/main/notebooks
- really nice, would like to see these make their wait into the docs and have a bit more scaffolding, but good enough for me.
[ ] Function Documentation: for all user-facing functions.
- This is currently not the case, raised issue: https://github.com/python-graphblas/python-graphblas/issues/410
- 23-04-17: still sorting this out I think, I think the main thing I'd like to see here is a complete listing of all the operator types that are available, where currently there are just a subset listed. This would be sort of critical to have in the API docs I think.
[x] Examples for all user-facing functions.
- Similarly, that is not currently the case, but is imo impractical/not necessarily desirable for this package. What would make more sense to me is conceptual examples that demonstrate the different categories of user-facing functions, and I have yet to evaluate that.
- Really nice examples in the user guide, would like to see some of those make their way into docstrings/API docs, but again don't think that's necessary here because of the structure as a wrapper. Raised some issues on some examples that didn't work for me, but appreciated the amount of work that clearly was done here.
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
- Yes, both code of conduct and brief contribution docs, but have raised issue about need for developer docs separately. this requirement is met, tho.
[x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements The package meets the readme requirements below:

[x] Package has a README.md file in the root directory.

The README should include, from top to bottom:

[x] The package name
[x] Badges for:
- [x] Continuous integration and test coverage, - 99% coverage, nice lmao.
- [x] Docs building (if you have a documentation website),
- [ ] A repostatus.org badge - not present, but i'm not a huge fan of this granularity of badge requirements.
- [x] Python versions supported,
- [x] Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

[x] Short description of package goals. - present, but this could be clearer, the description is written for people who already know what graphBLAS is, getting pretty quickly into the weeds about the mapping between python and C API syntax before I even know what the heck the package is for at a high level. I think it would be good to restructure this in the the README and the docs to be like "Hey what up this is python-graphblas, a wrapper around graphBLAS that lets you do graph math with linear algebra. With it you can do stuff like this... (one or two line example). Previous things work like this (example) but python-graphblas works like this instead (exampel of your syntax) which is good because (two sentences)"
[x] Package installation instructions
[x] Any additional setup required to use the package (authentication tokens, etc.)
[x] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file. - There is a link to the docs, which have examples. There should also be a link to the notebooks (even though it is relatively obvious they are there by virtue of there being a notebooks folder in the repo root) in the README.
- [x] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear) - present but similar to previous comment. The README seems like it has gotten a little bit confused about its role in the documentation and does a lot of things that i would expect to just go in the full on docs. So there are some examples of the syntax, but not an example of actually using the package, which is what I would expect in a README.
[x] Link to your documentation website.
[x] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem. - not present in README, but could be done by just linking to https://python-graphblas.readthedocs.io/en/stable/getting_started/faq.html#what-is-the-relationship-between-python-graphblas-and-pygraphblas
- raised issue: https://github.com/python-graphblas/python-graphblas/issues/438
[x] Citation information - there is a DOI and a zenodo repo, but I would also strongly recommend adding a CITATION file: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

[x] Package documentation is clear and easy to find and use.
[x] The need for the package is clear - This question appears in a few places in the checklist, and as with the other places the differentiation from this and other packages is present but sort of hidden in the FAQ
[ ] All functions have documentation and associated examples for use - discussion on this point: https://github.com/python-graphblas/python-graphblas/issues/410
[x] The package is easy to install - pip install worked.

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.

In trying to summarize what the functional claims of the software were, I think it basically boils down to

[x] Wraps GraphBLAS - the software does indeed wrap the graphBLAS API, so mission accomplished there as far as I can tell. I did not to an exhaustive 1:1 of graphblas API functionality and whether it's wrapped by the package, but don't think that's necessarily the most important thing to do. Not exactly sure how the version of the API is handled, so opened an issue: https://github.com/python-graphblas/python-graphblas/issues/424 and another on the structure of the code: https://github.com/python-graphblas/python-graphblas/issues/425
[x] Collections - https://python-graphblas.readthedocs.io/en/stable/user_guide/collections.html - the tests (test_matrix, test_vector, test_scalar) test the functionality I would expect here.
[x] dtypes - https://python-graphblas.readthedocs.io/en/stable/user_guide/types.html - dtypes are all mapped to numpy, numba, and ctypes. tests are also good here.
[x] operators - after https://github.com/python-graphblas/python-graphblas/issues/429 I understand the implementation better and think the developers know how to improve the documentation to make this easier for future ppl to figure out. I like the descriptions of the different operators in the docs, make it pretty clear what's going on. would recommend linking out to examples here. That would be easiest to do by putting the examples in the operator docstrings so they can be defined in one place - "see the class documentation for this operator" which you can then be confident has examples/etc. without needing to point to a bunch of different places for different kinds of documentation. I like the care that is taken with the infix operators and the op namespace.
[x] operations - some issues with examples in the documentation: https://github.com/python-graphblas/python-graphblas/issues/434 Otherwise looks good
[x] graphblas initialization - works as expected
[x] i/o - networkx, scipy.sparse, pydata sparse, numpy all work as expected. awkward is undocumented ( https://github.com/python-graphblas/python-graphblas/issues/436 ) . All these could really benefit from type annotations since they're traversing types ( https://github.com/python-graphblas/python-graphblas/issues/435 ). Otherwise looks good.
[x] user defined functions - seems to work. raise issue about documenting params ( https://github.com/python-graphblas/python-graphblas/issues/437 )
[x] recorder - this is sick as hell, nice work y'all. I don't know enough about graphBLAS to evaluate the results of the recorder, but it works and is a great addition to the package.
[x] Performance: Any performance claims of the software been confirmed. - the performance claims are very minimal, eg in the FAQ and seem quite reasonable.
[x] Automated tests: - nicely done on these.
- [x] Tests cover essential functions of the package and a reasonable range of inputs and conditions.
- [x] All tests pass on the local machine.
[x] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
[x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
- [x] Package supports modern versions of Python and not End of life versions.
- [x] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass) - way more linting than i would do lol

For packages also submitting to JOSS

N/A

Final approval (post-review)

[x] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: (final estimate) ~14h

Review

General Comments

python-graphblas is an exemplary package among scientific python packages, with excellent docs, tests, code quality, examples, and contributor engagement. The package is an opinionated wrapper and interface to GraphBLAS, which is well justified and differentiated from prior wrappers. Throughout the review process, the authors have been receptive to feedback and I have all faith that they will address any continuing suggestions I have in future development work. I have no hesitation recommending this package to anyone who wants to use GraphBLAS from Python.

My outstanding recommendations in the remaining open issues are all future suggestions that the authors can take or leave, none are mission-critical. I want to compliment the authors on this excellent work, I'm glad to have had a reason to have read it. I would be happy to respond to any questions the authors have about this review and otherwise continue to engage on open issues.

Code Quality

Library wrappers have their own sets of challenges and idioms, and python-graphblas handles them well with some room for future improvement. Wrapping code needs to make decisions about how to map the underlying API between languages and interact with the C library. The strategy used to abstract and expose the underlying library API can have large impacts on the maintainability and readability of the code, as can strategies for keeping the version of the wrapping package in sync with the changing API of the wrapped library.

python-graphblas further complicates these choices by admirably taking on the challenge of having multiple backend implementations (numpy and SuiteSparse:GraphBlas), and multiple export formats (NetworkX, Scipy.Sparse, PyData.Sparse, numpy, SuiteSparse, Matrix Market).

The python-graphblas team has chosen a bifurcated abstraction style which fits with the design of their wrapper. Rather than a transparent wrapper, the package introduces its own calling syntax to logically separate the i/o parts of a GraphBLAS call from the computation, which is well described and justified in the documentation.

At the time of reviewing,

the basic container types are implmented as classes within the core module (eg core.vector.Vector)
operators implemented as separate modules within the main module, each with separate submodules for numpy and suitesparse implementations (eg. graphblas.unary)
- ss implementations are actually implemented in the core.operator module, which then makes the symbols available in the top-level module namespace as a side effect of importing, eg. (graphblas.unary.ss)
the core module also implements the other syntax elements like infix operators that are used throughout the package.

This structure makes for a nice user-facing interface, being able to refer to classes and methods in a natural way (eg. gb.Vector, gb.unary.abs), obscuring the underlying code structure.

The tradeoff is the significant amount of indirection and implicitness in the code which presents a relatively high barrier to entry for new contributors. The mappings between the suitesparse GraphBLAS implementation and Python are programmed in several places, eg. within a method and then again in the tests, which the authors describe as being useful as a consistency check. Having some parts of the library dependent on import time side-effects is less than optimal and makes the code more difficult to reason about, but is certainly not fatal to the usability of the package. Again, the nature of a wrapping package requires making decisions about abstraction, so the only concern I have for the current code structure is the impact on maintainability. The current maintainers seem to have no trouble reasoning about the package, and I believe they are aware of these challenges and are actively working on them (they refactored a formerly massive operator file during the course of the review (https://github.com/python-graphblas/python-graphblas/pull/420)). The maintainers can do themselves and future contributors a favor by writing some additional developer docs that explain the structure of the library in greater detail, and I believe they will!

The other question I had was about the relationship between versioning in the API implementation in Python and in the underlying C library. GraphBLAS is a relatively mature API and seems to be only rarely changed, with care for backwards compatibility, so this is less of a concern than for a wrapper around a more actively evolving API. The authors have chosen to formally support a specific version of GraphBLAS ( https://github.com/python-graphblas/python-graphblas/pull/441 ) rather than build version-compatibility infrastructure within the package, which seems like an appropriate decision to me, given their other comments on how refactoring their backend system is on their development roadmap.

I want to also emphasize several of the "nice to have" features in the package, including a very impressive Recorder class that can keep a record of all the GraphBLAS calls made for debugging and reproducibility, automatic compilation of user-supplied functions including lambdas using numba, and the excellent i/o module. These indicate the authors are actively invested in user experience above and beyond the tidy API they expose.

Aside from the above comments, the fundamentals of the package design are strong: modern packaging with setuptools, automated code quality checks, CI, and coverage reports. The decisions made in code structure seem principled, responsive to the constraints of the problem, and result in a very usable user interface - my compliments to the Authors.

Docs

The package is well documented from an introduction to the problem that GraphBLAS attempts to solve through package design decisions and practical examples. It is suitable for a general audience with some exposure to graph operations and programming in python, which is impressive given the highly specialized nature of the library.

Future suggestions for the authors include embedding their example notebooks in the documentation, and improving the API documentation. Currently, I assume due to some of the abstraction decisions made in the rest of the package, there is not comprehensive documentation of every operation available to the user that might otherwise be accomplished with calling autodoc over a class or a module, but some common operators are listed here and the list of available operators are present in the GraphBLAS documentation as well as through object inspection within an interactive Python session. Given the nature of wrapping code where the underlying operations are well documented elsewhere, this is less of a problem than it would be in other packages.

Altogether the docs are excellent with several clear points of improvement, but far above average in the landscape of scientific python packages.

Tests

Who among us can claim 100% test coverage? https://coveralls.io/github/python-graphblas/python-graphblas

The tests are well organized and comprehensive, and I was able to find a corresponding test for every package feature I looked at easily. In more security-sensitive contexts one would want to do more adversarial input fuzzing, but I don't think that's all that relevant here since I've never seen graph data analysis libraries used as a malware vector. I have no notes on the tests, this is good work.

Issues Opened:

tomalrussell commented 1 year ago

Thanks both, great start 💯 - I'll aim to check in occasionally or as needed.

Could you please clarify if the authors are expected to follow all the above checklist items to the letter? (E.g., all mentioned badges mandatory, links to all "vignettes" from the README, all functions have examples of use, etc.)

I would comment on anything you notice and let the authors respond, we can always exercise judgment if "to the letter" seems unhelpful.

lwasser commented 1 year ago

hey y'all. We normally prefer that reviews happen "all at once" in the sense that we're prefer the text of the review to NOT change once submitted and the conversation to happen after. I have reached out to JOSS about how the implement their reviews but I don't want to change our policy on how reviews happen (submitted all at once) until i've spoken to JOSS. @sneakers-the-rat @szhorvat it's fine if you want to leave the review text and check things off and open issues as you go but i prefer that the text of the review that you add to be added all at once to avoid any confusion regarding when you review is complete and what the maintainer of the package should focus on. Many thanks for understanding. I will update once i hear back from JOSS but i don't want to modify our process on the fly until we've thought things through more completely.

many thanks for your time y'all!

lwasser commented 1 year ago

one other note - i think it's great to open issues as you go but one other element that is important is documentation of what changed and why so there is a full record of the review so please if you open issues be sure to reference them in the text of the review in the context of why you opened them. that will allow the editor (and us as an organization) to keep track of the review in one place. again many thanks! we are learning as an organization in this process

sneakers-the-rat commented 1 year ago

fair enough @lwasser :) typically the way it works at JOSS is that you'll be opening issues on the repo and then linking to the review issue (this one) so that basically the review issue serves as a timeline of changes and discussion for the review (all edits to the review checklist comment are also logged). Happy to also do the inverse (link back to opened issues) as well.

So - for this review, will only post text of review when checklist finished, is that what you had in mind?

szhorvat commented 1 year ago

Some of the review checklist items feel a bit R-ish and I wonder if these are intentional, or they just haven't been updated yet after borrowing from rOpenSci. Is the term "vignette" used in the context of Python? Examples for each function are not typical for Python either (though I have to say that in my personal opinion it wouldn't be bad to borrow this habit from R 😉 )

NickleDave commented 1 year ago

I wonder if these are intentional.

Yes. If you prefer the term "tutorial", then feel free to use that.

though I have to say that in my personal opinion it wouldn't be bad to borrow this habit from R

Exactly.

For example see discussion of vignettes here on PyGMT: https://github.com/pyOpenSci/software-submission/issues/43#issuecomment-994023562

Or discussion of examples here on the jointly review: https://github.com/pyOpenSci/software-submission/issues/45#issuecomment-1001172229

NickleDave commented 1 year ago

Happy to also do the inverse (link back to opened issues) as well

@sneakers-the-rat if you can also link in comments that just helps make sure we don't miss it

So - for this review, will only post text of review when checklist finished, is that what you had in mind?

yes please

@szhorvat @sneakers-the-rat the checklists and workflow we have are fairly well established and up to date. They're designed in part so that @lwasser can parse the reviews in an automated way.

We're of course happy to learn from other communities-- @lwasser frequently talks with Arfon from JOSS, for example--but please let's try to follow the workflow we have in place for now.

I realize from how this review has started we probably can do more to make the workflow clearer. We're happy to hear feedback after the review on how we can improve the workflow and make it clearer--we can discuss in Slack or in the forum.

Thank you!

szhorvat commented 1 year ago

@NickleDave Although I did not respond in words (just a 👍 ), I noted the comment from @lwasser and I am happy to follow (I believe I am following) the process. If I missed anything, please point it out explicitly, and I'll do my best to correct it.

NickleDave commented 1 year ago

Thank you @szhorvat maybe I shouldn't comment on GitHub issues when I wake up grumpy :flushed:
I do hope the previous reviews linked will help though--we can probably add links to those in reviewer docs.

lwasser commented 1 year ago

thank you all! @sneakers-the-rat @szhorvat we are super happy to discuss this more (maybe we could do this in discourse as an open spot to discuss? As @NickleDave said, we want to learn from other communities, however it's also important that our review approach is consistent. As such we want to discuss as a community before making changes to our processes for review. I hope that make sense. I am looking forward to chatting more!! And appreciate everyone's time here on this review!

sneakers-the-rat commented 1 year ago

@eriknw - clarification q on your intended audience statement above for the sake of providing feedback on the docs: are you expecting people to have prior experience with graphBLAS? or is the intention to be able to bring someone with ~reasonable understanding of sparse data or graphs up to speed as well? Would be happy to be the latter kind of reader and let you know what parts are clear/unclear if that's the intention, but if the expectation is for someone to already understand graphBLAS i'll spare u.

edit: eg. this - https://python-graphblas.readthedocs.io/en/stable/getting_started/primer.html and this - https://python-graphblas.readthedocs.io/en/stable/user_guide/fundamentals.html#c-to-python-mapping are a little uneven as far as expected expertise imo and i'd be happy to help smooth that out (but i appreciate the work y'all have done on the docs here)

lwasser commented 1 year ago

FWIW - i think that it could be nice if the overview provides enough context for a user less experienced with graphBLAS to follow. it might welcome more users. for instance i was unfamiliar with the term "sparse matrix". then i looked i up and quickly realized i've used that kind of data structure before. so i was familiar with the data type but not the word. of course @eriknw knows best what is feasible here in terms of helping inform users of the package's goals.

my two cents :)

eriknw commented 1 year ago

are you expecting people to have prior experience with graphBLAS? or is the intention to be able to bring someone with ~reasonable understanding of sparse data or graphs up to speed as well?

Yeah, I think we definitely want to support both, and I know there's work to do on the docs, so all feedback is appreciated. It would be great if we had documentation at an even higher level too such as example applications that data scientists can use without even needing to know much about graphs or sparse.

My personal priority right now is to improve maintenance documentation so that maintenance can continue smoothly if I disappeared for a time (for any reason). My second documentation priority is to establish best practices for how to add examples to docstrings. We have other maintainers and contributors who have improved and plan to continue improving documentation.

of course @eriknw knows best what is feasible here in terms of helping inform users of the package's goals.

I think you're giving me (and engineers in general) too much credit! I/we are sometimes much too close to engineering problems to remember the bigger picture of "who cares and why?". I would like for us to pivot to a "product-led growth" mindset/orientation this year now that much of the hard engineering is done.

lwasser commented 1 year ago

@eriknw thank you for this feedback. so it sounds like it might be useful for @sneakers-the-rat @szhorvat to provide feedback from their perspective about what is confusing in the documentation. (such great questions you both asked erik!!) i wonder if there is some way for us to also help you with revisions if you need that support given your focus on maintenance docs (which i agree is profoundly important for long term maintenance). it's something that i've been thinking about but we don't have a huge team just yet to support that just yet...

tomalrussell commented 1 year ago

Hi all - just checking in with @sneakers-the-rat and @szhorvat - thanks for the engagement, discussion and issues opened so far. Are you both on track to submit your initial reviews by the 29th?

(For context - the pyopensci process aims for 3 weeks for review, 2 weeks for subsequent changes, and 1 week for reviewer approval of changes.)

szhorvat commented 1 year ago

I arrived back on Sunday and immediately came down with a bad flu, so I won't be able to finish by tomorrow. If I recover in time, I still hope to finish by Sunday.

tomalrussell commented 1 year ago

Hope you feel better soon @szhorvat! No pressure from my end, just helping to set expectations.

sneakers-the-rat commented 1 year ago

sorry the 29th has come and gone, i've been in a bit of a writing hole, just about to finish a draft of something this week and then was planning on finishing review next week!!!!

sneakers-the-rat commented 1 year ago

sorry again, still writing, just pinging to say i'm still here and that i'm going to put some time in on this right now!

sneakers-the-rat commented 1 year ago

Alright I made it through the checklist, there are some straggling thoughts in the remaining open issues. We can hash those out there and then I'll write my final review :)

szhorvat commented 1 year ago

Apologies about the long disappearance. I have not forgotten about the review. A series of unexpected events prevented me from completing it. I will be able to get back to it on May 1st Monday and hopefully complete it that week.

eriknw commented 1 year ago

I'd like to share most of the PRs that the pyOpenSci reviews have strongly influenced:

Since the review began, we have also:

automated wheel building of the upstream dependency that wraps the C library.
- It supports all architectures/platforms/os/python versions that numpy supports.
- See: https://pypi.org/project/suitesparse-graphblas/7.4.4.1a1/#files
- This was 90% done by an outside contributor
added support for Python 3.11 (numba was finally released for 3.11, but we also just made numba optional)
presented a poster at PyConUS

The reviewers have also indicated where we can/should refactor code (operators!) to make it more understandable and maintainable, and specific suggestions for how to further improve documentation. I expect implementing these changes will keep us busy for most of the year.

@sneakers-the-rat @szhorvat if there are any specific issues you would like us to address before you finish your reviews (or as conditional for acceptance), please let us know and please be specific. There are many issues, checklists, and discussions for us to read that it's hard to know exactly where things stand (but the reviews have been great--thank you!). For example, I think a specific enhancement that y'all probably really, really want is for the operator objects to be listed in the API reference of the documentation (definitely a good idea). Should we add this "quick and ugly" right now, then redo it later in the year after we refactor operators to be more declarative, or just do it later after the refactor?

sneakers-the-rat commented 1 year ago

checking in to say sorry I have been AWOL, have not been feeling well the past several weeks, but am back and work and plan on finishing my review tomorrow or Monday -- I have no outstanding requested changes that need to be immediate (and if it isn't/wasn't clear I don't believe in a review process where it is possible to "fail" the review except by nonparticipation, just to take that idea off the table to the degree that it even applies here)

sneakers-the-rat commented 1 year ago

For example, I think a specific enhancement that y'all probably really, really want is for the operator objects to be listed in the API reference of the documentation (definitely a good idea). Should we add this "quick and ugly" right now, then redo it later in the year after we refactor operators to be more declarative, or just do it later after the refactor?

specifically on this - I would not want to encourage you to do anything that is quick and dirty or otherwise not part of your normal planned development process. I trust y'all at this point to make recommended changes as you are able to, and given that there are several ways to list those functions/methods/classes as is, and the docs are to some degree intertwined with the graphviz docs, I don't think their immediate absence is disqualifying to the "all user-facing functions documented" check.

eriknw commented 1 year ago

Thanks for checking in @sneakers-the-rat, and I hope you (and everyone) are feeling better.

I realize the timing on this review has been a bit... flexible, that March 29 (https://github.com/pyOpenSci/software-submission/issues/81#issuecomment-1458089302) has come and gone, and we're pretty far off the six week schedule that pyopensci aims for (https://github.com/pyOpenSci/software-submission/issues/81#issuecomment-1484897108).

If possible, can we refocus and try to get this review finished-finished before the SciPy conference, July 10th?

We've gotten a lot of value from this review already, and I appreciate all the time and effort the volunteers have spent. Thanks.

tomalrussell commented 1 year ago

Hi @eriknw thanks for your patience, I'm glad you've found value in the process so far, and thanks for the call to action!

@szhorvat can you finalise your review and indicate if you are requesting any further changes in response - https://github.com/pyOpenSci/software-submission/issues/81#issuecomment-1458245662
@sneakers-the-rat can you update the final check to confirm you're happy with the changes made in response to review - https://github.com/pyOpenSci/software-submission/issues/81#issuecomment-1459181121

I'm going to suggest a refocussed timeline, given the process has already had some time for review-in-progress and response:

by 21st June: @szhorvat and @sneakers-the-rat to finalise reviews
by 28th June: @eriknw and team to respond if required
by 5th July: @szhorvat and @sneakers-the-rat to acknowledge (and all being well, @tomalrussell to accept the package, thanks for the efforts all round so far)

sneakers-the-rat commented 1 year ago

SORRY YES i am still here i am just trying to catch up at work. i can do the 21st for sure. i will shoot for tomorrow.

sneakers-the-rat commented 1 year ago

TODAY is the 21st and i am completing it TODAY

sneakers-the-rat commented 1 year ago

Here are my final review comments, I'll also add them into my review above. Thanks again to the authors for their work here, it was a pleasure reading the package <3

Review

General Comments

python-graphblas is an exemplary package among scientific python packages, with excellent docs, tests, code quality, examples, and contributor engagement. The package is an opinionated wrapper and interface to GraphBLAS, which is well justified and differentiated from prior wrappers. Throughout the review process, the authors have been receptive to feedback and I have all faith that they will address any continuing suggestions I have in future development work. I have no hesitation recommending this package to anyone who wants to use GraphBLAS from Python.

My outstanding recommendations in the remaining open issues are all future suggestions that the authors can take or leave, none are mission-critical. I want to compliment the authors on this excellent work, I'm glad to have had a reason to have read it. I would be happy to respond to any questions the authors have about this review and otherwise continue to engage on open issues.

Code Quality

Library wrappers have their own sets of challenges and idioms, and python-graphblas handles them well with some room for future improvement. Wrapping code needs to make decisions about how to map the underlying API between languages and interact with the C library. The strategy used to abstract and expose the underlying library API can have large impacts on the maintainability and readability of the code, as can strategies for keeping the version of the wrapping package in sync with the changing API of the wrapped library.

python-graphblas further complicates these choices by admirably taking on the challenge of having multiple backend implementations (numpy and SuiteSparse:GraphBlas), and multiple export formats (NetworkX, Scipy.Sparse, PyData.Sparse, numpy, SuiteSparse, Matrix Market).

The python-graphblas team has chosen a bifurcated abstraction style which fits with the design of their wrapper. Rather than a transparent wrapper, the package introduces its own calling syntax to logically separate the i/o parts of a GraphBLAS call from the computation, which is well described and justified in the documentation.

At the time of reviewing,

the basic container types are implmented as classes within the core module (eg core.vector.Vector)
operators implemented as separate modules within the main module, each with separate submodules for numpy and suitesparse implementations (eg. graphblas.unary)
- ss implementations are actually implemented in the core.operator module, which then makes the symbols available in the top-level module namespace as a side effect of importing, eg. (graphblas.unary.ss)
the core module also implements the other syntax elements like infix operators that are used throughout the package.

This structure makes for a nice user-facing interface, being able to refer to classes and methods in a natural way (eg. gb.Vector, gb.unary.abs), obscuring the underlying code structure.

The tradeoff is the significant amount of indirection and implicitness in the code which presents a relatively high barrier to entry for new contributors. The mappings between the suitesparse GraphBLAS implementation and Python are programmed in several places, eg. within a method and then again in the tests, which the authors describe as being useful as a consistency check. Having some parts of the library dependent on import time side-effects is less than optimal and makes the code more difficult to reason about, but is certainly not fatal to the usability of the package. Again, the nature of a wrapping package requires making decisions about abstraction, so the only concern I have for the current code structure is the impact on maintainability. The current maintainers seem to have no trouble reasoning about the package, and I believe they are aware of these challenges and are actively working on them (they refactored a formerly massive operator file during the course of the review (https://github.com/python-graphblas/python-graphblas/pull/420)). The maintainers can do themselves and future contributors a favor by writing some additional developer docs that explain the structure of the library in greater detail, and I believe they will!

The other question I had was about the relationship between versioning in the API implementation in Python and in the underlying C library. GraphBLAS is a relatively mature API and seems to be only rarely changed, with care for backwards compatibility, so this is less of a concern than for a wrapper around a more actively evolving API. The authors have chosen to formally support a specific version of GraphBLAS ( https://github.com/python-graphblas/python-graphblas/pull/441 ) rather than build version-compatibility infrastructure within the package, which seems like an appropriate decision to me, given their other comments on how refactoring their backend system is on their development roadmap.

I want to also emphasize several of the "nice to have" features in the package, including a very impressive Recorder class that can keep a record of all the GraphBLAS calls made for debugging and reproducibility, automatic compilation of user-supplied functions including lambdas using numba, and the excellent i/o module. These indicate the authors are actively invested in user experience above and beyond the tidy API they expose.

Aside from the above comments, the fundamentals of the package design are strong: modern packaging with setuptools, automated code quality checks, CI, and coverage reports. The decisions made in code structure seem principled, responsive to the constraints of the problem, and result in a very usable user interface - my compliments to the Authors.

Docs

The package is well documented from an introduction to the problem that GraphBLAS attempts to solve through package design decisions and practical examples. It is suitable for a general audience with some exposure to graph operations and programming in python, which is impressive given the highly specialized nature of the library.

Future suggestions for the authors include embedding their example notebooks in the documentation, and improving the API documentation. Currently, I assume due to some of the abstraction decisions made in the rest of the package, there is not comprehensive documentation of every operation available to the user that might otherwise be accomplished with calling autodoc over a class or a module, but some common operators are listed here and the list of available operators are present in the GraphBLAS documentation as well as through object inspection within an interactive Python session. Given the nature of wrapping code where the underlying operations are well documented elsewhere, this is less of a problem than it would be in other packages.

Altogether the docs are excellent with several clear points of improvement, but far above average in the landscape of scientific python packages.

Tests

Who among us can claim 100% test coverage? https://coveralls.io/github/python-graphblas/python-graphblas

The tests are well organized and comprehensive, and I was able to find a corresponding test for every package feature I looked at easily. In more security-sensitive contexts one would want to do more adversarial input fuzzing, but I don't think that's all that relevant here since I've never seen graph data analysis libraries used as a malware vector. I have no notes on the tests, this is good work.

szhorvat commented 1 year ago

Thanks for the patience everyone.

tl;dr This is a very nice package, technically sound, with a well-thought out Pythonic interface. My perception was that in order to bring GraphBLAS to Python users (and thus fully realize its promise of implementing graph algorithm in terms of high-level building blocks), what we need the most is Python-centric documentation and training materials.

Introduction

First of all, let me say that python-graphblas is a well thought out and technically solid package. It provides a Pythonic interface to the C-based GraphBLAS API, currently supporting SuiteSparse:GraphBLAS (which is the only usable GraphBLAS implementatin available at the moment).

The review should concern python-graphblas, but it's impossible to talk about it without also discussing GraphBLAS itself. And here I must make it clear that I am a newcomer to GraphBLAS. GraphBLAS is a sandardized C API aiming to provide high-level yet general building blocks for graph algorithms, similarly to how BLAS and LAPACK provide building blocks for linear algebra. It does this through some very elegant math, employing the language of linear algebra and abstract algebra, generalizing the usual (+, *) operations to a wider class of semirings. For example, with the usual (+, *) matrix product the kth power of a graph's adjacency matrix gives the number of walks between any two vertices. Replacing + by min and * by + will give us shortest path lengths up to k instead. What is not yet clear to me where the limits of this approach are: Are there common graph concepts that cannot be expressed in this framework? Are there some which can be expressed, but the most efficient algorithms to compute them can't be expressed with GraphBLAS?

Documentation and target audience

python-graphblas is realizing one of the major advantages of the GraphBLAS approach: it is now possible to implement graph algorithms in a convenient and high-level langauge like Python without the need for explicit loops, and thus without a significant loss of performance. Indeed, the python-graphblas authors have also created the graphblas-algorithm package, which achieves performance comparable to C/C++ libraries while being written in pure Python. (To be fair, it's not clear to me how much of the performance comes from parallelization, but the benchmarks are impressive.)

This brings me to a point about how python-graphblas is presented. This is the first paragraph in the docs:

python-graphblas is a Pythonic interface to the highly performant SuiteSparse:GraphBLAS library for performing graph analytics in the language of linear algebra.

This makes the impression that python-graphblas is just an interface to a C library, perhaps even aimed at people who already understand that library. I think this is backwards: if one of the main GraphBLAS advantages is usability from high-level languages, then the target audience should be users of such languages. It seems to me that a project like this should spend at least as much effort on good documentation and training materials as on technical bindings. And this is precisely the weak point of python-graphblas. It is impossible to learn the system without referring to external material that describes the C API. In fact the competing package pygraphblas seems to be doing a little bit better on this front, though still not well enough for users to be able to avoid C-based documentation.

My main recommendation---and I realize that this is a long-term project---is making significant improvements to the documentation with the pure-Python user in mind.

In the intro:

Explain how this package is different from graph libraries like igraph or NetworkX? Make it clear that it is not at all an alternative to them.
Who is the target audience? Algorithm implementors, not people looking for existig algorithms.

Interoperability

In addition to the technically sound and well thought out GraphBLAS interface, the package contains functionality for interoperability with graph theory / network analysis tools, as well as some visualization tools. The quality of these is in need of some improvement:

Interoperability is not trivial, and requires agreeing on a specific interpreatation of the data. Some discussion of this in the docs will be useful. See https://github.com/python-graphblas/python-graphblas/issues/398. It will be good to be explicit that GraphBLAS doesn't really operate with graphs, but with matrices, whose interpretation is up to the algorithm implementor.
Since a NetworkX converter is included, the interpretation issues should be documented. See also https://github.com/python-graphblas/python-graphblas/issues/475

I opened some visualization issues, https://github.com/python-graphblas/python-graphblas/issues/473 https://github.com/python-graphblas/python-graphblas/issues/474 https://github.com/python-graphblas/python-graphblas/issues/475

All this said, interoperability and visualization are not core functionality. As I see it, they are for convenience, and the package should not be judges based on these.

tomalrussell commented 1 year ago

Thanks for the above, @sneakers-the-rat and @szhorvat !

@eriknw @jim22k @SultanOrazbayev - recognising that you've been engaged with the review already, and that some things may be pushed to a longer timeframe, are you happy to respond and make any priority changes by the end of the week (30th June)?

eriknw commented 1 year ago

I applaud the reviewers 👏 . I never expected such thorough and honest reviews. Thank you for positive feedback and the criticisms. I agree with all of it--I think you nailed both the strengths and the weaknesses. The reviews are valuable and I suspect will help shape our vision and effort for the next couple of years.

In particular, there are two main areas we need to give more attention:

documentation
maintainability

Heh, this is probably true for many projects. My main focus in the near and medium term will be maintainability.

Now if I may ramble on for a bit...

It's interesting that reviews occur at a specific moment in time. I know the history of python-graphblas, and, trust me, it has evolved significantly in every 6 month period of its 3.5 year lifetime. If we had been writing pristine user-facing documentation from the beginning, it would have needed to be rewritten and revised endlessly. Functionality would probably be 1.5-2 years behind where we are today. If you're curious, go back and look at versions around 1.3.8 to 1.3.14. It's recognizable, but so different and missing so, so much!

Anyway. I want to highlight this comment:

if one of the main GraphBLAS advantages is usability from high-level languages, then the target audience should be users of such languages

Absolutely! I agree 100%. We aspire to this. It will take time.

From a product perspective, I/we wanted to get the syntax and functionality stable enough to begin writing graphblas-algorithms in earnest. We are targeting networkx users and are adding dispatching to networkx.

Oh, and if you're curious why our test coverage is so high, it's' for multiple reasons:

I was considering Cythonizing everything and moving it to cygraphblas, and thorough coverage would give me more confidence
- I eventually dropped this idea, b/c it would be too great a barrier for maintainability (also, coverage now supports Cython)
Similarly, we still plan to further develop a parallel, Dask-ified version of this library that will use the same tests.
For greater confidence when adding a new implementation, or to help test a new implementation of GraphBLAS
It's simply a good idea, especially for a package as complicated as ours!

Wrapping up... I think we have replied to all open issues from the review. They will keep us busy, that's for sure.

@jim22k @SultanOrazbayev want to say anything else? I don't think it's necessary for you to comment here.

Thanks again all, hope to see you around ❤️ !

tomalrussell commented 1 year ago

Thanks @eriknw (also for the lovely tone and for the background and bit of history)!

@szhorvat and @sneakers-the-rat for complete clarity, can you confirm you're happy with responses?

szhorvat commented 1 year ago

Yes. The suggestions I made are mostly for the long term.

sneakers-the-rat commented 1 year ago

Same, full approval from me :)

tomalrussell commented 1 year ago

Thanks all, time and effort very much appreciated. The review process is done!

All that's left is to wrap up, publish the version of record and acknowledge all your contributions.

🎉 python-graphblas has been approved by pyOpenSci! Thank you @eriknw for submitting python-graphblas and many thanks to @szhorvat and @sneakers-the-rat for reviewing this package! 😸

Author and Reviewer Wrap-Up Tasks

There are just a few things left to do to wrap up this submission, @eriknw, @jim22k, @SultanOrazbayev:

[x] Activate Zenodo watching the repo if you haven't already done so.
[x] Tag and create a release to create a Zenodo version and DOI.
[x] Add the badge for pyOpenSci peer-review to the README.md of python-graphblas. The badge should be [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/81).
[x] Add python-graphblas to the pyOpenSci website. Please open a PR to update this file: to add your package and name to the list of contributors.

Both reviewers and maintainers (@sneakers-the-rat, @szhorvat too):

[ ] Please fill out the post-review survey. All maintainers and reviewers should fill this out.
[x] Reviewers and maintainers, if you have time and are open to being listed on our website, please add yourselves to this file via a PR so we can list you on our website as contributors!

Editor Final Checks

Please complete the final steps to wrap up this review. @tomalrussell, please do the following:

[ ] Make sure that the maintainers filled out the post-review survey
[x] Invite the maintainers to submit a blog post highlighting their package. Feel free to use / adapt language found in this comment to help guide the author.
[x] Change the status tag of the issue to 6/pyOS-approved6 🚀🚀🚀.

If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide.

pyOpenSci / software-submission

Python-graphblas: high-performance sparse linear algebra for scalable graph analytics #81

Description

Scope

Technical checks

Publication options

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

Code of conduct

Other comments (manually added)

Please fill out our survey

Editor and Review Templates

Editor in Chief checks

[ ] Initial onboarding survey was filled out We appreciate each maintainer of the package filling out this survey individually. :raised_hands: Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. :raised_hands:

Editor comments

Package Review

Documentation

Usability

Functionality

For packages also submitting to JOSS

Final approval (post-review)

Review Comments

Review status

Package Review

Documentation

Usability

Functionality

For packages also submitting to JOSS

Final approval (post-review)

Review

General Comments

Code Quality

Docs

Tests

Issues Opened:

Review

General Comments

Code Quality

Docs

Tests

Introduction

Documentation and target audience

Interoperability

Author and Reviewer Wrap-Up Tasks

Editor Final Checks