pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/

Python-graphblas: high-performance sparse linear algebra for scalable graph analytics #81

Closed eriknw closed 1 year ago

eriknw commented 1 year ago

Submitting Author: Erik Welch (@eriknw)
All current maintainers: (@eriknw, @jim22k, @SultanOrazbayev)
Package Name: Python-graphblas
One-Line Description of Package: Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics
Repository Link: https://github.com/python-graphblas/python-graphblas
Version submitted: 2023.1.0
Editor: @tomalrussell
Reviewer 1: @sneakers-the-rat
Reviewer 2: @szhorvat
Archive: DOI
JOSS DOI: N/A
Version accepted: 2023.7.0
Date accepted (month/day/year): 07/14/2023


Description

Python-graphblas is like a faster, more capable scipy.sparse that can implement NetworkX. It is a Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics. Python-graphblas mimics the math notation, making it the most natural way to learn, use, and think about GraphBLAS. In contrast to other high-level GraphBLAS bindings, Python-graphblas can fully and cleanly support any implementation of the GraphBLAS C API specification, which allows us to be vendor-agnostic.
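For readers new to the idea, here is a minimal pure-Python sketch of what "sparse linear algebra over a semiring" means. It is not the Python-graphblas API: the `mxv` helper and the dict-of-dicts storage are hypothetical names chosen only for illustration, and the real library is far more efficient and expressive.

```python
# Illustrative only: a sparse matrix-vector product parameterized by a semiring.
# `mxv`, SparseMatrix, and SparseVector are hypothetical, not Python-graphblas names.
from typing import Callable, Dict

SparseVector = Dict[int, float]             # index -> value (missing = no entry)
SparseMatrix = Dict[int, Dict[int, float]]  # row -> {col: value}


def mxv(
    A: SparseMatrix,
    v: SparseVector,
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
) -> SparseVector:
    """Compute w[i] = add-reduction over j of mul(A[i][j], v[j]).

    Only entries present in both A's row and v contribute, which is
    what makes the operation sparse.
    """
    w: SparseVector = {}
    for i, row in A.items():
        for j, a_ij in row.items():
            if j in v:
                term = mul(a_ij, v[j])
                w[i] = add(w[i], term) if i in w else term
    return w


A = {0: {1: 2.0, 2: 5.0}, 1: {2: 1.0}}
v = {1: 3.0, 2: 4.0}

# Conventional arithmetic semiring: ordinary matrix-vector multiplication.
plus_times = mxv(A, v, add=lambda x, y: x + y, mul=lambda x, y: x * y)

# min-plus semiring: one relaxation step of a shortest-path computation.
min_plus = mxv(A, v, add=min, mul=lambda x, y: x + y)
```

Swapping the `add` and `mul` operators changes the algorithm without changing the loop, which is the core idea GraphBLAS builds on.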

Scope

Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

Audience: anybody who works with sparse data or graphs. We are also implementing a backend to NetworkX (which supports dispatching in version 3.0) written in Python-graphblas called graphblas-algorithms, so we are quite literally targeting NetworkX users!

Python-graphblas provides a faster, easier, more flexible, and more scalable way to operate on sparse data, including for graph algorithms. There are too many scientific applications to list; they span neuroscience, genomics, biology, and more. It may be useful wherever scipy.sparse or NetworkX is used. Although GraphBLAS was designed to build graph algorithms, it is flexible enough to be used in other applications. Anecdotally, most of our current users that I know about are from research groups in universities and laboratories.
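To make "graph algorithms as linear algebra" concrete, here is a rough pure-Python sketch of breadth-first search written as repeated frontier expansion, which corresponds to a vector-matrix product over a boolean semiring followed by a mask of already-visited nodes. The `bfs_levels` function and the adjacency-set layout are hypothetical and are not taken from Python-graphblas or graphblas-algorithms.

```python
# Illustrative only: BFS expressed as repeated "frontier times adjacency" steps.
# `bfs_levels` is a hypothetical helper, not part of any GraphBLAS library.
from typing import Dict, Set


def bfs_levels(adj: Dict[int, Set[int]], source: int) -> Dict[int, int]:
    """Return the BFS level of every node reachable from `source`."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Boolean vector-matrix product: OR together the neighbor sets
        # of every node in the current frontier.
        reached: Set[int] = set()
        for u in frontier:
            reached |= adj.get(u, set())
        # Complemented mask: keep only nodes that have no level yet.
        frontier = reached.difference(levels)
        for node in frontier:
            levels[node] = depth
    return levels


graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}, 4: set()}
levels = bfs_levels(graph, 0)
```

In a GraphBLAS setting the inner loop would be a single sparse operation, so the same structure can run on much larger graphs.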

We are also targeting applications that need very large distributed graphs. We have experimented with Dask-ifying python-graphblas here, and we get regular interest from people who want e.g. distributed PageRank or connected components.

The most similar package is pygraphblas, which hasn't been updated in more than 16 months. There are many differences in syntax, functionality, philosophy, architecture, and (I would argue) robustness and maturity. python-graphblas syntax targets the math syntax, whereas pygraphblas is much closer to C. python-graphblas handles dtypes much more robustly, has efficient conversions to/from numpy and other formats, is architected to handle additional GraphBLAS implementations (more are on the way!), has exceptional error messages, has many more tests and more functionality, supports Windows, and much, much more. We have also been growing our team, because sustainability is very important to us.

Although we have/had irreconcilable differences (which is why we decided to create python-graphblas), the authors have always been cordial. We all believe strongly in the ethos of open source, and I would describe our relationship as having "radical generosity". For example, we have an outstanding agreement that each library is welcome to "borrow" from the other (with credit). We may "borrow" some of their documentation :)

We also worked together to create and maintain the C binding to SuiteSparse:GraphBLAS: https://github.com/GraphBLAS/python-suitesparse-graphblas/. We could use help automatically generating wheels for this library on major platforms via cibuildwheel.

Limited prior discussion in this issue: https://github.com/pyOpenSci/python-package-guide/issues/21#issuecomment-1368046000

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

Other comments (manually added)

Given a product mindset, we believe that Python-graphblas is a great product, but I think our go-to-market strategy has been lacking. We have been very engineering-heavy, and even our goal of targeting NetworkX users is engineering-heavy via creating graphblas-algorithms. I hope this peer-review process can help us prioritize our efforts (such as a plan to improve documentation) and give us a place to write a blog post or two.

tomalrussell commented 1 year ago

@eriknw and team, I'd also like to invite you to write a blog post on your package for us to promote your work. If you are interested - here are a few examples of other blog posts:

This can be a really high-level motivation for the package, for a slightly-scientific-Python-user-audience, or could draw on your introductory tutorial material to get straight to what the package does.

This is totally optional and not a requirement, but if you have time, we'd love to spread the word about python-graphblas to pyOpenSci blog readers :relaxed:

sneakers-the-rat commented 1 year ago

did the post-review survey and submitted contributors.yml patch :)

lwasser commented 1 year ago

Friends - I believe this issue can be closed!! If it should stay open, please just reopen it or let me know! Congratulations on a successful review, and thank you everyone for participating in our pyOpenSci review process! I am so appreciative of you all!! ✨

eriknw commented 1 year ago

w00t! Also, we'd be happy to write a short blog post :)