python-graphblas / graphblas-algorithms

Graph algorithms written in GraphBLAS
Apache License 2.0
76 stars 4 forks source link

Add a few igraph algorithms to run via `scripts/bench.py` #54

Open eriknw opened 1 year ago

eriknw commented 1 year ago

See #53 (this PR maybe closes it).

CC @szhorvat. I'm not at all familiar with using igraph, so I'm hoping you can take a look at this. What other algorithms do you think would be easy to compare against? I think we can compare some SSSP and BFS algorithms once #51 is in. Also, how is one supposed to load Matrix Market files into an igraph Graph? Going from *.mtx file to scipy.sparse to networkx to igraph seems less than ideal.

I'm not sure the most fair way to compare PageRank. On one hand, comparing PageRank with default arguments is fair because that's how they will typically get used in practice. On the other hand, we can view PageRank and other centrality measures as ranking metrics (i.e., order them by score), so a fair comparison could be to run iterative versions of the algorithm until the same ranking is achieved.

szhorvat commented 1 year ago

Including @ntamas here you might have more feedback. I'll also take a look next week.

I wanted to make it clear again that I asked about benchmarking against igraph only out of curiosity, and to better understand what performance is possible with GraphBLAS. This is not part of the pyOpenSci review.

eriknw commented 1 year ago

I wanted to make it clear again that I asked about benchmarking against igraph only out of curiosity, and to better understand what performance is possible with GraphBLAS. This is not part of the pyOpenSci review.

Understood. We have your attention, so I'd like to use it 😉 . I think many people are curious about how igraph compares.

Part of the value proposition of python-graphblas is it provides high performance sparse data structures and operations on them. python-graphblas provides the foundation from which algorithms can be created; graphblas-algorithms is a collection of such algorithms. Users aren't limited to use graphblas-algorithms though--it is possible to explore and invent custom, bespoke, and hopefully fast algorithms that specifically suite ones needs. Some of our earliest (and slightly messy) examples of this are here: https://github.com/metagraph-dev/grblas-recipes/ (someday we'd like to clean up and update some of these to include in our documentation).

So, yeah, I appreciate that you aren't expecting us to benchmark against igraph as part of the review. Nevertheless, if we are able to do so even in a limited fashion, it can help support our claimed value proposition that GraphBLAS can be high-performance, and, hence, a useful tool for people.

Here are results from my initial benchmarking comparing GraphBLAS and igraph using this PR running on a DGX. One should be skeptical of benchmarks in general including these, and I'm sure they don't tell the full story, but the results may be interesting nevertheless:

graphblas_vs_igraph1

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage has no change and project coverage change: -1.25 :warning:

Comparison is base (6de1fd6) 69.16% compared to head (8e58900) 67.91%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #54 +/- ## ========================================== - Coverage 69.16% 67.91% -1.25% ========================================== Files 77 77 Lines 3152 3151 -1 Branches 602 597 -5 ========================================== - Hits 2180 2140 -40 - Misses 786 832 +46 + Partials 186 179 -7 ``` | [Impacted Files](https://codecov.io/gh/python-graphblas/graphblas-algorithms/pull/54?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-graphblas) | Coverage Δ | | |---|---|---| | [graphblas\_algorithms/nxapi/\_utils.py](https://codecov.io/gh/python-graphblas/graphblas-algorithms/pull/54?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-graphblas#diff-Z3JhcGhibGFzX2FsZ29yaXRobXMvbnhhcGkvX3V0aWxzLnB5) | `57.47% <0.00%> (ø)` | | ... and [7 files with indirect coverage changes](https://codecov.io/gh/python-graphblas/graphblas-algorithms/pull/54/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-graphblas) Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-graphblas). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-graphblas)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

szhorvat commented 1 year ago

I'm not sure the most fair way to compare PageRank. On one hand, comparing PageRank with default arguments is fair because that's how they will typically get used in practice.

Probably PageRank is just not the best type of computation for benchmarking ... The results from using the default arguments are still practically useful though.

I just checked what PRPACK does exactly. For graph with less than 128 vertices, it uses Gaussian elimination (which is exact, no tolerance comes into play). For larger graphs, it seems to use a variant of Gauss-Seidel iteration. igraph uses a hard-coded tolerance of $10^{-10}$ here, but tolerances are not comparable between different iterative algorithms anyway.

szhorvat commented 1 year ago

Here are results from my initial benchmarking comparing GraphBLAS and igraph using this PR running on a DGX.

Thanks for sharing these results! I assume these were run in parallel, which is fair, since this seems to be one of the major advantages of using GraphBLAS. But it would be interesting to see a serial comparison too.

In igraph, PRPACK can use some parallelism, but it's not very effective. This can be disabled by setting the OMP_NUM_THREADS=1 environment variable. Most other parts of igraph don't run in parallel, except when relying on a parallelized external library (e.g. BLAS).

eriknw commented 1 year ago

But it would be interesting to see a serial comparison too.

Agreed! I'll try to get new results this weekend.

This can be disabled by setting the OMP_NUM_THREADS=1 environment variable

Setting this environment variable works for python-graphblas as well, or one can do gb.ss.config["nthreads"] = 1.