Open eriknw opened 1 year ago
Including @ntamas here, who might have more feedback. I'll also take a look next week.
I wanted to make it clear again that I asked about benchmarking against igraph only out of curiosity, and to better understand what performance is possible with GraphBLAS. This is not part of the pyOpenSci review.
> I wanted to make it clear again that I asked about benchmarking against igraph only out of curiosity, and to better understand what performance is possible with GraphBLAS. This is not part of the pyOpenSci review.
Understood. We have your attention, so I'd like to use it 😉. I think many people are curious about how igraph compares.
Part of the value proposition of python-graphblas is that it provides high-performance sparse data structures and operations on them. python-graphblas provides the foundation from which algorithms can be created; graphblas-algorithms is a collection of such algorithms. Users aren't limited to graphblas-algorithms, though: it is possible to explore and invent custom, bespoke, and hopefully fast algorithms that suit one's specific needs, as sketched below. Some of our earliest (and slightly messy) examples of this are here: https://github.com/metagraph-dev/grblas-recipes/ (someday we'd like to clean up and update some of these to include in our documentation).
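To make that concrete, here is a minimal sketch of building a custom algorithm directly on python-graphblas: a Bellman-Ford-style single-source shortest-path relaxation over the min-plus semiring. The exact constructor names (e.g., `Matrix.from_coo`) reflect recent python-graphblas releases and may differ in older versions:

```python
# A minimal sketch of a custom algorithm on python-graphblas: one SSSP
# relaxation loop using the min-plus semiring.
import graphblas as gb

# Tiny weighted digraph: 0->1 (2.0), 0->2 (5.0), 1->2 (1.0)
A = gb.Matrix.from_coo([0, 0, 1], [1, 2, 2], [2.0, 5.0, 1.0], nrows=3, ncols=3)

# Distance vector, with only the source (vertex 0) present at distance 0
d = gb.Vector(float, size=3)
d[0] << 0.0

# Bellman-Ford-style relaxation: (A^T d)[i] = min_j (A[j, i] + d[j]),
# accumulated into d with min so distances never increase
for _ in range(A.nrows - 1):
    d(gb.binary.min) << A.T.mxv(d, gb.semiring.min_plus)

print(d)  # distances 0.0, 2.0, 3.0
```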
So, yeah, I appreciate that you aren't expecting us to benchmark against igraph as part of the review. Nevertheless, if we are able to do so, even in a limited fashion, it can help support our claimed value proposition: that GraphBLAS can be high-performance and, hence, a useful tool for people.
Here are results from my initial benchmarking comparing GraphBLAS and igraph using this PR, running on a DGX. One should be skeptical of benchmarks in general, including these, and I'm sure they don't tell the full story, but the results may be interesting nevertheless:
Patch coverage has no change and project coverage change: -1.25% :warning:

Comparison is base (6de1fd6) 69.16% compared to head (8e58900) 67.91%.
> I'm not sure of the fairest way to compare PageRank. On one hand, comparing PageRank with default arguments is fair because that's how it will typically get used in practice.

Probably PageRank is just not the best type of computation for benchmarking ... The results from using the default arguments are still practically useful, though. If the goal is ranking vertices, one could also compare the orderings the two implementations produce, as sketched below.
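If rank agreement is what we care about, one simple way (an assumption on my part, not something from this PR) is Kendall's tau between the two score vectors; `scores_gb` and `scores_ig` below are placeholder names:

```python
# Measure rank agreement between two PageRank implementations rather than
# comparing raw scores or tolerances directly.
from scipy.stats import kendalltau

def ranking_agreement(scores_gb, scores_ig):
    """Kendall's tau between two score vectors (1.0 = identical ordering)."""
    tau, _pvalue = kendalltau(scores_gb, scores_ig)
    return tau
```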
I just checked what PRPACK does exactly. For graphs with fewer than 128 vertices, it uses Gaussian elimination (which is exact; no tolerance comes into play). For larger graphs, it seems to use a variant of Gauss-Seidel iteration. igraph uses a hard-coded tolerance of $10^{-10}$ here, but tolerances are not comparable between different iterative algorithms anyway.
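For illustration only, here is a rough sketch of the Gauss-Seidel idea described above. This is not PRPACK's actual code; it assumes a dense 0/1 adjacency matrix with no self-loops and ignores dangling-node handling for brevity:

```python
# Sketch of Gauss-Seidel iteration for the PageRank linear system
# x = damping * P^T x + (1 - damping) / n.
import numpy as np

def pagerank_gauss_seidel(A, damping=0.85, tol=1e-10, max_iter=1000):
    n = A.shape[0]
    out_deg = A.sum(axis=1)       # out-degrees of the dense adjacency matrix
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_prev = x.copy()
        for i in range(n):
            # Uses current (partially updated) values of x: this in-place
            # update is what distinguishes Gauss-Seidel from Jacobi / power
            # iteration, and why tolerances aren't directly comparable.
            incoming = sum(
                damping * x[j] / out_deg[j]
                for j in range(n)
                if A[j, i] and out_deg[j] > 0
            )
            x[i] = incoming + (1.0 - damping) / n
        if np.abs(x - x_prev).max() < tol:
            break
    return x / x.sum()
```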
> Here are results from my initial benchmarking comparing GraphBLAS and igraph using this PR running on a DGX.

Thanks for sharing these results! I assume these were run in parallel, which is fair, since this seems to be one of the major advantages of using GraphBLAS. But it would be interesting to see a serial comparison too.

In igraph, PRPACK can use some parallelism, but it's not very effective. This can be disabled by setting the `OMP_NUM_THREADS=1` environment variable, as shown below. Most other parts of igraph don't run in parallel, except when relying on a parallelized external library (e.g., BLAS).
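For example (with `benchmark.py` as a placeholder script name):

```python
# Pin igraph/PRPACK (and any other OpenMP-backed code) to a single thread.
# The variable must be set before the threaded library initializes, so set
# it before importing -- or in the shell: OMP_NUM_THREADS=1 python benchmark.py
import os
os.environ["OMP_NUM_THREADS"] = "1"

import igraph as ig  # imported only after the environment variable is set
```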
> But it would be interesting to see a serial comparison too.
Agreed! I'll try to get new results this weekend.
> This can be disabled by setting the `OMP_NUM_THREADS=1` environment variable

Setting this environment variable works for python-graphblas as well, or one can do `gb.ss.config["nthreads"] = 1`.
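A quick sketch showing both options together; note that `gb.ss.config` is specific to the SuiteSparse:GraphBLAS backend:

```python
# Force serial execution for python-graphblas, either via the environment
# variable or via the SuiteSparse:GraphBLAS backend config.
import os
os.environ["OMP_NUM_THREADS"] = "1"  # option 1: before importing graphblas

import graphblas as gb
gb.ss.config["nthreads"] = 1         # option 2: at runtime
print(gb.ss.config["nthreads"])      # sanity check; should print 1
```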
See #53 (this PR may close it).
CC @szhorvat. I'm not at all familiar with using igraph, so I'm hoping you can take a look at this. What other algorithms do you think would be easy to compare against? I think we can compare some SSSP and BFS algorithms once #51 is in. Also, how is one supposed to load Matrix Market files into an igraph Graph? Going from a `*.mtx` file to scipy.sparse to networkx to igraph seems less than ideal (one possible shortcut is sketched after this comment).

I'm not sure of the fairest way to compare PageRank. On one hand, comparing PageRank with default arguments is fair because that's how it will typically get used in practice. On the other hand, we can view PageRank and other centrality measures as ranking metrics (i.e., order vertices by score), so a fair comparison could be to run iterative versions of the algorithm until the same ranking is achieved.
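On the Matrix Market question: here is a minimal sketch that skips the networkx hop by reading the file with scipy and building the igraph Graph straight from the COO coordinates. The file name `graph.mtx` is a placeholder, and there may well be a more direct igraph loader I'm not aware of:

```python
# Load a Matrix Market file and build an igraph Graph directly from the
# sparse matrix's COO coordinates, skipping the networkx intermediate step.
import igraph as ig
from scipy.io import mmread

coo = mmread("graph.mtx").tocoo()    # "graph.mtx" is a placeholder path
edges = list(zip(coo.row.tolist(), coo.col.tolist()))
g = ig.Graph(n=coo.shape[0], edges=edges, directed=True)
g.es["weight"] = coo.data.tolist()   # keep edge weights if present
```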