metagraph-dev / dask-grblas

Distributed GraphBLAS via dask and python-graphblas
Apache License 2.0
6 stars 6 forks source link

made a few changes in order to keep up with latest grblas version #4

Closed ParticularMiner closed 2 years ago

ParticularMiner commented 2 years ago

I have begun to make a few changes in the code in order to make the package compatible with the latest version of grblas.

Still lots more to do!

Caution: I mostly followed my nose here. So it is possible that some of these changes that I've made are not what you want at all. Feel free to correct me, whenever you get the time.

eriknw commented 2 years ago

@ParticularMiner this is amazing 🎉 ! I can't wait to give this a try.

eriknw commented 2 years ago

Thanks again for starting this work @ParticularMiner. I have time this week to push this forward, and I think it makes sense for me to work from this PR.

  1. Do you have local changes you need to push?
  2. Would you prefer if I push my changes directly to this PR? Otherwise, I plan to make a new branch.
  3. Any comments or other things I should know?

I think first I'll update dask-grblas to be more up to date with grblas. I'll probably leave the matrix multiply untouched for a day or two.

ParticularMiner commented 2 years ago

@eriknw

Of course feel free to choose whatever fits you best. You will soon see that my programming skills are not as mature as yours, so I hesitate to say you should start off from this PR. Moreover, you know this package inside out. I don't (at least not yet).

I don't have anymore local changes now. I'm glad to work from any new branch you make.

I appreciate the fact that all this is quite a lot work. Probably, this is the reason development was suspended for a while? If you feel you need some specific help that I could give, please don't hesitate to let me know, and I'll see what I can do.

I think the prospect of a dask version of grblas is not just exciting but also important.

By the way, I wrote to the admin of GraphBLAS slack channel in an effort to join but I haven't received a reply in more than 3 weeks. Could there be a problem?

ParticularMiner commented 2 years ago

@eriknw

FYI, in this PR, I mainly started out from the test scripts: test_vector.py and test_matrix.py, editing them to ensure that they all ran successfully against the latest version of grblas. Thanks to your very handy utility function compare(), I was successful in doing this for all test units in these scripts, except of course for test_attrs() in both scripts, which seems to detail what more attributes in grblas need implementing in dask-grblas.

Afterwards, I added a few new test units to test_matrix.py for element-wise operations, reduce, and matrix-multiplication.

So if you choose to use this PR, I'd recommend you begin by confirming that the tests are running successfully as I claim, and then perhaps examine the diffs to check if I miscoded anything. I tried to follow your coding logic and style as much as possible. I fear some of the test units I introduced may run for too long and might need breaking up into smaller units. I could amend that if you want.

eriknw commented 2 years ago

Thanks!

I've updated the branch here, and tests run great:

Thanks for adding new tests! Very useful (and no worries about being slow--this is why we have @pytest.mark.slow and --runslow).

Wow, the code for dask-grblas is kind of wild. I'm amazed you were able to jump in and get something working! It's taken some time for me to re-familiarize myself (heh, I don't know it "inside and out" atm), but it's coming back.

Next steps are:

We appear to work largely in different timezones, so I'll push changes as I make them, and you're welcome to do the same.

In the short term, I think an incremental approach like the above makes the most sense. dask-grblas was actually thrown together pretty quickly (as is evident by the lack of documentation--sorry!), so at some point we shouldn't be afraid to take a giant leap and rewrite lots of stuff and copy some tests over from grblas. I think much of the structure and approaches taken by dask-grblas are solid and we can continue to reuse them, but it's clear this code hasn't been updated in 17 months and that grblas evolved a lot since then. It's going to take some work to get dask-grblas up-to-date(-ish), but it may not actually be that bad to update. Before the "big leap" forward, though, I'd like to discuss and revisit design decisions and see if we can do anything better.

I first created dask-grblas as a proof of concept to see whether such an approach could work. I believe it can. I paused at matrix multiply because, yeah, it was going to take more time to implement and explore (and as you're doing, I wanted to experiment with different matrix multiplication algorithms and determine whether it would be useful to give users a choice for which to use). The main reason I paused dask-grblas and cygraphblas is that is became clear that grblas was still evolving rapidly, and it was more important to focus on grblas.

I think the prospect of a dask version of grblas is not just exciting but also important.

Agreed! Thanks for your involvement :)

By the way, I wrote to the admin of GraphBLAS slack channel in an effort to join but I haven't received a reply in more than 3 weeks. Could there be a problem?

Yeah, this is a problem. Apparently it's invite-only, which isn't good. We just proposed dropping Slack and moving to Zulip so it's more easily accessible. If you'd still like to join the slack channel, feel free to email me your email (if you're comfortable doing so) at ewelch@anaconda.com and I'll send you an invite.

eriknw commented 2 years ago

You can also find me on Dask's slack:

eriknw commented 2 years ago

Let's get this in... feel free to continue and make a new PR @ParticularMiner