Closed ParticularMiner closed 2 years ago
@ParticularMiner this is amazing 🎉 ! I can't wait to give this a try.
Thanks again for starting this work @ParticularMiner. I have time this week to push this forward, and I think it makes sense for me to work from this PR.
I think first I'll update dask-grblas
to be more up to date with grblas
. I'll probably leave the matrix multiply untouched for a day or two.
@eriknw
Of course feel free to choose whatever fits you best. You will soon see that my programming skills are not as mature as yours, so I hesitate to say you should start off from this PR. Moreover, you know this package inside out. I don't (at least not yet).
I don't have anymore local changes now. I'm glad to work from any new branch you make.
I appreciate the fact that all this is quite a lot work. Probably, this is the reason development was suspended for a while? If you feel you need some specific help that I could give, please don't hesitate to let me know, and I'll see what I can do.
I think the prospect of a dask
version of grblas
is not just exciting but also important.
By the way, I wrote to the admin of GraphBLAS slack channel in an effort to join but I haven't received a reply in more than 3 weeks. Could there be a problem?
@eriknw
FYI, in this PR, I mainly started out from the test scripts: test_vector.py and test_matrix.py, editing them to ensure that they all ran successfully against the latest version of grblas
. Thanks to your very handy utility function compare()
, I was successful in doing this for all test units in these scripts, except of course for test_attrs()
in both scripts, which seems to detail what more attributes in grblas
need implementing in dask-grblas
.
Afterwards, I added a few new test units to test_matrix.py for element-wise operations, reduce, and matrix-multiplication.
So if you choose to use this PR, I'd recommend you begin by confirming that the tests are running successfully as I claim, and then perhaps examine the diffs to check if I miscoded anything. I tried to follow your coding logic and style as much as possible. I fear some of the test units I introduced may run for too long and might need breaking up into smaller units. I could amend that if you want.
Thanks!
I've updated the branch here, and tests run great:
Thanks for adding new tests! Very useful (and no worries about being slow--this is why we have @pytest.mark.slow
and --runslow
).
Wow, the code for dask-grblas
is kind of wild. I'm amazed you were able to jump in and get something working! It's taken some time for me to re-familiarize myself (heh, I don't know it "inside and out" atm), but it's coming back.
Next steps are:
_matmult
and _matmul2
methods on a local machine
_matmul2
(i.e., have it in the call to blockwise
)asOneColMatrix
and asOneRowMatrix
(I suspect it's unnecessary to create a matrix)grblas.ss.concat
.
grblas
.We appear to work largely in different timezones, so I'll push changes as I make them, and you're welcome to do the same.
In the short term, I think an incremental approach like the above makes the most sense. dask-grblas
was actually thrown together pretty quickly (as is evident by the lack of documentation--sorry!), so at some point we shouldn't be afraid to take a giant leap and rewrite lots of stuff and copy some tests over from grblas
. I think much of the structure and approaches taken by dask-grblas
are solid and we can continue to reuse them, but it's clear this code hasn't been updated in 17 months and that grblas
evolved a lot since then. It's going to take some work to get dask-grblas
up-to-date(-ish), but it may not actually be that bad to update. Before the "big leap" forward, though, I'd like to discuss and revisit design decisions and see if we can do anything better.
I first created dask-grblas
as a proof of concept to see whether such an approach could work. I believe it can. I paused at matrix multiply because, yeah, it was going to take more time to implement and explore (and as you're doing, I wanted to experiment with different matrix multiplication algorithms and determine whether it would be useful to give users a choice for which to use). The main reason I paused dask-grblas
and cygraphblas
is that is became clear that grblas
was still evolving rapidly, and it was more important to focus on grblas
.
I think the prospect of a dask version of grblas is not just exciting but also important.
Agreed! Thanks for your involvement :)
By the way, I wrote to the admin of GraphBLAS slack channel in an effort to join but I haven't received a reply in more than 3 weeks. Could there be a problem?
Yeah, this is a problem. Apparently it's invite-only, which isn't good. We just proposed dropping Slack and moving to Zulip so it's more easily accessible. If you'd still like to join the slack channel, feel free to email me your email (if you're comfortable doing so) at ewelch@anaconda.com and I'll send you an invite.
You can also find me on Dask's slack:
Let's get this in... feel free to continue and make a new PR @ParticularMiner
I have begun to make a few changes in the code in order to make the package compatible with the latest version of grblas.
Still lots more to do!
Caution: I mostly followed my nose here. So it is possible that some of these changes that I've made are not what you want at all. Feel free to correct me, whenever you get the time.