DSBSparse: Addressing some performance issues

vetschn commented 4 weeks ago

Several things are still done naively in the DSBSparse matrices.

- [x] Caching mechanism for the DSBCOO block-wise access. Since we already have the rowptr map in the DSBCSR, no changes are needed there. Also mentioned in #4.
- [x] Return blocks in some sort of sparse format. Closes #4.
- [x] #43
- [ ] ~~Better tests for arithmetic~~
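
To illustrate the caching idea from the first item, here is a minimal sketch and not the actual qttools implementation. The class name `CachedBlockIndex`, its attributes, and the `block_offsets` layout are assumptions made for this example. The idea is that the positions of a block's non-zeros in the COO data buffer are computed once per block and reused on every later access.

```python
import numpy as np


class CachedBlockIndex:
    """Hypothetical helper: caches which COO entries fall into each block."""

    def __init__(self, rows, cols, block_offsets):
        self.rows = np.asarray(rows)
        self.cols = np.asarray(cols)
        # block_offsets[i] is the first row/col of block i; the last entry
        # is the matrix size (assumed layout for this sketch).
        self.block_offsets = np.asarray(block_offsets)
        self._cache = {}
        self.misses = 0  # counts how often the indices had to be computed

    def block_inds(self, i, j):
        """Returns the positions in the data buffer belonging to block (i, j)."""
        key = (i, j)
        if key not in self._cache:
            self.misses += 1
            r0, r1 = self.block_offsets[i], self.block_offsets[i + 1]
            c0, c1 = self.block_offsets[j], self.block_offsets[j + 1]
            mask = (
                (self.rows >= r0) & (self.rows < r1)
                & (self.cols >= c0) & (self.cols < c1)
            )
            self._cache[key] = np.nonzero(mask)[0]
        return self._cache[key]


# 4x4 matrix with two 2x2 diagonal blocks, entries given as COO coordinates.
rows = np.array([0, 1, 1, 2, 3])
cols = np.array([0, 0, 1, 2, 3])
data = np.arange(10.0).reshape(2, 5)  # stack of two matrices, one shared pattern

index = CachedBlockIndex(rows, cols, block_offsets=[0, 2, 4])
first = data[..., index.block_inds(0, 0)]   # computes and caches the indices
second = data[..., index.block_inds(0, 0)]  # served from the cache
```

Because the sparsity pattern is immutable, a cache like this never needs invalidating in this sketch; only the values in the data buffer change.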

codecov-commenter commented 4 weeks ago

:warning: Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 76.95312% with 59 lines in your changes missing coverage. Please review.

Project coverage is 84.79%. Comparing base (f431742) to head (869b3ee).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/qttools/__init__.py | 36.66% | 19 Missing :warning: |
| src/qttools/datastructures/dsbcoo.py | 80.64% | 12 Missing :warning: |
| src/qttools/datastructures/dsbcsr.py | 82.08% | 12 Missing :warning: |
| src/qttools/datastructures/dsbsparse.py | 84.61% | 10 Missing :warning: |
| src/qttools/utils/mpi_utils.py | 55.55% | 4 Missing :warning: |
| src/qttools/utils/stack_utils.py | 0.00% | 2 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              dev      #45      +/-   ##
==========================================
+ Coverage   83.65%   84.79%   +1.13%
==========================================
  Files          29       29
  Lines        1083     1118      +35
==========================================
+ Hits          906      948      +42
+ Misses        177      170       -7
```


vetschn commented 4 weeks ago

I don't know what format to return the sparse blocks as. I see a couple of options:

1. Return an ndarray of COO/CSR matrices. This is roughly what QuaTrEx currently does. It unnecessarily duplicates the rows/cols, and we would need to split up the data buffer.
2. My preferred option: just return a tuple, (rows, cols, data) for COO or (cols, rowptr, data) for CSR. Simple, but it leaves it to the caller of the API to do something sensible with these three arrays.
3. Return a new data structure. This should be some sort of distributed stack of COO/CSR matrices (DSCOO/DSCSR?). Probably the cleanest solution, but the most work.

Edit: Opted for the second option now.
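
As an illustration of what "doing something sensible" with such a tuple could look like on the caller's side, here is a minimal sketch that scatters a COO-style triple into dense blocks. The helper name `densify_coo_block` is hypothetical, and the sketch assumes that `data` carries the stack dimensions first and the non-zeros along the last axis, with block-local, non-repeating coordinates.

```python
import numpy as np


def densify_coo_block(rows, cols, data, shape):
    """Scatters a stacked COO triple into a dense array of shape (*stack, n, m)."""
    block = np.zeros(data.shape[:-1] + tuple(shape), dtype=data.dtype)
    # The same (rows, cols) pattern is applied to every matrix in the stack.
    block[..., rows, cols] = data
    return block


# Block-local coordinates, shared by every matrix in the stack.
rows = np.array([0, 1, 1])
cols = np.array([0, 0, 1])
# One value per non-zero for each of the two stack entries.
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

dense = densify_coo_block(rows, cols, data, shape=(2, 2))
```

The tuple keeps the coordinates stored once for the whole stack, which is exactly the duplication the first option would introduce.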

vetschn commented 3 weeks ago

So I have now added these vectorized getters and setters to the DSBCOO class (DSBCSR is next). The behavior is as follows:

If we are in the "stack" distribution state, you'll get an array of the expected shape (padded with zeros where requested elements are not in the matrix). So
```
inds = np.arange(10)
values = dsbcoo[inds, inds]
```
will give values.shape[-1] == 10, possibly with some zeros in there wherever (inds[i], inds[i]) is not in the matrix. You can also set items with the same logic:
```
inds = np.arange(10)
values = np.ones(10)
dsbcoo[inds, inds] = values
```
This will only change those elements that are actually in the matrix; the sparsity pattern is immutable, as always.
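
The "stack" semantics described above can be modelled in plain NumPy. This is a sketch under assumptions, not the DSBCOO code: the functions `find_inds`, `get_items`, and `set_items` are made up for this example, and the pattern is matched with a brute-force comparison that a real implementation would likely replace with something cheaper.

```python
import numpy as np


def find_inds(rows, cols, req_rows, req_cols):
    """Matches requested coordinates against the sparsity pattern.

    Returns (data_inds, req_inds): data_inds[k] is the position in the data
    buffer of requested element number req_inds[k].
    """
    hits = (rows[:, None] == req_rows) & (cols[:, None] == req_cols)
    data_inds, req_inds = np.nonzero(hits)
    return data_inds, req_inds


def get_items(rows, cols, data, req_rows, req_cols):
    """Returns the requested values, zero where the element is not stored."""
    data_inds, req_inds = find_inds(rows, cols, req_rows, req_cols)
    out = np.zeros(data.shape[:-1] + (len(req_rows),), dtype=data.dtype)
    out[..., req_inds] = data[..., data_inds]
    return out


def set_items(rows, cols, data, req_rows, req_cols, values):
    """Writes only to elements that are stored; the pattern stays fixed."""
    data_inds, req_inds = find_inds(rows, cols, req_rows, req_cols)
    values = np.broadcast_to(values, data.shape[:-1] + (len(req_rows),))
    data[..., data_inds] = values[..., req_inds]


# Pattern holding (0, 0), (1, 0) and (2, 2); the diagonal is only partly stored.
rows = np.array([0, 1, 2])
cols = np.array([0, 0, 2])
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # stack of two

inds = np.arange(4)
values = get_items(rows, cols, data, inds, inds)      # padded with zeros
set_items(rows, cols, data, inds, inds, np.ones(4))   # touches stored entries only
```

Here `values` has the requested length 4 along its last axis even though only two of the four diagonal elements are stored, and the setter leaves the off-diagonal entry untouched.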
If we are in the "nnz" distribution, the same logic should apply as before this change. If you request an element that is not in the matrix, you'll get an IndexError (because we cannot know which rank is supposed to hold an element that is not in the matrix).

If you request elements that are all in the matrix but distributed across ranks, each rank now only returns the requested elements that it holds. On ranks that hold none of the requested elements, you'll get an empty array (of the correct stack shape).
```
inds = np.arange(10)
values = dsbcoo[inds, inds]
```
could give values.shape[-1] == 7 if comm.rank == 0, values.shape[-1] == 3 if comm.rank == 1, and values.shape[-1] == 0 if comm.rank == 2.

Setting elements in this distribution state works analogously:
```
inds = np.arange(10)
values = np.ones(10)
dsbcoo[inds, inds] = values
```
Each rank only sets those elements of the input that it owns. If we access elements that are not in the matrix, nothing happens and the sparsity pattern remains unchanged.
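
The getter in the "nnz" distribution can be modelled without MPI by giving each simulated rank a slice of the non-zeros. This is a sketch under assumptions: the function `get_items_nnz` and its arguments are hypothetical, and it assumes the global pattern (rows, cols) is known on every rank while the values are split along the last axis.

```python
import numpy as np


def get_items_nnz(rows, cols, local_data, nnz_slice, req_rows, req_cols):
    """Returns the requested elements that live in this rank's nnz slice.

    rows/cols describe the global pattern (assumed known on every rank);
    local_data holds only the values for nnz_slice of that pattern.
    """
    hits = (rows[:, None] == req_rows) & (cols[:, None] == req_cols)
    if not hits.any(axis=0).all():
        # No rank owns a missing element, so nobody could return its zero.
        raise IndexError("Requested element is not in the matrix.")
    # Restrict to the locally held non-zeros; the result may be empty.
    data_inds = np.nonzero(hits[nnz_slice])[0]
    return local_data[..., data_inds]


rows = cols = np.arange(10)             # purely diagonal pattern, 10 non-zeros
data = np.arange(20.0).reshape(2, 10)   # full values, stack of two
slices = [slice(0, 7), slice(7, 10), slice(10, 10)]  # nnz held by ranks 0, 1, 2

inds = np.arange(10)
per_rank = [
    get_items_nnz(rows, cols, data[..., s], s, inds, inds) for s in slices
]
```

With this split, the three simulated ranks return 7, 3, and 0 elements along the last axis, and the empty result still carries the stack dimension.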

Note that everything here also works when accessing only a part of the stack, so random things like

```
inds = np.arange(10)
values = dsbcoo.stack[:3][inds, inds]
dsbcoo.stack[[2, 4, 6]][inds, inds] = 1.5 * values
```

should work as well in both distribution states.
