Closed: ddddavidee closed this issue 3 years ago.
Maybe this is somehow related to #136.
Hey, thanks for pointing out this issue.
I'll need to run some tests and dig into this. Since .row_reduce() is not JIT compiled, as referenced in #136, it is not as fast as it could be. However, your comparison is against a pure Python, naive implementation (or so it seems), and the performance delta is quite surprising to me. I thought I had a naive implementation as well. Perhaps I overlooked something.
Were the reduced row echelon forms the same?
Also, could you provide your example matrix and field (i.e., which prime you are using) for me to test against/with? You can save your matrix with np.save().
The difference may come from the reference implementation converting to REF first and then to RREF second. It may be especially noticeable because your sparse matrix has few pivots; said differently, my implementation may be performing needless multiplications by zero. More investigation is needed.
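As a rough illustration of the "skip the zeros" idea (a sketch only, not the galois implementation; A is assumed to be a field matrix, p the pivot row, and j the pivot column):

import numpy as np

def eliminate_column(A, p, j):
    # Clear column `j` in every row except the pivot row `p`, but only touch
    # rows whose entry in column `j` is actually nonzero.
    rows = np.nonzero(A[:, j])[0]
    rows = rows[rows != p]
    A[rows, :] -= np.multiply.outer(A[rows, j], A[p, :])
    return A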
yes, I can provide a file with the matrix. I'll do later today or tomorrow morning (europe time)
Were the reduced row echelon forms the same?
Yes, I tested with the following:
print((MM == MM_reduced).all())
Here you have the matrix:
import numpy as np
import galois

order = secp128r1.order  # group order of the secp128r1 curve (from an ECC library)
GF = galois.GF(order)
matrix_np = np.asarray(MM)  # MM is the matrix being reduced, as a GF array
np.savez_compressed("matrix.npz", matrix=matrix_np)
The order is: 340282366762482138443322565580356624661
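For reference, a sketch of loading the saved matrix back into the field (allow_pickle=True is assumed to be needed because the array is stored with dtype=object for such a large field):

import numpy as np
import galois

GF = galois.GF(340282366762482138443322565580356624661)
data = np.load("matrix.npz", allow_pickle=True)
MM = GF(data["matrix"])  # back to a FieldArray, shape (513, 1025)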
I agree. But there is a huge difference between your implementation (around 4 minutes) and the galois one (4 hours). I'll try to modify the galois one to follow yours and check if there's an improvement...
On Sat, 6 Nov 2021, 18:41, Nayuki wrote:
Matrix row reduction's time complexity is inherently O(n³). You can improve the run time by constant factors, but you can't make the procedure fundamentally faster. I think the problem is that your matrix is too big.
@nayuki I saw a comment from you, but don't see it anymore. I love your website and definitely used it when developing this library. Would love to collaborate in some way.
@ddddavidee I think the issue potentially arises in how the FieldArray is verified in the constructor. Since it uses dtype=object for such large fields, the __new__() method makes sure the objects in the array are Python ints, in the valid range, and so forth. Potentially that overhead is getting called far too often when all the row reduction operations are being performed.
I need to verify this, but that's my hunch.
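Purely as an illustration (this is not the actual galois constructor), the kind of per-element check that becomes expensive when it runs on every intermediate array created during the row reduction:

import numpy as np

PRIME = 340282366762482138443322565580356624661

def validated_field_array(values):
    # Hypothetical sketch of constructor-style validation: every entry must be
    # a Python int in [0, PRIME). If something like this runs for each of the
    # many intermediate arrays of a row reduction, the cost adds up quickly.
    arr = np.asarray(values, dtype=object)
    for x in arr.flat:
        if not isinstance(x, int) or not 0 <= x < PRIME:
            raise ValueError(f"{x} is not an element of GF({PRIME})")
    return arr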
@mhostetter Thanks. My comment reflected a misunderstanding of the problem @ddddavidee was facing.
My opinion now is that probably both my algorithm and your galois library algorithm are O(n³), but supposedly yours has a constant factor 100× bigger.
The issue does mainly reside where I previously suggested -- unnecessary overhead in "creating" or "setting" arrays. I'm working on this in #200; I already have a 42× speed-up. Originally @nayuki's code was ~60× faster. There might be more performance to squeeze out.
@nayuki, what I meant is that the same computation took a very different amount of time (~60×) when switching from your code to the one in galois, and I was thinking there was a problem somewhere...
@mhostetter I pulled the branch and re-tried the computation (with some other small changes: I used lru_cache from functools and shortcut the division-by-one case), and now the computation for the very same matrix takes less than two minutes (Wall time: 1min 43s).
@ddddavidee you beat me to it! Thanks for the pre-emptive testing. 😄
Could you enumerate your additional changes, either here or in the comments of #200? If generally-applicable, I'd like to incorporate them too.
@ddddavidee I pushed another performance improvement to that branch. You'll likely see another speed up.
@ddddavidee you beat me to it! Thanks for the pre-emptive testing. 😄
:-) :-D Thank you a lot for working on the issue!!!
Could you enumerate your additional changes, either here or in the comments of #200? If generally-applicable, I'd like to incorporate them too.
I simply added a few @lru_cache decorators to the add, subtract, divide, and multiply functions, hoping that caching some operation results could accelerate the whole computation (from functools import lru_cache).
And in the divide function I added:
if b == 1:
    return a
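A minimal sketch of the idea (assuming the field's scalar arithmetic is exposed as standalone functions; this is not the actual galois code):

from functools import lru_cache

PRIME = 340282366762482138443322565580356624661  # the field order used in this issue

@lru_cache(maxsize=None)
def multiply(a, b):
    return (a * b) % PRIME

@lru_cache(maxsize=None)
def divide(a, b):
    if b == 1:  # shortcut the division-by-one case
        return a
    # Modular inverse via Fermat's little theorem, valid because PRIME is prime
    return (a * pow(b, PRIME - 2, PRIME)) % PRIME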
@ddddavidee I pushed another performance improvement to that branch. You'll likely see another speed up.
Yes, even faster: around 40 seconds.
@ddddavidee sweet, glad you've seen improvement. I just merged #200 into master. Feel free to close if you feel the issue has been resolved.
Yes, I'm closing it. Thanks a lot!
@ddddavidee you beat me to it! Thanks for the pre-emptive testing. 😄
Could you enumerate your additional changes, either here or in the comments of #200? If generally-applicable, I'd like to incorporate them too.
With this code, the row_reduce() method is 2× faster.
The code is heavily inspired by the implementation by @nayuki.
This is a very quick adaptation and can surely be optimized further (the variables p and numpivots are essentially the same).
import numpy as np

def row_reduce(A, ncols=None):
    if A.ndim != 2:
        raise ValueError(f"Only 2-D matrices can be converted to reduced row echelon form, not {A.ndim}-D.")
    ncols = A.shape[1] if ncols is None else ncols
    nrows = A.shape[0]
    A_rre = A.copy()

    # Forward pass: reduce to row echelon form by clearing entries below each pivot
    p = 0  # The pivot row
    numpivots = 0
    for j in range(ncols):
        if numpivots >= nrows:
            break
        # Find a pivot in column `j` at or below row `p`
        idxs = np.nonzero(A_rre[p:, j])[0]
        if idxs.size == 0:
            continue
        i = p + idxs[0]  # Row with a pivot
        numpivots += 1
        # Swap row `p` and `i`. The pivot is now located at row `p`.
        A_rre[[p, i], :] = A_rre[[i, p], :]
        # Force pivot value to be 1
        A_rre[p, :] /= A_rre[p, j]
        # Force zeros below the pivot, touching only rows that are nonzero in column `j`
        idxs = p + 1 + np.nonzero(A_rre[p + 1:, j])[0]
        A_rre[idxs, :] -= np.multiply.outer(A_rre[idxs, j], A_rre[p, :])
        p += 1
        if p == A_rre.shape[0]:
            break

    # Backward pass: clear entries above each pivot to reach reduced row echelon form
    for i in reversed(range(numpivots)):
        # The pivot column of row `i` is its first nonzero entry
        idxs = np.nonzero(A_rre[i, :])[0]
        if idxs.size == 0:
            continue
        j = idxs[0]  # The pivot column
        # Force zeros above the pivot, touching only rows that are nonzero in column `j`
        idxs = np.nonzero(A_rre[:i, j])[0]
        A_rre[idxs, :] -= np.multiply.outer(A_rre[idxs, j], A_rre[i, :])

    return A_rre
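A quick usage sketch for the function above, using a small toy prime field just for illustration (assumes galois and numpy are installed):

import galois

GF = galois.GF(31)  # small prime field, for demonstration only
A = GF([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 10]])
print(row_reduce(A))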
Hi,
I've tried to use this library to row reduce a (quite) big matrix over a finite field. It succeeded, but took a very long time compared with a naïve implementation.
As an example, I copy-paste the timings from the Jupyter notebook.
with: https://www.nayuki.io/page/gauss-jordan-elimination-over-any-field
with galois:
The matrix, for reference, is sparse and defined over a 128-bit prime; its shape is 513×1025.
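For reference, a sketch of how the galois timing could be reproduced from the matrix.npz shared in this thread (allow_pickle=True is assumed to be needed because the array is stored with dtype=object):

import time
import numpy as np
import galois

GF = galois.GF(340282366762482138443322565580356624661)  # the secp128r1 group order
MM = GF(np.load("matrix.npz", allow_pickle=True)["matrix"])  # shape (513, 1025)

start = time.perf_counter()
MM_reduced = MM.row_reduce()  # galois FieldArray.row_reduce()
print(f"galois row_reduce: {time.perf_counter() - start:.1f} s")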