sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Computing the kernel of a (small) matrix with large rational entries is slow #35272

Status: Open · ThibautVerron opened this issue 1 year ago

ThibautVerron commented 1 year ago

Is there an existing issue for this?

Did you read the documentation and troubleshooting guide?

Environment

- **OS**: Kubuntu 22.04
- **Sage Version**: 9.7, 10.0beta1

Steps To Reproduce

Take a small singular matrix with large rational coefficients, and compute its (left) kernel. Compare with the time taken to compute the echelon form, and the kernel from the echelon form.

Do the same with the right kernel.

For example:

import time
N = 50
print(f"{N=}")
M = Matrix(QQ,6,6,lambda i,j : QQ.random_element(10^N))
M[5] = add(QQ.random_element(10^N)*M[i] for i in range(5)) 

print("Left kernel")

t = time.perf_counter()
E = M.transpose().echelon_form().transpose() # fast
print("Echelon form", time.perf_counter() - t)

t = time.perf_counter()
_ = M.left_kernel() # slow (2 minutes)
print("Kernel basis", time.perf_counter() - t)

t = time.perf_counter()
_ = E.left_kernel() # fast
print("Kernel basis from echelon", time.perf_counter() - t)

print("Right kernel")
M = M.transpose() # ensure that the kernel doesn't become bigger

t = time.perf_counter()
E = M.echelon_form() # fast
print("Echelon form", time.perf_counter() - t)

t = time.perf_counter()
_ = M.right_kernel() # slow (2 minutes)
print("Kernel basis", time.perf_counter() - t)

t = time.perf_counter()
_ = E.right_kernel() # fast
print("Kernel basis from echelon", time.perf_counter() - t)

Expected Behavior

Computing the kernel should not be slower than computing the echelon form and the kernel of the echelon form.

Actual Behavior

The computation of the echelon form, and the computation of the kernel of the echelon form, are fast. But the computation of the kernel of the original matrix is very slow.

The same results are observed with the right kernel.

Indicative timings on my system (with the 10.0beta1 version):

Left kernel
Echelon form 0.002189219929277897
Kernel basis 105.95192904700525
Kernel basis from echelon 0.3859920169925317
Right kernel
Echelon form 0.002033156924881041
Kernel basis 99.98940658790525
Kernel basis from echelon 0.4324517990462482

Additional Information

No response

vneiger commented 1 year ago

Adding to this issue: on my side (10.0beta4),

sage: %time K2 = M.left_kernel()
CPU times: user 22.2 s, sys: 47.9 ms, total: 22.2 s

(which should correspond to the "too long" version above), whereas

sage: %time K = M.left_kernel(algorithm="generic")
CPU times: user 693 µs, sys: 2 µs, total: 695 µs
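
For completeness, the same workaround presumably applies on the right-kernel side as well; a minimal untimed sketch (M as in the original report, using the same algorithm="generic" option shown above):

sage: K_left = M.left_kernel(algorithm="generic")
sage: K_right = M.transpose().right_kernel(algorithm="generic")
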
tscrim commented 1 year ago

So the default is computing a rational kernel of the corresponding integer matrix (obtained by clearing denominators) using iml, and this is what takes the time. If we instead use flint, it is very quick (~7ms for _rational_kernel_flint() versus ~42s for _rational_kernel_iml()). This suggests that the default algorithm should be changed; I am not sure about the impact on other cases.
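
A minimal sketch of that comparison on the matrix M from the report (both are internal methods of Sage's dense integer matrices, as named above; timings are machine-dependent):

import time

Mz = M._clear_denom()[0]              # integer matrix obtained by clearing denominators

t = time.perf_counter()
Kf = Mz._rational_kernel_flint()      # FLINT-based kernel (fast here)
print("flint:", time.perf_counter() - t)

t = time.perf_counter()
Ki = Mz._rational_kernel_iml()        # IML-based kernel, the slow default path
print("iml:  ", time.perf_counter() - t)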

vneiger commented 1 year ago

Indeed. With flint, the direct kernel computation becomes much faster than the workaround of computing the left kernel of the reduced column echelon form (a few milliseconds versus a few hundred milliseconds). Still, for a fair comparison, using the same kernel routine in both cases:

import time
N = 50
M = Matrix(QQ,6,6,lambda i,j : QQ.random_element(10^N))
M[5] = add(QQ.random_element(10^N)*M[i] for i in range(5)) 

t = time.perf_counter()
for _ in range(100):
    Mz = M._clear_denom()[0]
    KM = Mz._rational_kernel_flint()
t = time.perf_counter() - t
print(f"kernel:  {t/100}")

t = time.perf_counter()
for _ in range(100):
    Ez = M.transpose().echelon_form().transpose()._clear_denom()[0]
    KE = Ez._rational_kernel_flint()
t = time.perf_counter() - t
print(f"cef+ker: {t/100}")

On my laptop I have

kernel:  0.00208310158690438
cef+ker: 0.000390590687748045

This means that column echelon form followed by kernel would still be noticeably faster. However, this large difference is in good part an artefact of the fairly specific properties of the constructed matrix (the last row has larger entries and is a linear combination of the other rows, and we are looking at the left kernel), which translate into corresponding properties of the column echelon form.

Running the above code with a matrix having entries of similar size, but with the last column being the sum of the other columns:

M = Matrix(QQ,6,6,lambda i,j : QQ.random_element(10^N))
M[5,:] = Matrix(QQ, 1, 6, lambda i, j : QQ.random_element(10^(2*N)))
M[:,5] = add(M[:,i] for i in range(5)) 

I get timings of about 2ms (unchanged) for the direct kernel, and 1ms for echelon form followed by kernel. The performance difference is already less pronounced. Now making the first column the sum of the others:

M = Matrix(QQ,6,6,lambda i,j : QQ.random_element(10^N))
M[5,:] = Matrix(QQ, 1, 6, lambda i, j : QQ.random_element(10^(2*N)))
M[:,0] = add(M[:,i] for i in range(1,6)) 

leads to almost 2ms for both (with a slight advantage for echelon form then kernel, still).

vneiger commented 1 year ago

Furthermore, for an arguably more frequent case of left kernel computation (a matrix with more rows than columns), computing the column echelon form first is in fact a bad idea:

import time
N = 50
M = Matrix(QQ,6,3,lambda i,j : QQ.random_element(10^N))

t = time.perf_counter()
for _ in range(1000):
    Mz = M._clear_denom()[0]
    KM = Mz._rational_kernel_flint()
t = time.perf_counter() - t
print(f"kernel:  {t/1000:.6f}")

t = time.perf_counter()
for _ in range(1000):
    Ez = M.transpose().echelon_form().transpose()._clear_denom()[0]
    KE = Ez._rational_kernel_flint()
t = time.perf_counter() - t
print(f"cef+ker: {t/1000:.6f}")

On my laptop: 74µs versus 286µs. And it seems that currently, calling left_kernel directly on M takes several seconds...
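
For reference, a quick way to check that last point in the same session (default algorithm; exact timing is machine-dependent):

t = time.perf_counter()
K = M.left_kernel()        # default path on the 6x3 matrix above
print("left_kernel (default):", time.perf_counter() - t)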

vneiger commented 1 year ago

So in short, this issue has led to (at least) the following points: