Floating point exception (core dumped) from nullspace of large sparse matrix

HereAround commented 4 years ago

I am trying to compute the right kernel of an integer valued and sparse matrix over the integers. For small matrices, this works just fine. But for large matrices I face a floating point exception (core dumped).

To reproduce this error, please find below a minimal working example for a roughly 60.000 x 50.000 sparse matrix. The computation will allocate about 65GB of RAM for the densely computed kernel matrix. The error will be triggered right after.

I am using the .C below and the .sms input file for the matrix (I had to add them as .txt files since git does not support .c and .sms files):

MinimalExample-C.txt Input-sms.txt

I have been able to track this issue to the method GaussDomain<_Field>::eliminate in line 122ff of /linbox/algorithms/gauss/gauss-elim.inl. It seems that the line

field().divin (field().neg (headcoeff, lignecourante[(size_t)j_head].second), lignepivot[0].second);

causes this issue - at this point lignepivot[0].second = 0 and the division by zero fails.

Any help/suggestions to fix this would be very much appreciated.

Related - I expect that my kernel matrix is sparse. Therefore, allocating 65GB of RAM just to realize that most entries are zero does not seem ideal to me. Is a sparse kernel computation over the integers by now available in linbox (c.f. https://github.com/linbox-team/linbox/issues/130)? And if so, does it consume less resources?

HereAround commented 4 years ago

I have found that this issue also exists for smaller matrices. Here is the sms-file for a 9472 x 9856 matrix (as .txt since github does not support .sms)

smallMatrix.txt

Computing the right nullspace of this matrix again leads to

Floating point exception (core dumped)

HereAround commented 4 years ago

And here is a yet smaller example (367 x 608)

YetSmallerMatrix.txt

HereAround commented 4 years ago

I have managed to produce yet a smaller example (142 x 142).

m.txt

Can you reproduce the error? Maybe my matrix file is wrongly formated?

jgdumas commented 4 years ago

Dear HereAround, thank you for your question. Actually the kernel of an integral matrix might be a rational matrix. So the code is working if replacing the integers by rational: I see a '-1/3' appearing for instance in the kernel of the 'm.txt' matrix.

5c5
< #include <givaro/qfield.h>
---
> #include <givaro/zring.h>
17c17
<     typedef Givaro::QField<Givaro::Rational> Field;
---
>     typedef Givaro::ZRing<Integer> Field;

HereAround commented 4 years ago

Thank you. This indeed explains this issue.

linbox-team / linbox

Floating point exception (core dumped) from nullspace of large sparse matrix #245