According to the documentation, `num_factor` and `num_flops` should hold the values "without pivoting after analyse phase, with pivoting after factorize phase". The latter does not seem to work correctly:
`num_factor` and `num_flops` are initialized with the values from `akeep`, i.e. with the values from the analyse phase:
https://github.com/ralna/spral/blob/353f104a892552450c882530c5ec424914488fb3/src/ssids/ssids.f90#L827 The values actually counted during factorization are then added on top. That makes the result up to twice as large as it should be (exactly twice in the posdef case). As a fix, `num_factor` and `num_flops` are now initialized to `0`.
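The double counting can be sketched as follows. This is a hedged illustration with made-up numbers and a hypothetical helper, not the SSIDS code itself:

```python
# Illustrative sketch of the reported double counting (not SSIDS code).
# Before the fix, the factorize-phase statistic starts from the value
# inherited from the analyse phase, and the freshly counted value is
# then added on top of it.

def report_num_factor(analyse_value, counted_in_factorize, init_from_akeep):
    num_factor = analyse_value if init_from_akeep else 0  # the fix: start at 0
    num_factor += counted_in_factorize
    return num_factor

# In the posdef case there is no pivoting, so factorization counts exactly
# the analyse-phase prediction, and the bug doubles the reported result.
predicted = 1234
buggy = report_num_factor(predicted, predicted, init_from_akeep=True)   # 2468
fixed = report_num_factor(predicted, predicted, init_from_akeep=False)  # 1234
```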
The CPU factorization does not seem to count these statistics at all. Together with the issue above, this causes `num_factor` and `num_flops` to be reported correctly after the factorize phase in the posdef case. In the indef case, however, any extra nnz or flops due to delays would not be reported. This commit adds counting of these stats to the CPU factorization code.
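To see why the indef case needs factorize-time counting, consider that a delayed pivot is passed up to the parent node, enlarging that node beyond the analyse-phase prediction. The node dimensions and the trapezoid entry count below are illustrative assumptions, not taken from the SSIDS sources:

```python
# Hedged sketch: delayed pivots inflate a node's dimensions, so the entry
# count known only at factorize time exceeds the analyse-phase prediction.
# Dimensions and the counting helper are illustrative, not SSIDS code.

def node_entries(nrow, ncol):
    # Dense lower-trapezoidal factor block: column j holds nrow - j entries
    # (0-based j, diagonal included).
    return sum(nrow - j for j in range(ncol))

predicted = node_entries(nrow=10, ncol=4)       # analyse-phase prediction
actual = node_entries(nrow=10 + 2, ncol=4 + 2)  # after 2 delayed pivots
assert actual > predicted  # extra nnz visible only during factorization
```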
I tested a few matrices to make sure that the values after analysis and factorization are the same in the posdef case. For the indef case I set `nemin=8` and `pivot_method=3` and compared to HSL_MA97 with default options, both using Metis 4 ordering. On some matrices I see identical values that way; on others the results are close but not identical, which is probably an expected difference in pivoting between the two solvers.
All these tests are on CPU only. That also means I could not actually verify that, without these changes, the almost-doubled values are in fact reported on GPU as described; it is my best understanding of the code though.
Counting for the CPU code is taken directly from how it is done in the GPU code: https://github.com/ralna/spral/blob/353f104a892552450c882530c5ec424914488fb3/src/ssids/gpu/factor.f90#L1441-L1444 I thought about simplifying the loop a bit (e.g. I am not quite sure why the iteration is done backwards) or even implementing closed-form solutions for these sums, but decided to mirror the exact implementation from the GPU code (which is also the same as in HSL_MA97), as no change seemed relevant enough to me (clang even likes to emit closed-form solutions on its own).
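The backward per-column loop and the closed-form alternatives can be sketched as below. The per-column expressions are my assumption of the usual dense-trapezoid counts, not a verbatim copy of the Fortran:

```python
# Hedged sketch: an ma97-style backward loop over a node's columns versus
# closed-form solutions for the same sums. The per-column expressions are
# the standard dense-trapezoid counts and are an assumption here, not
# copied verbatim from the SPRAL sources.

def count_node_loop(nrow, ncol):
    num_factor = 0
    num_flops = 0
    for j in range(ncol, 0, -1):  # backwards, as in the GPU code
        height = nrow - j + 1     # column j of the trapezoid, diagonal included
        num_factor += height
        num_flops += height * height
    return num_factor, num_flops

def count_node_closed(nrow, ncol):
    # Same sums in closed form: sum of column heights and of their squares.
    def sum_sq(t):  # 1^2 + 2^2 + ... + t^2
        return t * (t + 1) * (2 * t + 1) // 6
    num_factor = ncol * nrow - ncol * (ncol - 1) // 2
    num_flops = sum_sq(nrow) - sum_sq(nrow - ncol)
    return num_factor, num_flops

assert count_node_loop(10, 4) == count_node_closed(10, 4) == (34, 294)
assert count_node_loop(12, 12) == count_node_closed(12, 12)
```

Both directions of the loop give the same sums, which is why the backward iteration is harmless; the closed forms are what a compiler like clang may emit for such a loop anyway.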