Thanks for flagging this. Entirely believable that this is broken -- it is a little harder for me to test MKL since I no longer have an Intel machine locally.
We will try to make a fix and perhaps issue a 0.7.2 patch if we can get it to work.
If it is an issue for you with Pardiso I can also try to make a PR with a fix.
Having looked into this a bit, I think it's not entirely clear how to proceed with a fix that would not require supporting changes elsewhere in the code.
Part of the problem is that Pardiso wants to perform iterative refinement as part of its solution procedure, and this clashes somewhat with the iterative refinement we already do ourselves elsewhere. In retrospect it was maybe not the right decision for us to have implemented IR external to our linear solvers like that. This could probably be fixed by moving our IR code to be an internal feature of the QDLDL solver, which it was primarily aimed at anyway.
For the moment, and thinking here of changes that would not require modification to our general LDL solver interface, two things could be done relatively easily:
1) The wrapper around the Pardiso interface (i.e. `PardisoDirectLDLSolver`) could just take its own internal copy of the $K$ matrix at object creation, and then provide implementations of `update_values!` and `scale_values!` to maintain that copy between iterations. This could then be used in place of the `Kfake` trick currently employed. The downside of this is that the $K$ matrix is needlessly duplicated and can be very large.
2) You could try to force Pardiso not to do iterative refinement, because I think (?) that is the only reason the `Kfake` trick fails. It's not entirely clear to me how to do that though given the Pardiso settings described here. As a start, you could try `iparm[7] = 0` (no iterative refinement steps), but it looks from the documentation like it might do 2 IR steps anyway if it perturbs pivots. So then maybe something like `iparm[9] = <huge value>`, since that seemingly sets an extremely small required pivot value. We already apply a diagonal shift to $K$ that should prevent extremely small pivots from arising. NB: the arrays above (as in the MKL documentation) appear to be zero-indexed, so it might be `iparm[8]` and `iparm[10]` in Julia. A rough sketch of these settings follows below.
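If you want to experiment with option 2, something along the following lines might be a starting point. This is a rough, untested sketch: the `iparm` indices use the 1-based numbering expected by `set_iparm!` in Pardiso.jl, and whether these particular settings actually suppress all of Pardiso's internal refinement is an assumption to be checked against the documentation for your Pardiso version.

```julia
using Pardiso

ps = Pardiso.MKLPardisoSolver()

Pardiso.set_iparm!(ps, 1, 1)    # use user-supplied iparm values rather than defaults
Pardiso.set_iparm!(ps, 8, 0)    # maximum number of iterative refinement steps = 0
Pardiso.set_iparm!(ps, 10, 30)  # pivots below 10^(-iparm(10)) are perturbed, so a large
                                # value should make perturbation (and the refinement it
                                # triggers) very unlikely
```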
A more reliable fix, with no further memory requirement, would be to modify the `solve!` function in the LDL interface to pass $K$ as an input. I'm not opposed to that either, but it would require changes to all of the other solvers as well as the Rust implementation if we want to keep everything in sync.
I will also have a think about where and how we want IR to happen in general, since many solvers provide this already. It would be nice for us to be as flexible as possible on this point.
> Part of the problem is that Pardiso wants to perform iterative refinement as part of its solution procedure, and this clashes somewhat with the iterative refinement we already do ourselves elsewhere. In retrospect it was maybe not the right decision for us to have implemented IR external to our linear solvers like that. This could probably be fixed by moving our IR code to be an internal feature of the QDLDL solver, which it was primarily aimed at anyway.
Ahh, I did not realize that you have your own iterative refinement inside of the solver. Yes, maybe that should've been part of the QDLDL solver itself.
> The wrapper around the Pardiso interface (i.e. `PardisoDirectLDLSolver`) could just take its own internal copy of the $K$ matrix at object creation, and then provide implementations of `update_values!` and `scale_values!` to maintain that copy between iterations. This could then be used in place of the `Kfake` trick currently employed. The downside of this is that the $K$ matrix is needlessly duplicated and can be very large.
Are you sure about the duplication of memory? A struct would usually just refer to the memory and not copy it (see the MWE below):
```julia
using SparseArrays
using LinearAlgebra

struct matrix_struct{T}
    Kfake::SparseMatrixCSC{T}
end

Kfake = tril(sprand(100, 100, 0.1))  # Create Kfake
MS = matrix_struct(Kfake)            # Create struct
MS.Kfake[1, 1]                       # Not 1
Kfake[1, 1] = 1                      # Set to 1
MS.Kfake[1, 1]                       # Now 1 -- the struct holds a reference, not a copy
```
If it is the case that it just refers to the KKT memory, is there still a need for `update_values!` and `scale_values!`? So far my "fix" has been to just set `Kfake = KKT` in the creation of `PardisoDirectLDLSolver`. This at least makes the solver more reliable than when using a `Kfake` that is full of zeros. But maybe this results in an outdated `KKT` later on?
> You could try to force Pardiso not to do iterative refinement, because I think (?) that is the only reason the `Kfake` trick fails. It's not entirely clear to me how to do that though given the Pardiso settings described here. As a start, you could try `iparm[7] = 0` (no iterative refinement steps), but it looks from the documentation like it might do 2 IR steps anyway if it perturbs pivots. So then maybe something like `iparm[9] = <huge value>`, since that seemingly sets an extremely small required pivot value. We already apply a diagonal shift to $K$ that should prevent extremely small pivots from arising. NB: the arrays above (as in the MKL documentation) appear to be zero-indexed, so it might be `iparm[8]` and `iparm[10]` in Julia.
Yes, I think it only fails due to refinement. While I am actually using Panua-Pardiso and not MKL, they should have a fairly similar interface. That is a pretty cool idea. I tried setting different values of `v` in `Pardiso.set_iparm!(ps, 10, v)` (Panua uses 1-indexing, so this is equivalent to `iparm[9]` in MKL), but there always seemed to be refinements. This was by no means a thorough investigation, so take it with a grain of salt.
> A more reliable fix, with no further memory requirement, would be to modify the `solve!` function in the LDL interface to pass $K$ as an input. I'm not opposed to that either, but it would require changes to all of the other solvers as well as the Rust implementation if we want to keep everything in sync.
> I will also have a think about where and how we want IR to happen in general, since many solvers provide this already. It would be nice for us to be as flexible as possible on this point.
I did not realize that the Rust and Julia implementations were still in sync. That's pretty cool, but I guess that makes changes a little more complicated. I will let you think more about IR - I do not know what the best approach here would be.
> Are you sure about the duplication of memory? A struct would usually just refer to the memory and not copy it (see the MWE below)
You are correct that in Julia you could just assign $K$ as you suggest and the matrix would be shared between the LDL solver object and the one that contains it. In that case there would be no duplication of memory required. The issue is more on the Rust side, where sharing data like that between objects requires more care in order to respect borrowing and lifetime rules. Copying would be a lot easier there, though probably still not absolutely necessary.
> If it is the case that it just refers to the KKT memory, is there still a need for `update_values!` and `scale_values!`? So far my "fix" has been to just set `Kfake = KKT` in the creation of `PardisoDirectLDLSolver`. This at least makes the solver more reliable than when using a `Kfake` that is full of zeros. But maybe this results in an outdated `KKT` later on?
I think if you took this path the `KKT` matrix would remain up-to-date. We maintain `KKT` for the iterative refinement step that we do ourselves, and the `update_values!` and `scale_values!` functions exist so that the underlying solver can make compatible updates to itself if needed. This is used, for example, in QDLDL since it internally maintains its own permuted copy of the KKT matrix.
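To make that concrete, here is a very rough sketch of what the aliasing version could look like. The struct name and the exact signatures here are hypothetical -- this is just to illustrate that, if the solver holds a reference to the same `KKT` object that the caller already keeps up to date, its own update hooks have nothing left to do:

```julia
using SparseArrays

# Hypothetical sketch only: the field name and function signatures are assumptions,
# not the actual Clarabel.jl interface.
struct PardisoAliasLDLSolver{T}
    KKT::SparseMatrixCSC{T,Int}   # a reference to the caller's KKT matrix, not a copy
end

# The caller writes updated entries into KKT.nzval itself, and this solver aliases
# that same matrix, so there is nothing further to synchronise here.
update_values!(s::PardisoAliasLDLSolver, index, values) = nothing
scale_values!(s::PardisoAliasLDLSolver, index, scale)   = nothing
```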
> I did not realize that the Rust and Julia implementations were still in sync. That's pretty cool, but I guess that makes changes a little more complicated. I will let you think more about IR - I do not know what the best approach here would be.
Yes, they are still in sync except for a small number of implementation details. We are trying to keep them in sync as we make further developments, though some things (e.g. the variety of linear solvers available in each) might differ over time.
> You are correct that in Julia you could just assign $K$ as you suggest and the matrix would be shared between the LDL solver object and the one that contains it. In that case there would be no duplication of memory required. The issue is more on the Rust side, where sharing data like that between objects requires more care in order to respect borrowing and lifetime rules. Copying would be a lot easier there, though probably still not absolutely necessary.
Ahh that makes sense.
> I think if you took this path the `KKT` matrix would remain up-to-date. We maintain `KKT` for the iterative refinement step that we do ourselves, and the `update_values!` and `scale_values!` functions exist so that the underlying solver can make compatible updates to itself if needed. This is used, for example, in QDLDL since it internally maintains its own permuted copy of the KKT matrix.
Perfect, then I will use `Kfake = KKT` in my fork for now.
I realized that you have a setting that makes it possible to disable the IR on the Clarabel side. So I guess I can simply let Pardiso handle the IR by setting `settings.iterative_refinement_enable = false` and setting `iparm[7]` equal to `settings.iterative_refinement_max_iter`?
Another question: what is the reason that you provide your own ordering instead of letting Pardiso handle that internally? (I am still fairly new to this, so I was just wondering if there is some good reason that I am missing.)
> I realized that you have a setting that makes it possible to disable the IR on the Clarabel side. So I guess I can simply let Pardiso handle the IR by setting `settings.iterative_refinement_enable = false` and setting `iparm[7]` equal to `settings.iterative_refinement_max_iter`?
This could work, but with a caveat: there is another pair of settings, `static_regularization_enable` and `static_regularization_shift`, that can also be configured. If the latter is some $\epsilon$, then the matrix we factor is $K + \epsilon S$, where $S$ is a diagonal matrix whose entries are all either $1$ (in the upper left) or $-1$ (in the lower right). The shift ensures that we are factoring a matrix that is always both full rank and quasidefinite, which is the reason we can use the LDL factorisation with diagonal $D$ instead of the more general Bunch-Kaufman form. The outer iterative refinement then serves in part to compensate for this shift.

If you disable the shift then your $K$ matrix might require $2 \times 2$ blocks in $D$, which is generally bad since the corresponding $L$ no longer has such a predictable sparsity pattern. Keeping the shift and setting the IR limit internally in MKL would only ensure you had a more accurate solution to $(K + \epsilon S)x = b$, rather than a solution to $Kx = b$. The worst case would be if your $K$ were rank deficient, e.g. if $P = 0$ and $A$ has redundant columns.
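For concreteness, and assuming the usual quasidefinite block layout (with $H$ standing in for the scaled lower-right block), the matrix we actually factor is

$$
K + \epsilon S =
\begin{bmatrix} P & A^T \\ A & -H \end{bmatrix}
+ \epsilon
\begin{bmatrix} I & 0 \\ 0 & -I \end{bmatrix}
=
\begin{bmatrix} P + \epsilon I & A^T \\ A & -(H + \epsilon I) \end{bmatrix},
$$

so the outer iterative refinement is, in part, correcting solves with $K + \epsilon S$ back towards solutions of $Kx = b$.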
> Another question: what is the reason that you provide your own ordering instead of letting Pardiso handle that internally? (I am still fairly new to this, so I was just wondering if there is some good reason that I am missing.)
We don't force the ordering, i.e. we don't preorder the $K$ matrix before passing to the LDL solvers. The QDLDL solver option uses AMD internally to get a good ordering, but other solvers can use something else. The fact that we don't order $K$ before passing to the solvers is what causes complications, since when making updates to $K$ we are maintaining it in its original form, whereas the underlying LDL solver may or may not have some internally permuted copy that also requires updating.
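A toy illustration of that last point (the permutation here is arbitrary; it just shows that a permuted internal copy is a genuine copy and would need its own update):

```julia
using SparseArrays

K = sparse([2.0 1.0; 1.0 3.0])
p = [2, 1]            # stand-in for a fill-reducing ordering
Kperm = K[p, p]       # a solver's internal permuted copy is a new matrix

K.nzval[1] = 100.0    # the caller updates the original K in place ...
Kperm == K[p, p]      # ... but the copy is now stale: returns false
```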
Edit to add: having written the above, I am less sure of my earlier statement that it was maybe a mistake to implement our own IR scheme. By doing it ourselves we can try to compensate for the static regularization term we are adding. Some LDL solvers allow for dynamic shifting of pivots (as do we in QDLDL), but I don't know how common it is that one can specify the correct sign of those shifts. I think not very.
Hmm, would this be fixed by providing Pardiso with something like `KKT - shift*S` (in order to just retrieve $K$), or is it a little more complicated than that to compensate for the static regularization term?
> Hmm, would this be fixed by providing Pardiso with something like `KKT - shift*S` (in order to just retrieve $K$), or is it a little more complicated than that to compensate for the static regularization term?
If you only want $K$ then you can just set `static_regularization_enable = false` and no shift will be applied. The issue is that the unshifted $K$ is not guaranteed to be full rank. Even when it is full rank, it is not guaranteed to be factorizable as $K = LDL^T$ with diagonal $D$.

QDLDL can't handle that case. Pardiso can (i.e. it allows $2 \times 2$ blocks in $D$), so maybe it can cope anyway. Either way, I think there is no benefit to applying a shift and then immediately trying to back it out before passing to the solver.
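A tiny worked example of the diagonal-$D$ issue: the full-rank matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ admits no $LDL^T$ factorisation with diagonal $D$ (its first pivot is zero), whereas the shifted, quasidefinite version factors without trouble:

$$
\begin{bmatrix} \epsilon & 1 \\ 1 & -\epsilon \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 1/\epsilon & 1 \end{bmatrix}
\begin{bmatrix} \epsilon & 0 \\ 0 & -\epsilon - 1/\epsilon \end{bmatrix}
\begin{bmatrix} 1 & 1/\epsilon \\ 0 & 1 \end{bmatrix}.
$$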
Perfect. I will run with `static_regularization_enable = false` for now then.
Thanks a lot for the many insights. They have definitely saved me from many hours of debugging!
An update:
I figured out that it is possible to completely disable the internal iterative refinement in Panua-Pardiso by setting `iparm(8) = -99`. Doing so and having Clarabel take care of the IR seems to fix most of the issues that I've had.
Unfortunately the documentation for MKLPardiso does not state that the same is possible.
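For reference, with the Pardiso.jl interface this amounts to something along these lines (a sketch of my setup; Panua's 1-based `iparm` numbering is assumed):

```julia
using Pardiso

ps = Pardiso.PardisoSolver()   # Panua / project Pardiso backend

# iparm(8) = -99 appears to disable Panua-Pardiso's internal iterative
# refinement completely, leaving Clarabel's own refinement in charge.
Pardiso.set_iparm!(ps, 8, -99)
```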
Support for HSL MA57 and Pardiso (both `:panua` and `:mkl` options) has been released as part of 0.8.0. The interface to the linear solvers has changed a bit to provide the LHS matrix to the `solve!` step as discussed above.
Happy to take suggestions for other 3rd party solvers if you have found some that work well.
Sounds good! I guess that, technically, with IR disabled the LHS matrix is not used anymore - at least in the case of Panua.
I've only tried Panua and HSL so far, and they're working great for my problems. I will let you know if I experiment with additional solvers in the future.
I will most likely try to add Panua (and possibly HSL) to the Rust implementation soon (in the end I need to call Clarabel from C++). Any pointers?
I think this would be a nice addition and is also something on our internal goals list. I think actually calling the solvers should (?) be relatively straightforward. It seems harder to handle the linking though, particularly if linking against a dynamic library that needs to be searched for.
Something like that happens in https://github.com/rust-math/intel-mkl-src (particularly `intel-mkl-tool`), but I didn't look into it closely.
The reason for wanting to support dynamic linking is that ideally we would be able to distribute binaries in python that use 3rd party solvers when locally available.
Worth checking also what the licensing implications of that would be.
Perfect. I will take a look at the repo you linked!
I read in the recent preprint describing Clarabel that the Rust implementation also supports supernodal LDL using faer-rs. However, I could not find anything relating to faer-rs in the Rust code. Is that section just hinting at what is to come in a not-so-distant future release? :)
The faer interface is implemented at present in a separate branch and was merged together with various other commits for the benchmarking that appears in the paper. I have just released version 0.8, which mostly contains changes for chordal decomposition / SDPs, and can rebase the branch with the faer interface to that if you want to try it.
We haven't merged it yet with main since I have been waiting on a few bells and whistles in the faer-rs crate that would make it a good general purpose multithreaded option for us. Those have appeared quite recently but are (I think) not yet deployed.
I can ping you here when we have something user-friendly if you like.
I would definitely appreciate a ping when you have something that is user-friendly!
On a slightly related note, I am having failures using HSL on 0.8.1 which I do not have on my own branch. I have simply used `sqd=true`, but I can see that you further modify `Ma57_Control`. I will make a separate issue with a few details when I've investigated a little further.
I was trying for a collection of settings that would force HSL to produce an $LDL^T$ factorisation with a diagonal $D$, in an effort to get it to match the behaviour of the internal QDLDL option. If you are setting the static regularization parameters to 0.0 in the settings (I think mentioned previously ...?) then you might get a failure that way from a bad pivot.

Whether I should do that is another matter. It might be better to just allow HSL its default options, even if that means it starts reordering things at high accuracy.
The issue is definitely a bad pivot, since I get error code -5 after a few iterations (manual p. 12). This is with static regularization enabled with its standard settings.
From HSL.jl I can see that they explicitly enable pivoting when setting `sqd=true`, which is why it did not fail previously:
https://github.com/JuliaSmoothOptimizers/HSL.jl/blob/a62033274e2ce40295c1c18ca41550325b878266/src/hsl_ma57.jl#L73-L76
Hi there,

In the `solve!` function for `PardisoDirectLDLSolver` a `Kfake` is utilized in place of the actual KKT system:
https://github.com/oxfordcontrol/Clarabel.jl/blob/76a9a11c890e8948d297e4b826df52257b132f51/src/kktsolvers/direct-ldl/directldl_mklpardiso.jl#L93-L100
While it is true that there is already a factorization, I think that this causes issues in the refinement stage, as that stage utilizes matrix-vector products in order to reduce the residual. To show why this causes issues I set `Pardiso.set_msglvl!(ps, Pardiso.MESSAGE_LEVEL_ON)`, which results in the following output for one of my matrices. In the highlighted box a relative error of 1 can be seen in the first refinement iteration (`ref_it`), which is the relative error one would expect when `Kfake` is set to zero (the refinement residual is $b - K_{\text{fake}} x = b$, so the relative error is exactly $1$).

Cheers, Mikkel