FlorianPommerening commented 1 year ago

Hi all,

we run SoPlex on relatively small LPs where we solve the same constraints with different bounds several times per second. I realize that such small LPs are a bit of a niche use case but one thing we noticed in a profile was that updating the bounds took a surprisingly long time (~14% of the total time). Most of time was spent in _rationalLUSolver.clear(). This sounds related to SYNCMODE which we had set to SYNCMODE_ONLYREAL (as well as SOLVEMODE and CHECKMODE).

We also saw a large overhead in storing the solution (SoPlexBase::_storeSolutionReal, 8% of total time) and the solve itself took much longer than with our baseline of SoPlex 3 (we are in the process of upgrading from v3 to v6, so the baseline is v3 and the version where we experience the slowdown is the latest version on the main branch here).

Can either of these overheads be avoided in some way? I'm attaching the profile in case you are interested: callgrind.out.303286.log

ambros-gleixner commented 1 year ago

Hi Florian, thanks for the issue and happy to help analyzing this further with you. You are right that your use case is something we do not have performance tests for, but that does not mean it is not interesting for us. Generally, I can imagine that overhead was introduced (unintentionally) while restructuring SoPlex in order to support rational solving. But I hope we can fix it.

Let's start with the easiest: In real solving mode, _rationalLUSolver.Clear() should not need to be called. Can you tell me the path (or paths) from where this is called currently?

Your parameter set sounds already correct, but can you still post an exact list of what you change from default?

FlorianPommerening commented 1 year ago

Hi Ambros, thanks for looking into this.

I only changed two unrelated parameters:

    soplex.setIntParam(SoPlex::VERBOSITY, SoPlex::VERBOSITY_ERROR);
    soplex.setIntParam(SoPlex::SIMPLIFIER, SoPlex::SIMPLIFIER_OFF);

The path to _rationalLUSolver.Clear() is

void SoPlexBase<R>::changeLhsReal(int i, const R& lhs); source
- void SoPlexBase<R>::_changeLhsReal(int i, const R& lhs); source
- _rationalLUSolver.clear(); source

In the profile, the second call is inlined. Here is the relevant part of the profile:

Looking for _rationalLUSolver.clear(); in soplex.hpp, it looks like it is often called unconditionally (without checking for solving mode).

FlorianPommerening commented 1 year ago

Maybe the part of the profile below the clear method also helps:

The time seems to be spend mostly with memory management (free, alloc) but I'm not 100% certain the profiler reports this correctly on that level of detail.

ambros-gleixner commented 1 year ago

Sorry that I took a while to get back to this. Instead of adding many if's to soplex.hpp I decided to safeguard SLUFactorRational::clear() itself. Can you try #20 and see if the overhead disappears?

ambros-gleixner commented 1 year ago

PS: Will be away from keyboard for two weeks now, but happy to continue afterwards.

FlorianPommerening commented 1 year ago

I'm also on holidays right now but will be happy to look into this further next month.

FlorianPommerening commented 12 months ago

Hi Ambros, I'm back now and started looking into this again. I couldn't use #20 directly because our surrounding code also needs 522b8a18 which seems to be merged in master but not in the bugfix-60 branch. I started from master just manually applied the changes from the patch in #20 there. A profile confirmed that this indeed takes care of all of the overhead caused by clear().

For the remaining overhead, it is less clear to me what causes it. I'm attaching screenshots of two profiles, one created with 3.1.1 and one with the fix from #20. They are run on the same instance which behaves deterministically. In both cases, the method to solve is called 8640 times but the legacy code used 6.1B samples, where the new code used 10.8B. Can you see something from the traces that they do differently? It is somewhat harder to compare them because v3 is so old and the structure of the code changed since then.

3.1.1 6.0.x (master + fix from #20)

ambros-gleixner commented 12 months ago

Some observations:

a bit over 10% of the samples seem to go into the new safety check that checks the feasibility of solutions in the original problem without simplification and scaling; this can also trigger a new simplex run if there are violations
the time spent in fresh factorizations seems quite identical
but about 2 times more pivots seem to happen One question: You write above that simplifier is turned off, but maybe scaling makes a difference? Are you using scaling with the old code? You could experiment with soplex.setIntParam(SoPlex::SCALER, SoPlex::SCALER_OFF);

Otherwise, I think we would need to start looking into log files as well to confirm that there are more pivots and where they could come from.

FlorianPommerening commented 12 months ago

The old code talked to SoPlex via OSI, where the new code directly uses the C++ interface. The use through OSI is a bit of a blackbox to me, so I'm not sure if any settings are made internally but I didn't switch scaling off outside of OSI.

I tried again without scaling and the results indeed improved. Some blocks that are in the previous profile are not shown here anymore because they fell below the display threshold. I'm not sure which calls are for the scaler, so I don't know if the reduction is indeed because of the change, or caused by something else. (The number of samples seems to go down in many places, not just one.) Screenshot from 2023-09-10 12-32-00

Where do you see the doubling of pivots? Is that the call to leave? But this is called 91835 times in the old code and 419635 times in the new code, so about a factor of 4.

I could generate logs and statistics if that helps.

FlorianPommerening commented 12 months ago

I enabled the log output with both versions and uploaded the results here. In this setting, there is an LP with a bunch of "permanent" constraints that is solved over and over (roughly 10000 times over all). From one solve to the next, the bounds on the permanent constraints are changed, and some "temporary" constraints are added. All temporary constraints are removed again after the solve and different ones are added for the next solve. To line up the logs better, I added some output before and after each solve (starting solve, finished solve). Everything outside of these blocks can be ignored.

osi-v311.log v60x.log

I'm not sure how to interpret the logs but one thing that stuck out to me was that in the v3.1.1 version, the column facts increased with every solve, while it reset back to 1 with v6.0.x.

For my own reference later: the instance used for the profiles above and the logs was sokoban-opt08-strips/p04.pddl, the configuration was astar(operatorcounting([state_equation_constraints(), lmcut_constraints()], lpsolver=soplex))

ambros-gleixner commented 11 months ago

More important than the "facts" column is the "iters" column. It looks like for most LPs, the old version did not have to pivot at all and took zero iterations; in the new version you see many more iterations for each LP.

Apparently the old version warmstarts every LP solve from the previous basis, while the new version loses the basis and solves from scratch.

SoPlex would generally discards the basis if constraints with status basic are removed, since the dimension changes. So I am not sure how the old version manages to warmstart. Are these cases where no constraints are added or removed?

In any case what you would need to find out is where the basis gets invalidated, maybe you can trace what is happening in spxchangebasis.cpp (setStatus(NO_PROBLEM)).

FlorianPommerening commented 11 months ago

The way I understood things, SoPlex doesn't directly implement primal/dual simplex but the the algorithm depends on whether the representation is in column or row form (https://soplex.zib.de/doc/html/FAQ.php#primaldualsimplex). Could this have anything to do with it? Maybe in switching from OSI to directly talking to SoPlex, we accidentally changed the representation?

Would the entering algorithm on the column representation do the same as the leaving algorithm on the row representation?

ambros-gleixner commented 11 months ago

Maybe in switching from OSI to directly talking to SoPlex, we accidentally changed the representation?

Yes, this could be the case. Probably OSI had column rep fixed. You can test this with soplex.setIntParam(SoPlex::REPRESENTATION, SoPlex::REPRESENTATION_COLUMN) vs. soplex.setIntParam(SoPlex::REPRESENTATION, SoPlex::REPRESENTATION_ROW). The new default (not present in SoPlex 3) is an auto strategy deciding dynamically between both depending on the ratio of rows vs. columns.

Would the entering algorithm on the column representation do the same as the leaving algorithm on the row representation?

Yes, that is correct.

FlorianPommerening commented 11 months ago

I tried running the code with the 4 combinations of algorithm and representation. While the logs are different, all combinations had a larger number of iterations compared to the 3.1.1 logs from above:

column_dual.log column_primal.log row_dual.log row_primal.log

I then tried writing out the models from the first and second solve. The old 3.1.1. code had no iterations on the second solve which I guess means that it warm-starts and directly finishes. The 6.0.x code used ~70 iterations. (I had to add .txt to the extension for Github to accept the files.)

initial_60x.lp.txt second_60x.lp.txt initial_311.lp.txt second_311.lp.txt

The models look the same to me but it is somewhat hard to say because:

The line breaks are different, tripping up diff tools.
Coefficients of 1 are implicit in the old version and explicit in the new.

The order of entries in a constraint is different. For example

# 3.1.1
cons0:  x66 + x67 - x59 - x70 >= -1
# 6.0.x
C0 : -1.0000000000000000e+00 x59 + 1.0000000000000000e+00 x66 + 1.0000000000000000e+00 x67 - 1.0000000000000000e+00 x70 >= -1.0000000000000000e+00

Going from the first to the second LP, the changes are only in the constraint rhs:

cons2:  x70 + x71 - x67 - x72 >= -1    --> 0
cons3:  x67 + x72 >= 0                 --> -1
cons4:  x72 + x73 - x71 - x75 >= 0     --> -1
cons5:  x71 + x75 >= -1                --> 0
cons156:  x67 + x72 - x70 - x71 >= 0   --> -1
cons157:  x71 + x75 - x72 - x73 >= -1  --> 0

The way the code is set up though, all rhs are set again (all rhs other then the ones above just get set to their old values).

ambros-gleixner commented 11 months ago

Warm start or even hot start (keeping the factorization) should be perfectly possible. Can you investigate, whether the method restoreInitialBasis is called between solves, and what it's backtrace is?

FlorianPommerening commented 11 months ago

I did find a call to it on the second solve (SPxSolverBase::solve -> SPxSolverBase::init -> SPxBasisBase::load -> SPxBasisBase::load -> SPxBasisBase::restoreInitialBasis) and traced why the model was uninitialized.

It turns out I overlooked something and there are actually additional changes to the model that I didn't see before. Some constraints get removed and others added. The set of constraints removed and added happen to be identical between these two solves which is why I didn't spot them when looking at the diff of the models. I would normally expect some of these constraints to change and will look into why they are the same here, but in the end this affects both the new and old version in the same way. Knowing that constraints change, I can understand that there is no warm start; but now I'm surprised that there is a warm start with the old version.

ambros-gleixner commented 11 months ago

I am just guessing here: It could be that the Osi interface that you used before stores the basis and loads it back into SoPlex before the next solve. This should work if the dimensions have not changed and the basis matrix is still regular. Could that be the explanation? You could mimic this rather easily by getBasis/setBasis.

FlorianPommerening commented 11 months ago

I didn't actively get or set the basis in the version using OSI but maybe this is something that OSI does internally? The profile I posted earlier shows some time in a function SpxBasis::load, so it sounds like a good guess.

To try it in the new code, I added this code around the optimization call:

    int basis_num_cols = basis_cols.size();
    int basis_num_rows = basis_rows.size();
    if (get_num_variables() == basis_num_cols && 
        get_num_constraints() == basis_num_rows) {
        soplex.setBasis(basis_rows.data(), basis_cols.data());
    } else {
        basis_cols.resize(get_num_variables());
        basis_rows.resize(get_num_constraints());
    }

    soplex.optimize();

    soplex.getBasis(basis_rows.data(), basis_cols.data());

This is supposed to always store the basis after a solve and then reload it before a solve if the dimensions of the problem are still the same. Does this look OK to you?

With this, the instance I tested sped up a lot and is now slightly faster than the OSI version. From the log it looks like the new code now also warm-starts where possible (0 iterations where the old code had 0 iterations) but it still does more iterations than the old code (higher number of iterations in cases with non-zero iterations). It seems to do those iterations faster, though, as the overall time is lower. I'll run this on a larger benchmark set to confirm but it looks like this fixes the issue for me.

I saw #20 is already merged into the branch bugfix-60 and #15 was already merged into master but didn't make the 6.0.4 release. Any chance there will be a release that includes those two fixes?

ambros-gleixner commented 11 months ago

Wow, nice. There will be a major release earl 2024 from the master branch. bugfix will soon be merged into master, then master has both changes.

FlorianPommerening commented 10 months ago

Great to hear about the release.

The experiments I ran unfortunately showed very mixed results. Sometimes the time for storing the basis just does not pay off as it did in the instance we looked at here and over all we still do not reach the performance we had with OSI and SoPlex 3. But since this moved far away from the initial issue, we can just leave it at that. It doesn't seem to be caused by a single change.

Also, some results suggest that I introduced a bug by restoring the basis. It looks like some LP's now report a different optimal value but I didn't trace this down yet. Is it really always possible to re-use the basis as long as the dimensions of the problem stay the same, no matter what else changes?

ambros-gleixner commented 10 months ago

I am happy to keep investigating. If you take the best setting you have currently and show profiling results, we can make new conjectures as to what may cause slower performance.

Also, some results suggest that I introduced a bug by restoring the basis. It looks like some LP's now report a different optimal value but I didn't trace this down yet. Is it really always possible to re-use the basis as long as the dimensions of the problem stay the same, no matter what else changes?

No, if the constraint coefficients change then there are cases where the basis becomes singular. In that case, SoPlex should either terminate with singular as status (in which case no solution is available) or restart from scratch as if no basis was given. Not sure what happens without a log. So if you receive a different optimal value this would indeed be a bug.

leoneifler commented 6 months ago

What is the status here?

scipopt / soplex

High runtime of `_rationalLUSolver.clear()` in `SoPlexBase<R>::_changeLhsReal()` #19

3.1.1

6.0.x (master + fix from #20)