google / or-tools

Google's Operations Research tools:
https://developers.google.com/optimization/
Apache License 2.0

How to improve CP-SAT performance for large bin-packing problems? #1799

Closed lalithsuresh closed 4 years ago

lalithsuresh commented 4 years ago

I'm trying to improve the runtime of a load balancing model. The goal is to load balance tasks across nodes, subject to capacity constraints along two resource dimensions. The objective function aims to minimize the maximum load on any node.

I'm struggling to get this to work well in scenarios with a large number of nodes but few tasks (there's a large number of feasible solutions, I guess?). I've teased out the core logic of my model below. For now, I'd like to avoid having to tighten the domains of the load and objective variables, because in my actual use case that is hard to predict (given a slew of other constraints that some tasks can have, which I've excluded below).

Here is what a typical run looks like. I'd appreciate any advice on:

Thanks!

Parameters: log_search_progress: true num_search_workers: 4 cp_model_probing_level: 0
Optimization model '':
#Variables: 53051 (1 in objective)
 - 50000 in [0,1]
 - 50 in [0,999]
 - 3001 in [0,10000000]
#kIntMax: 1
#kLinear1: 102000 (#enforced: 100000)
#kLinear2: 49
#kLinearN: 3000
*** starting model presolve at 0.03s
- 0 affine relations were detected.
- 0 variable equivalence relations were detected.
- rule 'deductions: 100000 stored' was applied 1 time.
- rule 'int_max: reduced domains' was applied 1 time.
- rule 'linear: always true' was applied 2000 times.
- rule 'linear: reduced variable domains' was applied 1000 times.
- rule 'linear: simplified rhs' was applied 50049 times.
- rule 'linear: singleton column' was applied 2000 times.
- rule 'linear: size one' was applied 2000 times.
- rule 'presolve: iteration' was applied 1 time.
Optimization model '':
#Variables: 51051 (1 in objective)
 - 1001 in [0][2,261][263]
 - 50000 in [0,1]
 - 50 in [0,999]
#kIntMax: 1
#kLinear1: 100000 (#enforced: 100000)
#kLinear2: 49
#kLinearN: 1000
*** starting Search at 0.93s with 4 workers and strategies: [ auto, lp_br, pseudo_cost, no_lp, helper, rnd_lns_auto, var_lns_auto, cst_lns_auto, rins/rens_lns_auto ]
#Bound   1.28s best:inf   next:[0,263]    no_lp
#1       1.37s best:263   next:[0,262]    no_lp num_bool:99850
#Bound   1.37s best:263   next:[0,261]    no_lp
#2       1.42s best:257   next:[0,256]    no_lp num_bool:99850
#3       1.44s best:252   next:[0,251]    no_lp num_bool:99850
#4       1.46s best:249   next:[0,248]    no_lp num_bool:99850
#5       1.48s best:244   next:[0,243]    no_lp num_bool:99850
#6       1.50s best:240   next:[0,239]    no_lp num_bool:99850
#7       1.52s best:234   next:[0,233]    no_lp num_bool:99850
#8       1.54s best:232   next:[0,231]    no_lp num_bool:99850
#9       1.56s best:226   next:[0,225]    no_lp num_bool:99850
#10      1.58s best:221   next:[0,220]    no_lp num_bool:99850
#11      1.60s best:214   next:[0,213]    no_lp num_bool:99850
#12      1.62s best:209   next:[0,208]    no_lp num_bool:99850
#13      1.64s best:201   next:[0,200]    no_lp num_bool:99850
#14      1.66s best:196   next:[0,195]    no_lp num_bool:99850
#15      1.68s best:190   next:[0,189]    no_lp num_bool:99850
#16      1.70s best:186   next:[0,185]    no_lp num_bool:99850
#17      1.72s best:180   next:[0,179]    no_lp num_bool:99850
#18      1.74s best:174   next:[0,173]    no_lp num_bool:99850
#19      1.76s best:166   next:[0,165]    no_lp num_bool:99850
#20      1.78s best:160   next:[0,159]    no_lp num_bool:99850
#21      1.80s best:158   next:[0,157]    no_lp num_bool:99850
#22      1.82s best:153   next:[0,152]    no_lp num_bool:99850
#23      1.83s best:147   next:[0,146]    no_lp num_bool:99850
#24      1.85s best:144   next:[0,143]    no_lp num_bool:99850
#25      1.87s best:136   next:[0,135]    no_lp num_bool:99850
#26      1.89s best:133   next:[0,132]    no_lp num_bool:99850
#27      1.93s best:130   next:[0,129]    no_lp num_bool:99850
#28      1.97s best:127   next:[0,126]    no_lp num_bool:99850
#29      1.99s best:125   next:[0,124]    no_lp num_bool:99850
#30      2.02s best:119   next:[0,118]    no_lp num_bool:99850
#31      2.05s best:117   next:[0,116]    no_lp num_bool:99850
#32      2.07s best:116   next:[0,115]    no_lp num_bool:99850
#33      2.10s best:112   next:[0,111]    no_lp num_bool:99850
#34      2.14s best:110   next:[0,109]    no_lp num_bool:99850
#35      2.18s best:105   next:[0,104]    no_lp num_bool:99850
#36      2.21s best:103   next:[0,102]    no_lp num_bool:99850
#37      2.26s best:99    next:[0,98]     no_lp num_bool:99850
#38      2.28s best:97    next:[0,96]     no_lp num_bool:99850
#39      2.31s best:94    next:[0,93]     no_lp num_bool:99850
#40      2.34s best:91    next:[0,90]     no_lp num_bool:99850
#41      2.38s best:89    next:[0,88]     no_lp num_bool:99850
#42      2.44s best:88    next:[0,87]     no_lp num_bool:99850
#43      2.48s best:86    next:[0,85]     no_lp num_bool:99850
#44      2.53s best:85    next:[0,84]     no_lp num_bool:99850
#45      2.58s best:83    next:[0,82]     no_lp num_bool:99850
#46      2.63s best:79    next:[0,78]     no_lp num_bool:99850
#47      2.67s best:77    next:[0,76]     no_lp num_bool:99850
#Bound   2.69s best:77    next:[2,76]     lp_br
#48      2.71s best:76    next:[2,75]     no_lp num_bool:99850
#49      2.76s best:75    next:[2,74]     no_lp num_bool:99850
#50      2.80s best:74    next:[2,73]     no_lp num_bool:99850
#51      2.85s best:73    next:[2,72]     no_lp num_bool:99850
#52      2.89s best:71    next:[2,70]     no_lp num_bool:99850
#53      2.94s best:68    next:[2,67]     no_lp num_bool:99850
#54      2.98s best:67    next:[2,66]     no_lp num_bool:99850
#55      3.02s best:66    next:[2,65]     no_lp num_bool:99850
#56      3.06s best:65    next:[2,64]     no_lp num_bool:99850
#57      3.11s best:64    next:[2,63]     no_lp num_bool:99850
#58      3.15s best:62    next:[2,61]     no_lp num_bool:99850
#59      3.19s best:59    next:[2,58]     no_lp num_bool:99850
#60      3.23s best:56    next:[2,55]     no_lp num_bool:99850
#61      3.28s best:55    next:[2,54]     no_lp num_bool:99850
#62      3.31s best:54    next:[2,53]     no_lp num_bool:99850
#63      3.36s best:52    next:[2,51]     no_lp num_bool:99850
#64      3.39s best:50    next:[2,49]     no_lp num_bool:99850
#65      3.43s best:49    next:[2,48]     no_lp num_bool:99850
#66      3.47s best:47    next:[2,46]     no_lp num_bool:99850
#67      3.50s best:44    next:[2,43]     no_lp num_bool:99850
#68      3.54s best:43    next:[2,42]     no_lp num_bool:99850
#69      3.58s best:42    next:[2,41]     no_lp num_bool:99850
#70      3.62s best:40    next:[2,39]     no_lp num_bool:99850
#71      3.65s best:39    next:[2,38]     no_lp num_bool:99850
#72      3.69s best:37    next:[2,36]     no_lp num_bool:99850
#Bound   3.69s best:37    next:[3,36]     lp_br
#73      3.74s best:36    next:[3,35]     no_lp num_bool:99850
#74      3.78s best:35    next:[3,34]     no_lp num_bool:99850
#75      3.81s best:34    next:[3,33]     no_lp num_bool:99850
#76      3.84s best:31    next:[3,30]     no_lp num_bool:99850
#77      3.88s best:30    next:[3,29]     no_lp num_bool:99850
#78      3.92s best:29    next:[3,28]     no_lp num_bool:99850
#79      3.96s best:28    next:[3,27]     no_lp num_bool:99850
#80      3.99s best:27    next:[3,26]     no_lp num_bool:99850
#81      4.03s best:26    next:[3,25]     no_lp num_bool:99850
#82      4.07s best:25    next:[3,24]     no_lp num_bool:99850
#83      4.11s best:24    next:[3,23]     no_lp num_bool:99850
#84      4.15s best:23    next:[3,22]     no_lp num_bool:99850
#85      4.19s best:22    next:[3,21]     no_lp num_bool:99850
#86      4.23s best:21    next:[3,20]     no_lp num_bool:99850
#87      4.26s best:19    next:[3,18]     no_lp num_bool:99850
#88      4.30s best:18    next:[3,17]     no_lp num_bool:99850
#89      4.34s best:17    next:[3,16]     no_lp num_bool:99850
#90      4.38s best:16    next:[3,15]     no_lp num_bool:99850
#91      4.42s best:15    next:[3,14]     no_lp num_bool:99850
#92      4.47s best:14    next:[3,13]     no_lp num_bool:99850
#93      4.51s best:13    next:[3,12]     no_lp num_bool:99850
#94      4.54s best:12    next:[3,11]     no_lp num_bool:99850
#95      4.58s best:11    next:[3,10]     no_lp num_bool:99850
#96      4.63s best:10    next:[3,9]      no_lp num_bool:99850
#97      4.67s best:9     next:[3,8]      no_lp num_bool:99850
#98      4.71s best:8     next:[3,7]      no_lp num_bool:99850
#Bound   4.86s best:8     next:[4,7]      lp_br
#Done    4.87s  no_lp
CpSolverResponse:
status: OPTIMAL
objective: 8
best_bound: 8
booleans: 99850
conflicts: 3
branches: 37
propagations: 100536
integer_propagations: 55382
walltime: 5.35976
usertime: 5.35976
deterministic_time: 1.96926
primal_integral: 0
8
Done: 6233

Process finished with exit code 0
@Test
    public void testSlack() {
        final long now = System.currentTimeMillis();
        // Create the model.
        final CpModel model = new CpModel();

        // Create the variables.
        final int numTasks = 50;
        final int numNodes = 1000;
        final IntVar[] taskToNodeAssignment = new IntVar[numTasks];
        final int[] taskDemands1 = new int[numTasks];
        final int[] taskDemands2 = new int[numTasks];
        final int[] scores = new int[numTasks];

        final int[] nodeCapacities1 = new int[numNodes];
        final int[] nodeCapacities2 = new int[numNodes];

        for (int i = 0; i < numTasks; i++) {
            taskToNodeAssignment[i] = model.newIntVar(0, numNodes - 1, "");
        }
        for (int i = 0; i < numTasks; i++) {
            taskDemands1[i] = ThreadLocalRandom.current().nextInt(1, 5);
            taskDemands2[i] = ThreadLocalRandom.current().nextInt(1, 5);
            scores[i] = taskDemands1[i] + taskDemands2[i];
        }
        for (int i = 0; i < numNodes; i++) {
            nodeCapacities1[i] = 500;
            nodeCapacities2[i] = 600;
        }

        // 1. Symmetry breaking
        for (int i = 0; i < numTasks - 1; i++) {
            model.addLessOrEqual(taskToNodeAssignment[i], taskToNodeAssignment[i + 1]);
        }

        // 2. Capacity constraint
        final IntVar[] scoreVars = new IntVar[numNodes];
        for (int node = 0; node < numNodes; node++) {
            final IntVar[] tasksOnNode = new IntVar[numTasks];
            for (int i = 0; i < numTasks; i++) {
                final IntVar bVar = model.newBoolVar("");
                model.addEquality(taskToNodeAssignment[i], node).onlyEnforceIf(bVar);
                model.addDifferent(taskToNodeAssignment[i], node).onlyEnforceIf(bVar.not());
                tasksOnNode[i] = bVar;
            }
            final IntVar load1 = model.newIntVar(0, 10000000, "");
            final IntVar load2 = model.newIntVar(0, 10000000, "");
            final IntVar score = model.newIntVar(0, 10000000, "");
            model.addEquality(load1, LinearExpr.scalProd(tasksOnNode, taskDemands1));
            model.addEquality(load2, LinearExpr.scalProd(tasksOnNode, taskDemands2));
            model.addEquality(score, LinearExpr.scalProd(tasksOnNode, scores));

            scoreVars[node] = score;

            model.addLessOrEqual(load1, nodeCapacities1[node]);
            model.addLessOrEqual(load2, nodeCapacities2[node]);
        }

        final IntVar max1 = model.newIntVar(0, 10000000, "");
        model.addMaxEquality(max1, scoreVars);
        model.minimize(max1);
        System.out.println("Model creation: " + (System.currentTimeMillis() - now));

        // Create a solver and solve the model.
        final CpSolver solver = new CpSolver();
        solver.getParameters().setNumSearchWorkers(4);
        solver.getParameters().setLogSearchProgress(true);
        solver.getParameters().setCpModelProbingLevel(0);

        final CpSolverStatus status = solver.solve(model);
        if (status == CpSolverStatus.FEASIBLE || status == CpSolverStatus.OPTIMAL) {
            System.out.println(solver.value(max1));
        }
        System.out.println("Done: " + (System.currentTimeMillis() - now));
    }

sschnug commented 4 years ago

I wonder whether a classic MIP solver wouldn't be a much better choice, as this problem looks like it would have good tight relaxations, and there are specialized cutting planes available. The linear-scan-like progress seems to hurt too, and a MIP solver would be driven more by the objective.

That being said, I'm not sure the issue tracker is the place for modelling help (the Google group, perhaps?). This is not necessarily a solver issue.

lperron commented 4 years ago

The CP-SAT solver has a linear relaxation and nearly all the linear cuts. It is in fact a linear integer solver (no continuous variables).

In our tests, unless we miss a cut, we are faster than SCIP on these types of models.

lperron commented 4 years ago

This being said, solving a large bin packing problem with a naive model in less than 5 seconds seems good to me.

DominikRoB commented 4 years ago

"As the trace shows, the search starts with all tasks being assigned to the same node." How do you see this? I've looked for this kind of info for a while, but could not find it.

lalithsuresh commented 4 years ago

@DominikRoB No magic here -- I'm looking at the current value of the objective for every line from the search progress ("#1 1.37s best:263 next:[0,262] no_lp num_bool:99850").

If you want to see what values are being assigned, you can attach a callback to be invoked on every solution, which can then print what you need.

ghost commented 4 years ago

One way you could get rid of some of the boolean variables is to use one cumulative constraint for each resource.

  1. Convert taskToNodeAssignment into an array of interval variables:
    • start[i] is an IntVar from 0 to numNodes - 1
    • end[i] is an IntVar from 1 to numNodes
    • size[i] is a constant equal to 1
  2. Add a cumulative constraint with taskToNodeAssignment, taskDemands1 and nodeCapacities1.
  3. Add a cumulative constraint with taskToNodeAssignment, taskDemands2 and nodeCapacities2.

When you create your interval variables, there is a literal indicating the presence of the interval (see the NewOptionalIntervalVar method). You should be able to compute the objective function with those variables.

Your problem seems similar to the two-dimensional vector packing problem. There may be valid cuts that could improve your model.
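The mapping above can be checked without OR-Tools at all: a task assigned to node n becomes a unit interval [n, n+1), so the total demand of the intervals covering point n is exactly the load on node n, which is precisely what a cumulative constraint bounds. A minimal plain-Java sketch of that equivalence (class and method names are mine, not from the thread):

```java
// Sketch: why unit intervals + cumulative model per-node loads.
// A task assigned to node n occupies [n, n+1), so the demand stacked
// over point n equals the load on node n.
final class CumulativeSketch {
    // Per-node load implied by an assignment, computed via interval coverage.
    static int[] loads(int[] assignment, int[] demands, int numNodes) {
        int[] load = new int[numNodes];
        for (int i = 0; i < assignment.length; i++) {
            int start = assignment[i];       // interval start = node index
            int end = start + 1;             // unit size => end = start + 1
            for (int n = 0; n < numNodes; n++) {
                if (start <= n && n < end) { // interval covers point n
                    load[n] += demands[i];
                }
            }
        }
        return load;
    }

    // The cumulative constraint holds iff the stacked demand at every
    // point stays within the capacity.
    static boolean cumulativeHolds(int[] assignment, int[] demands,
                                   int numNodes, int capacity) {
        for (int l : loads(assignment, demands, numNodes)) {
            if (l > capacity) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] assignment = {0, 0, 2}; // tasks 0,1 on node 0; task 2 on node 2
        int[] demands = {2, 3, 4};
        int[] load = loads(assignment, demands, 3);
        System.out.println(load[0] + " " + load[1] + " " + load[2]); // 5 0 4
        System.out.println(cumulativeHolds(assignment, demands, 3, 5)); // true
        System.out.println(cumulativeHolds(assignment, demands, 3, 4)); // false
    }
}
```

This is why the encoding drops the 50,000 per-node booleans: the cumulative propagator reasons about all nodes at once instead of one reified equality per (task, node) pair.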

lalithsuresh commented 4 years ago

Thanks @AxelDelsol63. It's not clear to me from your description how the objective function for each node, which is a variable, can be extracted using the presence literals. That literal looks like it will be false if the entire interval is absent. We'd need a literal per node. What am I missing?

ghost commented 4 years ago

Sorry, I got the objective function wrong; I think I have it right this time. Each node has a score, scoreVars[node]. This variable is the sum of the scores of all tasks assigned to that node, per the constraint model.addEquality(score, LinearExpr.scalProd(tasksOnNode, scores)).

You want to minimize the maximum of scoreVars[node] over all nodes. You can again use a cumulative constraint, where the resource is the score and the capacity is the variable you called max1.

Below is C++ code using cumulative constraints.

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

#include "ortools/sat/cp_model.h"

using namespace operations_research::sat;

int main() {

  //[START data]
  const int numTasks = 50;
  const int numNodes = 1000;

  std::vector<int> taskDemands1(numTasks);
  std::vector<int> taskDemands2(numTasks);
  std::vector<int> scores(numTasks);
  std::vector<int> nodeCapacities1(numNodes, 500);
  std::vector<int> nodeCapacities2(numNodes, 600);

  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<> dis(1, 5);
  for (int i = 0; i < numTasks; i++) {
    taskDemands1[i] = dis(gen);
    taskDemands2[i] = dis(gen);
    scores[i] = taskDemands1[i] + taskDemands2[i];
  }
  //[END data]

  CpModelBuilder cp_model;

  std::vector<IntVar> taskToNodeAssignment(numTasks);
  std::vector<IntVar> endTasks(numTasks);
  std::vector<IntervalVar> tasks(numTasks);
  IntVar max1 = cp_model.NewIntVar({0, 10000000});

  for (int i = 0; i < numTasks; i++) {
    taskToNodeAssignment[i] = cp_model.NewIntVar({0, numNodes - 1});
    endTasks[i] = cp_model.NewIntVar({1, numNodes});
    tasks[i] = cp_model.NewIntervalVar(taskToNodeAssignment[i],
                                       cp_model.NewConstant(1), endTasks[i]);
  }

  // 1. Symmetry breaking
  for (int i = 0; i < numTasks - 1; i++) {
    cp_model.AddLessOrEqual(taskToNodeAssignment[i],
                            taskToNodeAssignment[i + 1]);
  }

  // 2. Capacity constraint

  // Note: if you have different capacities for each node, you can add extra
  // interval variables.
  // Example: if node 5 has a capacity of 200, add an interval var
  // starting at time 5, duration 1, demand = maxCapacity1 - 200.
  int maxCapacity1 =
      *std::max_element(begin(nodeCapacities1), end(nodeCapacities1));
  CumulativeConstraint cumul1 =
      cp_model.AddCumulative(cp_model.NewConstant(maxCapacity1));
  for (int i = 0; i < numTasks; i++) {
    cumul1.AddDemand(tasks[i], cp_model.NewConstant(taskDemands1[i]));
  }

  // Do it again for the second resource
  int maxCapacity2 =
      *std::max_element(begin(nodeCapacities2), end(nodeCapacities2));
  CumulativeConstraint cumul2 =
      cp_model.AddCumulative(cp_model.NewConstant(maxCapacity2));
  for (int i = 0; i < numTasks; i++) {
    cumul2.AddDemand(tasks[i], cp_model.NewConstant(taskDemands2[i]));
  }

  // Now for the score
  CumulativeConstraint cumulScore = cp_model.AddCumulative(max1);
  for (int i = 0; i < numTasks; i++) {
    cumulScore.AddDemand(tasks[i], cp_model.NewConstant(scores[i]));
  }

  // Minimize max1
  cp_model.Minimize(max1);

  // Solving part.
  Model model;

  SatParameters parameters;
  parameters.set_log_search_progress(true);
  // Optional: set a time limit of 10 seconds.
  // parameters.set_max_time_in_seconds(10.0);
  model.Add(NewSatParameters(parameters));

  const CpSolverResponse response = SolveCpModel(cp_model.Build(), &model);

  if (response.status() == CpSolverStatus::OPTIMAL ||
      response.status() == CpSolverStatus::FEASIBLE) {

    for (int i = 0; i < numTasks; i++) {
      std::cout << "Task " << i << " is assigned to node "
                << SolutionIntegerValue(response, taskToNodeAssignment[i])
                << std::endl;
    }
    std::cout << "Max1 = " << SolutionIntegerValue(response, max1) << std::endl;
    // std::cout << CpSolverResponseStats(response);
  }

  return 0;
}

lalithsuresh commented 4 years ago

@AxelDelsol63 Thanks a lot for your suggestion. It works like a charm, and it's unbelievable how fast this is. The runtime went from 5-6s down to 5ms!

lalithsuresh commented 4 years ago

I'll close this one for now. Thanks all!

EddyXorb commented 3 years ago

I have the impression that the symmetry-breaking part here destroys optimality:

// 1. Symmetry breaking
for (int i = 0; i < numTasks - 1; i++) {
  cp_model.AddLessOrEqual(taskToNodeAssignment[i], taskToNodeAssignment[i + 1]);
}

since this excludes, e.g., a solution in which a node serves the first and the last task but not the ones in between. Or did I misunderstand the intention of the whole scenario?
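The concern can be made concrete with a tiny brute-force check in plain Java (my own toy instance, not from the thread): under the ordering constraint a[i] <= a[i+1], if two tasks share a node, every task between them must also be on that node, so for non-identical tasks the constraint can cut off the unconstrained optimum.

```java
// Sketch: the ordering constraint a[i] <= a[i+1] can exclude the optimum
// when tasks differ. Toy instance: demands {5, 10, 5}, two nodes. The
// unconstrained optimum puts tasks 0 and 2 together (max load 10), but
// that assignment (0, 1, 0) violates the ordering, whose best is 15.
final class SymmetryBreakingCheck {
    static int maxLoad(int[] assignment, int[] demands, int numNodes) {
        int[] load = new int[numNodes];
        for (int i = 0; i < assignment.length; i++) {
            load[assignment[i]] += demands[i];
        }
        int max = 0;
        for (int l : load) max = Math.max(max, l);
        return max;
    }

    // Brute-force the best objective over all numNodes^numTasks assignments,
    // optionally keeping only those that satisfy a[i] <= a[i+1].
    static int best(int[] demands, int numNodes, boolean ordered) {
        int n = demands.length;
        int bestObj = Integer.MAX_VALUE;
        int total = (int) Math.pow(numNodes, n);
        for (int code = 0; code < total; code++) {
            int[] a = new int[n];
            int c = code;
            for (int i = 0; i < n; i++) { a[i] = c % numNodes; c /= numNodes; }
            boolean ok = true;
            if (ordered) {
                for (int i = 0; i + 1 < n; i++) if (a[i] > a[i + 1]) ok = false;
            }
            if (ok) bestObj = Math.min(bestObj, maxLoad(a, demands, numNodes));
        }
        return bestObj;
    }

    public static void main(String[] args) {
        int[] demands = {5, 10, 5};
        System.out.println(best(demands, 2, false)); // 10: tasks 0 and 2 share a node
        System.out.println(best(demands, 2, true));  // 15: ordering forbids that split
    }
}
```

For identical tasks the constraint is harmless, since any assignment can be reindexed into nondecreasing order without changing the loads, which matches the clarification below that it is only meant for groups of identical tasks.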

lalithsuresh commented 3 years ago

@EddyXorb yes, the symmetry breaking is a vestige from me simplifying the original problem (in which it is only applied to groups of identical tasks).

The bigger challenge here is that this encoding only allows for objective functions around the maximum capacity of a node. To make this spread tasks across nodes, we still need other constraints that require boolean variables to capture whether a task is assigned to a node.