google / or-tools

Google's Operations Research tools:
https://developers.google.com/optimization/
Apache License 2.0
11.27k stars 2.13k forks

Merge CP-SAT models for parallel insert of intervals. #3443

Closed lucafesta closed 2 years ago

lucafesta commented 2 years ago

Python 3.8 / CP-SAT: ortools==9.3

I'm facing a runtime issue in a jobshop scheduling problem due to a huge number of intervals and constraints. As I have written the problem, the vast majority of the time is spent creating the `NewIntervalVar`s and the various constraints on them. This part could be parallelized, because the intervals only interact with each other at the very end of the program. First I tried multiprocessing on the same model instance and saw no improvement (probably due to massive access to the same memory space?). Then I decided to create a separate model in each thread and concatenate them all together via the protobuf:

text_format.Parse(str(model), new_model.Proto())

This resolves the timing issue, but the final model doesn't work correctly: I'm not able to access the merged variables in the final model and create constraints that use them all.
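(A side note on the threading pattern, not from the original post: `concurrent.futures` only runs work on the pool's threads when the callable and its arguments are handed to `submit` separately; `submit(f(x))` evaluates `f(x)` serially in the calling thread first. A minimal stdlib-only sketch, independent of or-tools:)

```python
import concurrent.futures

def build_part(name):
    # Stand-in for the per-thread model construction.
    return name.upper()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # submit(fn, arg) schedules fn(arg) on a worker thread;
    # submit(fn(arg)) would run fn in the calling thread instead.
    futures = {name: executor.submit(build_part, name) for name in ["a", "b", "c"]}
    results = {name: f.result() for name, f in futures.items()}

# results == {"a": "A", "b": "B", "c": "C"}
```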

The example below replicates what I'm trying to do and shows where it fails.

Is there any way to access a variable in the model by its name (something like model.something("variable_name") ) and use it to create the new constraints? Or are there other possible solutions?

from google.protobuf import text_format
from ortools.sat.python import cp_model
import concurrent.futures

# Callable wrapper to hand the per-thread model, variable and name back to the main thread
class FunctionReturn():
    def __init__(self, model, el, name):
        self.model = model
        self.el = el
        self.name = name

    def __call__(self):
        return self.model, self.el, self.name

# Creation of models that contain the part of the problem that can be parallelized
def make_variable(x):
    from ortools.sat.python import cp_model
    model = cp_model.CpModel()
    el = model.NewIntVar(0, 10, x)
    constraint = model.Add(el < 5) # In-model constraint that works
    results = FunctionReturn(model, el, x)
    return results

if __name__ == '__main__':
    new_model = cp_model.CpModel()
    all_el = {}
    processes = []

    executor = concurrent.futures.ThreadPoolExecutor(5)
    # Pass the callable and its argument separately so that model
    # construction actually runs on the worker threads
    futures = [executor.submit(make_variable, args) for args in ["a", "b", "c", "d", "e"]]
    concurrent.futures.wait(futures)

    # Merge all models together
    for future in futures:
        model, el, name = future.result()()  # call the FunctionReturn to unpack it
        text_format.Parse(str(model), new_model.Proto())
        all_el[name] = el # Here is the issue! These handles refer to the per-thread models instead of the final model

    # Here I need to access variables "a" and "c" of new_model, because all_el refers to the per-thread ones
    # In fact, adding this constraint makes the solution not feasible
    new_model.Add(all_el["c"] < all_el["a"])

    new_model.Maximize(sum(all_el.values()))

    solver = cp_model.CpSolver()
    status = solver.Solve(new_model)

    print(solver.StatusName(status))
    print(solver.ObjectiveValue())
lperron commented 2 years ago

Just switch to C++. Python protobufs are very slow and not MT-safe.

lucafesta commented 2 years ago

@lperron Is there another way to do the parallelization in Python, or an example of the C++ code with a Python wrapper?

lperron commented 2 years ago

Look at pybind11. Examples are in ortools/graph/python.