timnon / pyschedule

pyschedule - resource scheduling in python

Very slow... #4

Closed: crayster closed this issue 8 years ago

crayster commented 8 years ago

Hi, Timnon

First, there is a bug. In pyschedule.py, "apply" is used, which was removed in Python 3, so the module cannot run on Python 3+.

Another issue seems more serious: it is very, very slow... too slow to use. I'm not familiar with PuLP, so perhaps PuLP is the cause. Or maybe it is normal for a problem of this scale to take that long. What is your opinion? Is there anything that could be improved? Thanks!

I just tested tasks and resources, without any other constraints.

from pyschedule import Scenario, solvers, alt
import random

def PerformanceTest(rescnt, taskcnt, sharedcnt):
    '''
        Resource count, Task count, Max Shared Resource count
    '''
    S = Scenario('hello_pyschedule', horizon=10000)

    for i in range(0, rescnt):
        S.Resource('R' + str(i))

    for i in range(0, taskcnt):
        S.Task('T' + str(i), len=random.randint(1,10))
        S['T' + str(i)] += alt(S['R' + str(x)] for x in random.sample(range(rescnt), random.randint(1,sharedcnt)))

    S.use_makespan_objective()
    #S.use_flowtime_objective()
    solvers.mip.solve_bigm(S,msg=1)

PerformanceTest(10, 20, 2)
timnon commented 8 years ago

Hi, thanks for pointing out that apply is not part of Python 3+; it has been removed, check out the new version 0.2.14.

Regarding the speed issue, your example takes 30+ seconds for me. Let me first point out that there is a mistake in it; the task definition should be

S.Task('T' + str(i), length=random.randint(1,10))

that is, len should be length. This does not raise an error because a task accepts arbitrary parameters, which are used to allow capacity constraints on those parameters, but that is a different story.

After replacing len with length, the tasks really do have different lengths; use plotters.matplotlib.plot(S) to visualize this. The solver solvers.mip.solve_bigm then also runs in 1-2 seconds: the tasks are now quite different, so there are fewer symmetries. If you have very similar tasks, then using the solver solvers.mip.solve in combination with the task parameter group is much better.

Note that solvers.mip.solve is very sensitive to the length of the horizon. In your case a horizon of 50 is more than enough; after this change, solvers.mip.solve also runs in 1-2 seconds.
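For illustration, here is a minimal sketch combining these points: length instead of len, identical tasks sharing one group tag, and a horizon of 50 (the group value 'same_kind' and the task sizes are placeholders, not from the original example):

from pyschedule import Scenario, solvers

S = Scenario('hello_pyschedule', horizon=50)
R = S.Resource('R0')

# identical tasks: same length, same resource, same group tag,
# so solvers.mip.solve can exploit the symmetry between them
for i in range(10):
    T = S.Task('T' + str(i), length=3, group='same_kind')
    T += R

S.use_makespan_objective()
solvers.mip.solve(S, msg=1)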

It is also possible to limit the computation time, e.g. solvers.mip.solve(S, time_limit=3) for a limit of 3 seconds, but then it is not guaranteed that a solution will be found.

Could you please specify your time requirements in more detail and maybe the exact problem you want to solve? Happy to take a closer look.

crayster commented 8 years ago

Hi, thank you.

Could you please specify your time requirements in more detail and maybe the exact problem you want to solve?

In fact, I don't have an exact time requirement yet; this is more of a feasibility analysis for now. My problem is to help some process-manufacturing factories create their schedules. In these factories:

Actually, such schedules are currently made by hand, and they are very difficult to create and maintain.

What I am considering is:

then using solver solvers.mip.solve in combination with tasks parameter group is much better.

Perhaps I misunderstood. It seems that tasks in a group must have the same length and resources? I am also a little confused about which solver (solve or solve_bigm) should be used for my problem.

timnon commented 8 years ago

Hi, sounds like a perfect fit for pyschedule:)

One comment: just checking whether a feasible solution exists for such large problems (1000+ tasks) is already a hard problem, so there is no cheap win. There is also no easy way to say that one solver is the best; it depends on the time granularity, the task lengths, and so on.

There are different methods and products for solving such problems, but they all require the input to be formatted in a specific way. The idea of pyschedule is to have a uniform way to formulate scheduling problems, which allows different solvers to be used by switching a single line.

If a problem is very large, it is also possible to build iterative heuristics on top of the existing solvers, e.g. first schedule the most important jobs, then the second most important ones, and so on; see the sketch below.
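A rough sketch of this batching idea, assuming that the solver writes each task's start back to its start_value and that a task with a preset start_value is treated as fixed on the next solve (the start_range initialization in the last comment of this thread relies on exactly that); solve_in_batches and its input format are hypothetical:

from pyschedule import solvers

def solve_in_batches(S, batches):
    '''batches: lists of (task_name, length, resource_name) tuples,
    ordered from most to least important'''
    for batch in batches:
        for name, length, res_name in batch:
            T = S.Task(name, length=length)
            T += S[res_name]
        # tasks from earlier batches keep the start_value assigned by the
        # previous solve, so (under the stated assumption) only the new
        # batch remains free and each MIP stays small
        if not solvers.mip.solve(S):
            return False  # current batch is infeasible given the pinned tasks
    return True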

Yes, tasks in the same group have to be completely similar (same length, same resources, ...). But it is unlikely that all 1000+ tasks are completely different. This information is only used by the solver solvers.mip.solve, because this one can exploit it; the solver solvers.mip.solve_bigm cannot.

Do you have a more specific description of an instance you want to solve? A task list, the required time granularity (e.g. 10min or 30min)? Maybe somewhat obfuscated to avoid privacy issues.

crayster commented 8 years ago

Hi, thank you very much. I think I got the answer. I had just been thinking "if it's fast enough, I don't have to do anything", but as you said, I should do something to optimize the problem first. I'll give it a try and report back. Please let me keep this issue open for a few days.

crayster commented 8 years ago

Hi, sorry for the late reply... I'd like to share a few findings and then close this issue.

First, as you said, big problems can be really time-consuming (PuLP takes a lot of time too, and also too much memory). I haven't found any good solution other than restricting the number of variables and constraints. Splitting a problem into independent subproblems and solving them one by one works for me (see the sketch below).
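For illustration, a minimal sketch of the splitting step in plain Python, independent of pyschedule (task_specs and its format are assumptions): tasks that share no resource, directly or transitively, end up in different components and can be scheduled in separate scenarios.

def split_into_components(task_specs):
    '''task_specs: {task_name: (length, [resource_names])} -- hypothetical format.
    Returns lists of task names that can be scheduled independently.'''
    parent = {}
    def find(a):
        # union-find with path halving
        while parent.setdefault(a, a) != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)
    # link every task to each of its resources
    for name, (_, resources) in task_specs.items():
        for r in resources:
            union(('T', name), ('R', r))
    # group tasks by the root of their component
    groups = {}
    for name in task_specs:
        groups.setdefault(find(('T', name)), []).append(name)
    return list(groups.values())

Each returned component can then be put into its own Scenario and solved on its own.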

However, I also found some places where pyschedule could be improved, I think (for DiscreteMIP).

1. T.length should be considered when creating the variables (T, t). A very simple (contrived) case:

from pyschedule import Scenario, solvers

def TestBigTask2():
    S = Scenario('TestBigTask2')
    for i in range(100):
        S.Resource('R%d' % i)

    for i in range(100):
        t = S.Task('T%d' % i, length=100)
        t += S['R%d' % i]
        S += t <= 100  # upper bound: each task must finish by time 100

    S.use_makespan_objective()
    S.horizon = 102
    if solvers.mip.solve(S, msg=0):
        print("OK")

Current version:

x.update({ (T,t) : mip.var(str((T, t)), 0, task_group_size, cat) for t in range(self.horizon) })

will create (T,0) up to (T,101). Obviously, most of these are useless (they can never be 1, because a task of length 100 that starts after t=2 would overrun the horizon of 102). If we change it to the following (and, of course, revise every other line referencing (T,t) accordingly)

T_horizon = self.horizon-T.length+1
x.update({ (T,t) : mip.var(str((T, t)), 0, task_group_size, cat) for t in range(T_horizon) })

then only (T,0), (T,1), and (T,2) are created. The block below can then be removed:

# everybody is finished before the horizon TODO: why is this check necessary
affine = [ (x[T, t],1) for T in S.tasks() for t in range(self.horizon-T.length+1,self.horizon) if (T,t) in x ]
cons.append(mip.con(affine, sense=-1, rhs=0))

On my PC, this reduces the solve time from 38s to 11s.

2. Next, all "bounds" can be covered during the creation of the (T,t) variables instead of being added as constraints. Take the example above again; it contains an upper bound:

S += t <= 100

If this bound is also taken into account when creating (T,t), only (T,0) needs to be created, and the code handling tight upper bounds is no longer necessary. So the idea is: before calling build_mip_from_scenario, iterate over all tasks and constraints (not only bounds) and compute each task's possible start range first, as in the sample code below. Then all the code handling "Bounds" could be removed, and the number of (T,t) variables is minimized. It works perfectly for some of my cases. Example code:

    def build_bound_constraints(self):
        S = self.scenario

        # initialize start_range
        for T in S.tasks():
            if T.start_value is not None:
                T.start_range = [T.start_value, T.start_value+1]
            else:
                T.start_range = [0, self.horizon-T.length+1]

        # low bounds
        for P in S.bounds_low():
            if P.task.start_range[0] < P.bound:
                P.task.start_range[0] = P.bound

        # up bounds
        for P in S.bounds_up():
            if P.task.start_range[1] > P.bound-P.task.length+1:
                P.task.start_range[1] = P.bound-P.task.length+1

        # tight low bounds
        for P in S.bounds_low_tight():
            if P.task.start_range[0] < P.bound:
                P.task.start_range[0] = P.bound
            if P.task.start_range[1] > P.bound+1:
                P.task.start_range[1] = P.bound+1

        # tight up bounds
        for P in S.bounds_up_tight():
            if P.task.start_range[0] < P.bound-P.task.length:
                P.task.start_range[0] = P.bound-P.task.length
            if P.task.start_range[1] > P.bound-P.task.length+1:
                P.task.start_range[1] = P.bound-P.task.length+1

        # lax and tight precedence constraints
        for P in S.precs_lax() + S.precs_tight():
            if P.offset >= 0:
                r_start = P.task_left.start_range[0] + P.task_left.length + P.offset
                if P.task_right.start_range[0] < r_start:
                    P.task_right.start_range[0] = r_start
                l_end = P.task_right.start_range[1] - P.task_left.length - P.offset
                if P.task_left.start_range[1] > l_end:
                    P.task_left.start_range[1] = l_end

        # tight precedence constraints only
        for P in S.precs_tight():
            if P.offset >= 0:
                r_end = P.task_left.start_range[1] + P.task_left.length + P.offset
                if P.task_right.start_range[1] > r_end:
                    P.task_right.start_range[1] = r_end

        # check
        for T in S.tasks():
            if T.start_range[0] >= T.start_range[1]:
                return 0

        return 1

3. Others... these are a little more trivial. I found that pyschedule sometimes spends a lot of time (a few minutes) before PuLP even starts, which means functions like build_mip_from_scenario consume it. Below are some cases I encountered:

                affine = [(x[T, R, t_], coeffs[T,R])
                          for T in set(S.tasks(resource=R)) & set(self.task_groups)
                          for t_ in range(t+1-T.length,t+1)
                          if (T,R,t_) in x ]

In this snippet, hoisting set(S.tasks(resource=R)) out of the loop makes it much faster, since the set is otherwise rebuilt for every time step.
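That is, something along the following lines (a sketch; tasks_on_R is a name introduced here):

tasks_on_R = set(S.tasks(resource=R)) & set(self.task_groups)  # built once per resource
affine = [(x[T, R, t_], coeffs[T, R])
          for T in tasks_on_R
          for t_ in range(t+1-T.length, t+1)
          if (T, R, t_) in x]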

And,

xKeySet = set(x.keys())
e1 = xKeySet.intersection([(T,R,t) for t in range(*T.start_range)])
e2 = xKeySet.intersection([(T_,R,t) for t in range(*T_.start_range)])
affine = [(x[k],1) for k in e1 ] + [(x[k],-1) for k in e2 ]

is a little faster than the original version, since it iterates only over each task's start_range instead of the whole horizon:

affine = [(x[T,R,t],1) for t in range(self.horizon) if (T,R,t) in x ] + \
         [(x[T_,R,t],-1) for t in range(self.horizon) if (T_,R,t) in x ]