regret insertion speedup

PGWelch commented 9 years ago

I was having a quick look at the regret insertion code. It strikes me that there's the potential for quite a lot of speed up here (assuming I'm understanding the code properly!), although it may not be guaranteed to work for all cases.

As far as I can tell on each insert you recalculate the regret for each unassigned job (with the exception of the bad jobs that can't be inserted anywhere). So you've got two loops, the 'choose next job' loop and inside that, the 'loop over each unassigned job' (which in-turn loops over all routes). For the vast majority of problems you should be able to cache the results of 'loop over each unassigned job' and re-use them on the next iteration of 'choose next job'. If you cached the best insertion in each route for each unassigned job, then when a job is inserted in the 'choose next job' loop, you just remove the cached records for all the unassigned job in the route that has just been inserted into, but keep the rest. Does this make sense?

It won't work for problems where you have some kind of odd constraints between routes, but for the vast majority of problems it should work - suggesting it should be an optional speed-up or similar?

It may also be worthwhile to implement an option where the regret is only calculated on the first iteration of 'choose next job' and not afterwards, maybe scoring regret based on number of available routes or similar? This would work in problems where the skills or capacities mean some jobs only have a small amount of available routes (regardless of what's already loaded onto the routes).

oblonski commented 9 years ago

Hmm. Nice. It is really worth to discuss speedups of regret insertion.

The problem - I see - with your suggestion is that to calculate the score I need the costs of the best and second best insertion. Once I insert a previously unassigned job into a route then either the best or the second best positions of all other jobs that are still unassigned can change, even for unassigned jobs that neither have the best nor the second best insertion position in that route. And this is why I update the scores again and again. How would you update these scores when caching best insertions?

PGWelch commented 9 years ago

Couldn't you just cache the insertion cost for each unassigned job in each route? Then when you insert a job into a route A, you recalculate the insertion costs for all remaining unassigned jobs for route A only, and you reuse the cached costs for all other routes? You then recalculate the score, but you recalculate it using the cached insertion costs + the updated ones for route A. So you're correct that you always have to recalculate the score after each insertion, but you don't have to recalculate the majority of insertions as well.

oblonski commented 9 years ago

@It may also be worthwhile to implement an option where the regret is only calculated on the first iteration of 'choose next job' and not afterwards

Can you give me some information about this? Implementing this is no problem at all. We might also configure this as another insertion strategy. However, you pointed to problems with skills and few alternatives for some jobs to be inserted, but I cannot yet see how this would help.

oblonski commented 9 years ago

Ahhh. Nice. That is convincing :+1: and will impact regret insertion significantly.

PGWelch commented 9 years ago

It may also be worthwhile to implement an option where the regret is only calculated on the first iteration of 'choose next job' and not afterwards

I had someone emailing recently with a problem where only one of their vehicles had enough capacity for a job and as regret wasn't turned on, the job typically didn't get loaded - as bestinsertion uses randomised order. Similarly if someone was using skills to lock down jobs to certain vehicles, you'd get the same problem.

Obviously regret solves this, but at some computational expense. Using a single 'regret score' at the start of the recreate would also solve this though and be less expensive. You'd use the single regret score - or perhaps just the number of available routes, to order the jobs to be inserted (where jobs with only 1 available route come first). If you used the number of available routes you could also still randomise the order for jobs with the same number of available routes (as randomisation is a good thing for the search).

oblonski commented 9 years ago

Ok. I understand. Here, we should at least generalize the regret scorer such that users can take this into account. If you have a concrete scorer that improves this in a number of cases (and that might also be used along with the current scorer), let me know and I will integrate it.

Keep up sharing your experience here. This helps a lot improving jsprit.

oblonski commented 9 years ago

Upps. Implementation is not as easy as I thought since if one chaches best insertions, one also caches concrete vehicles. Thus, best insertion positions are dependent on available vehicles. For example, assume you have two routes (r1,r2), two vehicles (v1,v2) and two unassigned jobs (j1,j2). Iterating over these jobs result in the following best insertion positions (route,vehicle,score): j1 -> (r1,v1,12) and j2 -> r2,v1,10). Thus, you choose to insert j1 into r1 with v1. To update the score of j2, you need to update the insertion costs of the changed route r1 yielding j2 -> (r1,v2,8). Since, 8 < 10 you choose j2 -> (r2,v1,10). Thus, you try to insert j2 into r2 employing v1 EVEN v1 has already been assigned to r1. ;(. Thus, actually since v1 is not available anymore, you need to update all insertion data that uses v1. Another issue is that for example the share of fixed costs I add to the insertion costs of a new job is dependent on the number of jobs that still need to be inserted. Thus, it changes with each inserted jobs requiring all other insertion costs to be updated. Difficult ...

PGWelch commented 9 years ago

I don't currently understand the vehicle-route separation logic, but could you cache using a key which contained the route-vehicle-job combination? Maybe even use a cache like this https://github.com/PGWelch/com.opendoorlogistics/blob/master/com.opendoorlogistics.core/src/com/opendoorlogistics/core/cache/RecentlyUsedCache.java (feel free to copy this class) which could store them over multiple ruin and recreate iterations? (you'd have to use the ordering of the stops in the cache key as well - i.e. use the actual route in the key). On the other issue, are these other costs independent of insertion position in route (or even independent of the route itself?). Could you just add them to the cached result? Its the insertion and associated constraint checking that you don't want to do repeatedly and needs caching - particularly as at the moment (I think) if you have n routes in your current solution, regret is doing n times more calculations then it needs to.

oblonski commented 9 years ago

ok. I just calculated a few statistics on redundant calculations. when it comes to regret insertion one could really save a lot. however, generating hash codes for routes, mapping and retrieving cached results do not come with no costs. i would really prefer an array based cache, but this does not seem to be possible. what kind of hashcode do you assign to your routes to identify them? i tried a hash code that is dependent on the vehicle(type) and on the activity sequence, thus every time an activity is inserted or removed a new hashcode is generated (by looping through all activities).

PGWelch commented 9 years ago

i tried a hash code that is dependent on the vehicle(type) and on the activity sequence, thus every time an activity is inserted or removed a new hashcode is generated (by looping through all activities

This seems a sensible way to go, but if I'm being honest I don't understand enough about the intricacies of your insertion code to be able to say what's the best way to implement this.

Alternatively instead of hash-based caching you could just store the last insertion results for each vehicle route, where this object would store the cost for each job (as a double[max_job_index+1]) and a copy of the currently loaded activity sequence on the route. Next time you do the insertion for a job onto this route, you check if the activity-sequence is the same and clear the object if not? You can then probably do a one-off allocation of the entire structure as fixed arrays, at the start of the optimisation?

oblonski commented 9 years ago

ha ha ha. @PGWelch I finally managed to redesign regret to avoid redundant calculations. it is a huge impact on performance (up to 3 times faster). jippi. :+1: will merge it into master asap.

PGWelch commented 9 years ago

Congratulations! Would it work on problems where the insertion feasibility onto one vehicle is dependent on the state of other vehicles? - for example if you had a maximum capacity constraint at a depot? I'm assuming it wouldn't - in which case I'm thinking it should be activated by default but could be switched off? Problems like this are uncommon but not unknown.

oblonski commented 9 years ago

Ahh good point. Thank you. No, it would not. But I assume that it is not so difficult anymore to add a mechanism that updates all routes affected by an insertion (but only these). For the time being, I just leave the old regret strategies in the master branch.

oblonski commented 9 years ago

Hmm. If you have any additional dependency between two jobs then this speed-up wont work in its current version. I somehow need to inform the insertion about job dependencies to trigger addtional cost calculations (beyond just for the route that has changed). Will come up with a solution asap.

PGWelch commented 9 years ago

What about checking for the read of states not associated with the route being inserted into (either the state for another route or jobs on other routes), which are accesssed during either the get insertion cost or the subsequent updating when a job is inserted?Alternatively I think it would be ok to just ask the user (i.e. client code) to flag in some way if their routes are interdependent and do the full recalculations (as in the current master version) if so. Problems with interdependent routes are rare - so it may not be worthwhile implementing a more complex solution for them.

     ---- On Tue, 20 Oct 2015 07:55:17 +0100  notifications@github.com  wrote ----Hmm. If you have any additional dependency between two jobs then this speed-up wont work in its current version. I somehow need to inform the insertion about job dependencies to trigger addtional cost calculations (beyond just for the route that has changed). Will come up with a solution asap.

—Reply to this email directly or view it on GitHub.

oblonski commented 9 years ago

Thanks, Phil, for your suggestions. On the another project I work with arbitrary dependencies such as that n jobs need to be in the same route, in sequence or in direct sequence, and I had to realize that the new speed up broke my unit tests there (but unfortunately no red light in jsprit unit tests). I am about to add some of these tests to jsprit so that this does not happen in future again. Furthermore, I had to realize that even n jobs need to be in direct sequence in one route, once one of these jobs has been inserted the insertion costs of all other jobs need to be recalculated again. I will probably do it as you suggested, I somehow add a method to let the user specify (explicitly) that two jobs are dependent on each other.

Am 20/10/15 um 09:14 schrieb Philip Welch:

What about checking for the read of states not associated with the route being inserted into (either the state for another route or jobs on other routes), which are accesssed during either the get insertion cost or the subsequent updating when a job is inserted?Alternatively I think it would be ok to just ask the user (i.e. client code) to flag in some way if their routes are interdependent and do the full recalculations (as in the current master version) if so. Problems with interdependent routes are rare - so it may not be worthwhile implementing a more complex solution for them.

---- On Tue, 20 Oct 2015 07:55:17 +0100 notifications@github.com wrote ----Hmm. If you have any additional dependency between two jobs then this speed-up wont work in its current version. I somehow need to inform the insertion about job dependencies to trigger addtional cost calculations (beyond just for the route that has changed). Will come up with a solution asap. —Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/jsprit/jsprit/issues/161#issuecomment-149458554.

PGWelch commented 9 years ago

Sounds good - the user would need to be able to specify when two vehicles are interdependent as well though. e.g. capacity constraints at the depot level would be a vehicle to vehicle dependency, not job to job.Job to job dependency problems would probably require custom heuristics anyway (e.g. periodic problems).

     ---- On Tue, 20 Oct 2015 08:26:19 +0100  notifications@github.com  wrote ----Thanks, Phil, for your suggestions. On the another project I work with

arbitrary dependencies such as that n jobs need to be in the same route, in sequence or in direct sequence, and I had to realize that the new speed up broke my unit tests there (but unfortunately no red light in jsprit unit tests). I am about to add some of these tests to jsprit so that this does not happen in future again. Furthermore, I had to realize that even n jobs need to be in direct sequence in one route, once one of these jobs has been inserted the insertion costs of all other jobs need to be recalculated again. I will probably do it as you suggested, I somehow add a method to let the user specify (explicitly) that two jobs are dependent on each other.

Am 20/10/15 um 09:14 schrieb Philip Welch:

What about checking for the read of states not associated with the route being inserted into (either the state for another route or jobs on other routes), which are accesssed during either the get insertion cost or the subsequent updating when a job is inserted?Alternatively I think it would be ok to just ask the user (i.e. client code) to flag in some way if their routes are interdependent and do the full recalculations (as in the current master version) if so. Problems with interdependent routes are rare - so it may not be worthwhile implementing a more complex solution for them.

---- On Tue, 20 Oct 2015 07:55:17 +0100 notifications@github.com wrote ----Hmm. If you have any additional dependency between two jobs then this speed-up wont work in its current version. I somehow need to inform the insertion about job dependencies to trigger addtional cost calculations (beyond just for the route that has changed). Will come up with a solution asap. —Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/jsprit/jsprit/issues/161#issuecomment-149458554.

—Reply to this email directly or view it on GitHub.

oblonski commented 9 years ago

BTW: Currently, my favourite vehicle-to-vehicle dependency is that two drivers need to meet each other at a specified time and location :). Still thinking hard about how this can be easily modelled and implemented.

PGWelch commented 9 years ago

That's a difficult one to solve. Does the optimiser need to decide the meeting time and/or location or can you just use two shipment with the end of the first one set to meet the start of the second one in both time and space? (Same location, narrow time windows - e.g. 10 minutes, set the same). Shipments obviously set so they can't be on the same vehicle.

graphhopper / jsprit

regret insertion speedup #161