graphhopper / jsprit

jsprit is a java based, open source toolkit for solving rich vehicle routing problems
https://www.graphhopper.com/open-source/
Apache License 2.0
1.64k stars 605 forks source link

Fix for reproducability of results #506

Closed alistairwoodcock closed 3 years ago

alistairwoodcock commented 4 years ago

Hello, I've spent the past few days investigating a reproducibility problem and I think I've found a solution for it, however the ultimate source is still unknown to me.

Since we're using jsprit in a production system (thank you btw!) I can't publicly share the data I've been using to test this and have had trouble writing unit tests for it. We're still running on jsprit 1.7.2 but I did all my investigation against the master branch.

What we were seeing in production was that the same problem, or near identical problems would sometimes find very different solutions. We had just recently enabled using threads in jsprit so we though this would be the source of the issue. However it seemed only tangentially related.

More than happy to help further the investigation into whats causing this in the first place but I might need some guidance.

Here's the comment from my commit hopefully explaining the issue and how I've decided to resolve it:

Encountered reproducibility issues when running more complex problems against Jsprit.
This issue manifests when the ruinedJobSet is created in RuinAndRecreateModule.
Ordering in the HashSet is not guaranteed. When re-running the same problem occasionally
the order in the HashSet of some Jobs were slightly different. When jobs are shuffled inside
an InsertionStrategy (e.g. BestInsertion and BestInsertionConcurrent) the different ordering
is exhaserbated. The `Collections.sort(unassignedJobList, new AccordingToPriorities());`
ensures that the order of equal items is guaranteed, resulting in different insertions for
different runs of the same problem.

This issue only showed itself when running with threads. The reason for this is unkown.
Notably, when disabling clustersRegret and stringRegret the issue no longer occurred.
The reason for this is unclear but is likely related to the order of items in the unassignedJobs list
returned from the regret classes.

From tests, simply swapping to a LinkedHashSet did not remove this ordering problem so the
unassignedJobs were inserted to a list and sorted. Since an ArrayList is faster on iteration
the ArrayList was kept, however re-inserting the sortedUnassignedJobs may be preferrable.

Let me know if there's any additional changes you'd like me to make or if I can answer any questions!

Cheers

michalmac commented 3 years ago

Nice! I can confirm that after applying this fix, we have been so far able to reproduce all runs.

@oblonski Would be great to have this PR merged.