
[Sticky] Optimisation ideas #596

Closed. Sophist-UK closed this issue 2 years ago.

Sophist-UK commented 3 years ago

This issue is raised as a repository for ideas about how to speed up optimisation.

Sophist-UK commented 3 years ago

User-provided examples of SVGs for point 1: Downloads.zip

tatarize commented 3 years ago

As discussed in IRC:

1. Better code to test the distances of things would make a huge difference: a non-brute-force algorithm.
2. We should abort the search if we find an item 0 away (we can't get any closer than an overlapping point).
3. We should check the segment after the current segment first if backwards==true, and the segment before the current segment if backwards==false (cycling within the group when at the first or last segment, since we're usually dealing with subpaths).

Changes 2 and 3 could be done independently of 1 and are comparatively minor. In fact, we wouldn't even need to prevent double-checking them, since the speed savings in the typical case are sure to outstrip the cost in the unusual case.
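A rough sketch of changes 2 and 3, with a hypothetical flat list of candidate start points standing in for the real cutcode structures (names here are illustrative, not MeerK40t's API):

```python
import math

def closest_candidate(current_end, candidates, neighbor_index=None):
    """Pick the nearest candidate start point to current_end.

    candidates: list of (x, y) start points (hypothetical layout).
    neighbor_index: index of the segment adjacent to the one just cut; the
    caller chooses it from the cut direction (change 3).  It is tried first
    because it is very often the best match, and the search aborts as soon as
    a zero-distance match is found (change 2).
    """
    order = list(range(len(candidates)))
    if neighbor_index is not None and 0 <= neighbor_index < len(candidates):
        order.insert(0, order.pop(neighbor_index))  # check the adjacent segment first

    best_index, best_distance = None, math.inf
    for i in order:
        x, y = candidates[i]
        d = math.hypot(x - current_end[0], y - current_end[1])
        if d < best_distance:
            best_index, best_distance = i, d
            if d == 0:
                break  # cannot do better than an overlapping point
    return best_index, best_distance
```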

tatarize commented 3 years ago

#613 was merged. It implements the subset I put forward there.

tatarize commented 3 years ago
  1. "I would imagine that we can reduce many sets of short straight line segments to bezier curves, and potentially reduce simple forms of bexier curves to arcs and simple forms of straight arcs to lines."

This is unneeded. We use zingl-bresenham curves. We don't subdivide cuts into straight line segments to begin with. We merely draw the curves within our native plot sequence. It's much better to do this way. But, unless the user has divided their curve that way, the curve will be fed into the plotplanner and decomposed into octagonal and diagonal moves, and this will pixel perfect plot the cut. It also makes it scale invariant.

  1. The power of the sorting there is somewhat negated this is a NLog(N) process to sort the list and it's an N process to merely look at things. You can't do that faster without storing the proper values in a more useful manner. There's a lot of options for solving K cases of find the closest point to point Q.

  2. Numpy isn't really a requirement. This would make it a requirement.

  3. CuPy isn't a requirement, this would make it a requirement.

  4. Limited the checked points is the method that all solutions to nearest point problems employ. This is usually done by some manners of chunking the points into clusters and determine since no point is nearer to this point than the edge of the cluster the fact that the point is in that cluster proves it cannot be closer so do not check that point cluster. I've coded my own novel algorithm for this which I feel is a lot easier.

    • Sort all candidates into two lists, one sorted by x coordinate and one by y coordinate. (Occurs once, O(N log N).)
    • Find the position of the current point in each sorted list. (O(log N), binary search.)
    • Iterate outward through both lists in the +x, -x, +y and -y directions, keeping the best solution found so far.
    • Stop searching in a direction when the single-coordinate distance alone exceeds the current best distance. (If dx > d_curr, it doesn't matter what the y value is: every remaining candidate in that direction must be further away and is disqualified.)
    • The algorithm is O(log N) for finding a point within the field, so doing this for roughly N points gives O(N log N) as the overall solution. The current solution is O(N^2); your proposed resort-each-time approach would be O(N^2 log N). (A rough Python sketch of this search follows.)
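A minimal, standalone sketch of the search described above, assuming the candidates are plain (x, y) tuples rather than MeerK40t's actual cutcode objects (build_indices and nearest are illustrative names):

```python
import bisect
import math

def build_indices(points):
    """One-time O(N log N) setup: the candidate points sorted by x and by y."""
    by_x = sorted(points, key=lambda p: p[0])
    by_y = sorted(points, key=lambda p: p[1])
    xs = [p[0] for p in by_x]
    ys = [p[1] for p in by_y]
    return by_x, xs, by_y, ys

def nearest(query, by_x, xs, by_y, ys):
    """Nearest candidate to `query`, walking outward through both sorted lists."""
    qx, qy = query
    best, best_d = None, math.inf

    def consider(p):
        nonlocal best, best_d
        d = math.hypot(p[0] - qx, p[1] - qy)
        if d < best_d:
            best, best_d = p, d

    def scan(pts, coords, q):
        # Walk outward from the query position in this sorted list.  A direction
        # is abandoned once its single-axis distance alone exceeds the best
        # distance so far: everything further that way must be even worse.
        i = bisect.bisect_left(coords, q)
        lo, hi = i - 1, i
        go_lo, go_hi = True, True
        while go_lo or go_hi:
            if go_hi:
                if hi < len(pts) and coords[hi] - q <= best_d:
                    consider(pts[hi])
                    hi += 1
                else:
                    go_hi = False
            if go_lo:
                if lo >= 0 and q - coords[lo] <= best_d:
                    consider(pts[lo])
                    lo -= 1
                else:
                    go_lo = False

    scan(by_x, xs, qx)  # +x / -x sweep
    scan(by_y, ys, qy)  # +y / -y sweep
    return best, best_d

# Usage: sort once, then query repeatedly as the greedy walk proceeds.
pts = [(0.0, 0.0), (5.0, 1.0), (2.0, 2.0), (9.0, 9.0)]
index = build_indices(pts)
print(nearest((4.0, 1.5), *index))  # ((5.0, 1.0), ~1.118)
```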
tatarize commented 3 years ago

See #617. I hacked in two-opt as an algorithm for cutcode optimizations. It would take a pretty critical set of rethinks to make it work with the methods we have, given the limitations of inner-first (and maybe others in the future). But you can at least see how it would work, optimization-wise, if it were a thing.

Sophist-UK commented 3 years ago

On the initial numbering scheme:

1. Stay on path

Implemented

2. Simplify large numbers of short PathSegments

> This is unneeded. We use Zingl-Bresenham curve plotting. We don't subdivide cuts into straight line segments to begin with; we merely draw the curves within our native plot sequence, which is much better. Unless the user has already divided their curve into segments, the curve will be fed into the plotplanner and decomposed into orthogonal and diagonal moves, which plots the cut pixel-perfectly. It also makes the plotting scale invariant.

My suggestion for reducing many short PathSegments to fewer Bezier curves was intended to reduce the time needed to optimise, not to improve cut time. With hindsight I think my brain-dump suggestion to change segments to a simpler type was unnecessary.

However, simplifying large numbers of short PathSegments to significantly fewer PathSegments at load time (with an optional setting to enable this) might be beneficial at optimise time. And of course the simplified segments would be preserved on save, so the overhead of reducing PathSegments would potentially only be incurred on raw SVGs.
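One conventional way to do that load-time reduction (not something MeerK40t does today) would be a polyline simplification such as Ramer-Douglas-Peucker applied to runs of short line segments; a minimal sketch operating on plain point lists:

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: drop points that deviate from the chord between
    the kept endpoints by less than `tolerance` (same units as the points)."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the two endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and recurse on both halves.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```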

3. Python sort

> The power of the sorting there is somewhat negated: it's an O(N log N) process to sort the list and only an O(N) process to merely look at everything. You can't do that faster without storing the proper values in a more useful manner. There are a lot of options for solving K instances of "find the closest point to point Q".

I have tried this - and as you suggested it would be, it was actually slower.

4. / 5. Numpy etc.

> Numpy isn't really a requirement. This would make it a requirement.
> CuPy isn't a requirement; this would make it a requirement.

I was not intending to suggest that we should make them a requirement - but that we should take advantage of them if they are available.

6. Clustering

> Limiting the checked points is the method that all solutions to nearest-point problems employ. This is usually done by chunking the points into clusters: since no point inside a cluster can be nearer to the query point than the cluster's boundary, an entire cluster can be ruled out without checking its individual points whenever that boundary is already further away than the current best candidate. I've coded my own novel algorithm for this which I feel is a lot easier.

SVGs with groups may already have information that can be used for a sort of clustering. We could potentially do some optimisation based on Groups.

But I think you are right - the big-O of the algorithm is more important than parallelisation, because for large SVGs the difference between O(N log N) and O(N^2) will quickly overtake whatever parallelism you can achieve on CPU or even GPU.

> • Sort all candidates into two lists, one sorted by x coordinate and one by y coordinate. (Occurs once, O(N log N).)
> • Find the position of the current point in each sorted list. (O(log N), binary search.)
> • Iterate outward through both lists in the +x, -x, +y and -y directions, keeping the best solution found so far.
> • Stop searching in a direction when the single-coordinate distance alone exceeds the current best distance. (If dx > d_curr, it doesn't matter what the y value is: every remaining candidate in that direction must be further away and is disqualified.)
> • The algorithm is O(log N) for finding a point within the field, so doing this for roughly N points gives O(N log N) as the overall solution. The current solution is O(N^2); your proposed resort-each-time approach would be O(N^2 log N).

Yes.

If we are doing pure Python, where all algorithms are possible, then this would seem to be optimal. But with e.g. numpy, which does some things in C in parallel but has to come back to single-threaded Python for others, big-O still trumps; even so, it may be faster to use a slightly less optimal algorithm (still O(N log N)) that exploits specific numpy features.

6. Clustering as Greedy / 2-opt & 7. Greedy look ahead

Greedy seems to me to be a form of clustering too.

In my experience, based on the types of SVG file I use, greedy does a great job on a Pareto basis - you get 90% of the benefit from 10% of the effort.

But I have specific examples in mind where greedy's simplistic choice of the absolute closest point results in travel moves that look sub-optimal to human visual recognition: e.g. you travel to the 2nd stitch hole in a long line of stitch holes, then the 3rd, and so on, and eventually have to come back to do the 1st hole. To a human it is visually obvious that it is better to go to the 1st stitch hole now, even though it is slightly further away, in order to avoid a long travel later. (This is still far better than cutting in SVG-file order, however.)

So my thinking was:

a. How can we algorithmically handle those occasional situations - ones that humans see as visually obvious and that result in long travels later - without creating a huge overhead?

b. Can we do 2-opt on a 90/10 basis by only trying to convert the really long travels into much shorter travels?
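One hedged reading of (b), as a sketch: run 2-opt but only consider breaking one of the k longest travel moves. The "cuts" here are just (start, end) point pairs, k is an arbitrary tuning knob, and no inner-first constraints are respected:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def targeted_two_opt(segments, k=10):
    """2-opt restricted to the k longest travel moves.

    segments: list of (start, end) point pairs, already in cutting order.
    Only the k longest travels are candidates for the first broken edge, which
    keeps each pass far cheaper than the full O(N^2) scan.  Reversing a block
    also flips each reversed segment's start/end.
    """
    order = list(segments)
    improved = True
    while improved:
        improved = False
        travels = [(dist(order[i][1], order[i + 1][0]), i)
                   for i in range(len(order) - 1)]
        for _, i in sorted(travels, reverse=True)[:k]:
            for j in range(i + 1, len(order) - 1):
                # Travel cost of the two edges we would break vs. the two we would create.
                before = (dist(order[i][1], order[i + 1][0])
                          + dist(order[j][1], order[j + 1][0]))
                after = (dist(order[i][1], order[j][1])
                         + dist(order[i + 1][0], order[j + 1][0]))
                if after + 1e-9 < before:
                    # Reverse the block i+1..j, flipping each segment's direction.
                    order[i + 1:j + 1] = [(e, s) for (s, e) in reversed(order[i + 1:j + 1])]
                    improved = True
                    break
            if improved:
                break
    return order
```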

tatarize commented 3 years ago

2-opt is often visually pleasing since users can't see a better route; sometimes there is no better route (though you can't prove it, since that's an NP-hard problem).

Greedy isn't a form of clustering, since you search everything and nothing cuts the number of values that need to be checked down to O(log N). Though I'm fairly sure there's a reasonable solution to solving greedy with numpy. It's entirely possible to speed it up while still being greedy but not brute force. The current algorithm is greedy brute-force. This could be reduced to a greedy log(N)-type algorithm, which would go a lot faster.

I think for now it might be enough to offer 2-opt alongside the other optimization algorithms, defaulting to "no algorithm" if numpy is not installed and 2-opt is requested. It might simply be a checkbox in the optimization settings that prevents the use of inner-first, since we can't do both.

This would give options for users who don't care about inner-first but do care about having their work cut much faster. Also, it might be possible to avoid merging cut operations and to separate the different levels of cut, to force it to cut inner-first regardless of how the individual chunks of cutcode are optimized. You cut the paths that don't contain anything first, then the paths that contain only paths that contained nothing, then the paths that contain only paths that contain paths that contained nothing. After about the 3rd level this should peter out, but each individual chunk of cutcode could be fully optimized with 2-opt without any issue. So you'd have rasters, engrave, cutcode-cut-level1, cutcode-cut-level2, cutcode-cut-level3, and 2-opt could be applied to all of these without any issue.
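A sketch of that level splitting, with a hypothetical contains(outer, inner) predicate standing in for the real inner/outer test:

```python
def split_by_containment_level(cuts, contains):
    """Group closed cuts by containment depth.

    Level 0 holds cuts that contain nothing, level 1 holds cuts whose contents
    are all in level 0, and so on.  Burning the levels in ascending order keeps
    inner-first intact, while travel within each level can be freely optimized
    (e.g. with 2-opt).  contains(outer, inner) is a hypothetical predicate.
    """
    remaining = list(cuts)
    levels = []
    while remaining:
        # A cut joins the current level once nothing still unassigned lies inside it.
        level = [c for c in remaining
                 if not any(contains(c, other) for other in remaining if other is not c)]
        if not level:
            levels.append(remaining)  # defensive fallback for malformed containment data
            break
        assigned = {id(c) for c in level}
        levels.append(level)
        remaining = [c for c in remaining if id(c) not in assigned]
    return levels
```

Each entry of the returned list could then be handed to 2-opt independently and burnt in ascending order.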

Sophist-UK commented 3 years ago

I think that greedy is a form of clustering in the sense that a lot of the time the travel moves are likely to be optimal, but a few are going to be suboptimal. Whether that sort of grouping leads to reduced 2-opt times is unclear to me - but I suspect that, compared to starting 2-opt from a random sequence, it results in a very non-random distribution of travel move distances, and that the 90/10 approach to 2-opt might be helped along by focusing 2-opt swaps on the longest travel moves first.

I will give a numpy version of greedy a try - and if initial tests show it to be faster than python then I will try to make it completely compatible with the existing Python greedy algorithm both in terms of exactly matching functionality and falling back to python if numpy is not available.

I don't think I want to go on the complete learning curve for how MK works and 2-opt and numpy to the level needed to code 2-opt myself, but if you would like me to I would be happy to go as far up that learning curve as needed to review your code and see if an alternative perspective can offer any ideas for speeding it up.

As for implementing 2-opt, even if it can only be applied if certain optimisations are not selected, that would still be better than not having it at all IMO.

P.S. I cannot really express just how much time basic travel optimisation saves me, both in SVG prep and in lasering. Basic travel optimisation saves at least 90% of the travel time. I don't use Merge Operations or Merge Passes because they cause the same short paths to be burnt several times in a row without cooling time in between (and there are still some funny oddities (bugs??) with Merge Passes - I have not tried Merge Operations - whereby it doesn't go around and around a closed path but instead does each segment back and forth); in any case these don't save a huge amount of extra time over and above basic travel optimisation.

With 0.6.2x I used to have to manually ensure that the leather stitch holes were in a good sequence for burning (which took a lot of my time) or wait for (almost literally) hours whilst the laser head travelled to and fro between the holes in the pattern in the order that they happened to be in the PDF file for a paper pattern. Most of the patterns we use have paths which have only a few segments - it is only recently that we have seen patterns with hundreds of segments in the paths, hence the recent focus on improving that area. But the optimisation functions have saved me certainly 10s of hours of my time and possibly hundreds of hours.

I can't speak about Lightburn, but IMO there really is now no comparison between Whisperer and Meerk40t. (And I am approving loads of new members on Luke's Meerk40t FB Group every day.)

tatarize commented 3 years ago

> I think that greedy is a form of clustering in the sense that a lot of the time the travel moves are likely to be optimal,

Clustering in this context refers to clustering Euclidean space to solve greedy faster: clustered algorithms solve nearest-point problems faster than O(N), which lets you solve greedy faster than O(N^2).

Numpy greedy would be brute force. So rather than O(N log N) we'd be doing O(N^2), but trying to make up the difference with the raw speed of the combined operations in C code. The bigger issue is that it's not constrained either - or rather, it would need to be constrained instead of just searching the set of all endpoints: you need to search only the endpoints that occur within the inner-first candidate list.
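A hedged sketch of what one vectorized greedy step might look like; `allowed` stands in for the inner-first candidate filtering (maintained by the caller as cuts complete), and for simplicity each cut is only entered from its start point:

```python
import numpy as np

def nearest_allowed(position, starts, done, allowed):
    """One brute-force greedy step, vectorized with numpy.

    position: (2,) current laser position.
    starts:   (N, 2) start points of all cuts.
    done:     boolean (N,) marking cuts already burnt.
    allowed:  boolean (N,) marking cuts currently permitted by the constraints.
    Returns the index of the nearest valid cut, or None if none is valid.
    Each step is O(N), so the full ordering is O(N^2), but the distance math
    runs in C rather than in the Python loop.
    """
    d = np.hypot(starts[:, 0] - position[0], starts[:, 1] - position[1])
    d[done | ~allowed] = np.inf
    i = int(np.argmin(d))
    return None if not np.isfinite(d[i]) else i
```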

http://honors.cs.umd.edu/reports/kuang.pdf

I think we could actually make a 2-opt that works for our HTSP problem. We drop invalid nodes and proceed to perform a cheapest valid insert, probably in a brute force manner.

https://stemlounge.com/animated-algorithms-for-the-traveling-salesman-problem/

So in theory a correctly done numpy solution could help here.


Though at a certain point we're going to have far more trouble, time-wise, with the polygon-in-polygon algorithm than with the optimizations here.

tatarize commented 3 years ago

We might also check whether any constraints exist, and default to this algorithm in the case that they don't. It also might be better to number the cutobjects with their original path's index and define the constraints independently within the cutcode rather than use groups. Anything that has the same number is from the same path, and we define constraints like "2 constrains 3", meaning that 2 must finish before 3 may cut, and we include the count of cutobjects for each index.

While this would simply upgrade our methodology here, in theory we could allow some additional speed-up if we didn't have to search all the cutgroups for uncut entries: the cutgroup would contain paths of the same index, so we'd know how many paths matching that index we've already gathered, and thus how many are still outstanding and whether the constraint is satisfied.

These sorts of definitions would be critical if we switch to an HTSP solution for the inner-first options. They might also be really useful for letting users define a required ordering for different paths: we could still optimize things, we'd just do so within the constraints. So a good data structure for defining constraints would be extremely useful; we could define a constraint tree and more easily tell whether we have correctly met those constraints. And if we mix and match with cheapest insert (with constraints), we could optimize our paths much more effectively.
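A rough sketch of that bookkeeping (all names hypothetical, not MeerK40t's actual classes): each cut carries the index of its original path, constraints are recorded as "A must finish before B may cut", and a per-index outstanding count tells us when a constraint is satisfied.

```python
from collections import defaultdict

class CutConstraints:
    """Tracks 'index A must finish before index B may cut' constraints."""

    def __init__(self):
        self.outstanding = defaultdict(int)  # path index -> cuts not yet burnt
        self.blocks = defaultdict(set)       # path index a -> {b, ...} waiting on a

    def register(self, index, count):
        """Record how many cut objects share this path index."""
        self.outstanding[index] += count

    def add_constraint(self, before, after):
        """Every cut of index `before` must complete before `after` may start."""
        self.blocks[before].add(after)

    def complete(self, index):
        """Mark one cut of this index as burnt."""
        self.outstanding[index] -= 1

    def is_allowed(self, index):
        """A cut may start once every index that constrains it is exhausted."""
        return all(self.outstanding[before] == 0
                   for before, afters in self.blocks.items()
                   if index in afters)
```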

Though it might also need some secondary criteria, like the ability to weigh direction changes within the project negatively. If you reverse direction on a dime, that counts against you: going right and then exactly left means that path takes a hit in the metric. So rather than pure distance, we can also put our finger on the scale and optimize according to other criteria too.
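For the direction-change weighting, one simple form (the 0.5 weight is an arbitrary tuning knob, not a MeerK40t setting) adds a penalty that grows with how sharply the head must turn between consecutive moves:

```python
import math

def travel_cost(prev_point, current_point, next_point, reversal_weight=0.5):
    """Distance to the next point plus a penalty for reversing direction.

    Going straight on adds nothing; doubling back on a dime adds
    reversal_weight times the travel distance.
    """
    d = math.hypot(next_point[0] - current_point[0], next_point[1] - current_point[1])
    ax, ay = current_point[0] - prev_point[0], current_point[1] - prev_point[1]
    bx, by = next_point[0] - current_point[0], next_point[1] - current_point[1]
    la, lb = math.hypot(ax, ay), math.hypot(bx, by)
    if la == 0 or lb == 0:
        return d
    # cos(angle) is 1 when continuing straight, -1 when reversing exactly.
    cos_angle = (ax * bx + ay * by) / (la * lb)
    penalty = (1.0 - cos_angle) / 2.0  # 0 for straight ahead, 1 for a full reversal
    return d + reversal_weight * penalty * d
```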

Sophist-UK commented 2 years ago

This is NOT completed - nor will it ever be, as there will always be further potential optimisations to consider.

However, I would like you to confirm that my image / raster optimisation that is in 0.7 has been ported forward to 0.8 (if you cannot determine the margins from the speed, then a fixed margin of (say) 500mil would probably do).

And I am pretty certain that some of the above suggestions have not been implemented yet - like the use of numpy for greedy (both my suggestion and yours), or e.g. a new algorithm I have in mind for 2-opt, etc.