[0.8.x] Missing raster optimisation functionality

Sophist-UK commented 1 year ago

As reported on 13 Oct 2022 in #1372 , the Raster optimisation code which breaks raster operations up into separate non-overlapping CutCode is still missing from v0.8.

This issue is not primarily to remind the devs that this functionality is still missing but to propose that when the code is ported the following changes are made:

Algorithm: Original code did some complex stuff for deciding when to split and when to combine, and had some complex code to calculate the margins based on acceleration etc. With the benefit of experience, this is IMO over complex. I would therefore propose the following:
Margins for determining overlap: When determining whether raster elements should be grouped together into the same CutObject use either:
1. Galvo laser (zero acceleration issues): a small fixed margin. (No difference in time for having separate rasters - fully compatible with Image ops.)
2. Group inner burns checked: a fixed additional margin equal to the operation overscan distance; or
3. Group inner burns not checked: a fixed 1/4" or 60mm distance. Whilst this is not going to be theoretically optimal, in reality the differences in time between joined or separate burns for two sets of raster elements that are marginally either side of this distance will not be noticeable and IME having separate burns feels better if there is little difference in burn time and this is more compatible with how Images ops burn.
The benefit of this simplified approach (aside from simpler code) is that it no longer relies on calculating the acceleration distance / time to determine separation, and so can be applied to all lasers.

Sophist-UK commented 1 year ago

Just to give you an idea of the time difference this optimisation can make in a real-life burn (that is underway as I type this)... The first line is two vertical text rasters with a big gap between them. The 2nd and 3rd lines are the same two vertical text rasters done separately. As you can see this optimization can make a BIG difference to the rastering time which would be less than 50% of the time if this optimisation were reimplemented.

Sophist-UK commented 1 year ago

Here is a short video demonstrating just how much time (estimated @ c. 50%) would be saved by having raster optimisation reimplemented.

Rastering is time-consuming, so time savings are non-trivial.

https://github.com/meerk40t/meerk40t/assets/3001893/3583de9a-ea08-43d7-9df5-c0b2bb21d51d

Sophist-UK commented 1 year ago

Doing a multi part raster now. It literally takes more than twice as long to burn without this optimization than with it! Can you please, pretty please, or this optimization back in!!

Edit: It literally went from 30 mins to 13 mins when I manually split the rasters into separate ops in order to force them to be rastered individually.

tatarize commented 1 year ago

This code doesn't really belong necessarily in optimizations. If we can run the algorithm to divide up the rasters into coherent segments of rasters we should be able to do this merely within a single console command.

image raster-optimize

And have it take the original image and divide it into different offset images that correctly can be placed in an op image and be rastered individually.

This simplifies the problem greatly and can be applied as a pre-processing step for the cut planner without needing to touch or mess with anything else. We basically just cut an image into pieces algorithmically. In this context, how did the origin 0.7.x functionality work? --- I never actually understood it and thus couldn't extract it from the tendrils of the rest of the code.

I assume we're processing the image and probably looking for any lines where we can, move from the bottom of the image to the top of the image, consisting of null-pixels and if we find one, we could slice the image there. And if we find fat enough we go ahead and make that particular cut. And could do the same thing for vertical beams or even corner beams. Or we could probably do something more akin to k-means clustering and just apply a clustering algorithm to all the pixels and slice those clusters into image sections. We can also, do things like take a mask of the null pixels and search for a n-wide section that goes through the entire image, and if, found, go ahead and do a purely horizontal flood fill to slice the image there. Maybe even allowing you to cut a frame even though there are no pixel gaps in that. Since if we go ahead and bite the bullet there, we'd save significant time.

Sophist-UK commented 1 year ago

That is not user friendly! In 0.7 I pressed the Go button and rasters were optimized - I didn't need to press a button to achieve it. This belongs in optimizations exactly the same amount as any other optimization - perhaps it needs a separate checkbox, though I don't see any downsides to this optimization and more nor less than e.g. a vector path order optimization, so if this needs a separate checkbox then so does vector path order.

Also, switching it to an image when you run this command goes against the "save-settings and process at burn time" approach in 0.8 that was so beneficial - so for example if you run this command you can no longer scale the SVG paths that make the raster, and if you save it you will lose the SVG paths entirely. If you are going to take this approach, then IMO what it should do is to work out what rasters are not overlapping, and then put the paths into separate Raster Ops. You can then still scale as much as you like.

That said, having stated the case for having this optimisation run when you do all the other optimisations at burn time, a button that divides the rasters into separate optimizations might be useful if e.g. you decide you need to have different settings for each area.

Sophist-UK commented 1 year ago

how did the origin 0.7.x functionality work? --- I never actually understood it and thus couldn't extract it from the tendrils of the rest of the code.

It worked as follows:

At the point that the elements in each op are processed for a burn i.e. right at the beginning of the burn code, the code that processes the elements for the raster op does a pre-process step whereby it groups the rasters that overlap each other (plus a margin).

It starts with creating a list of raster groups, with one raster element per group. Each group holds the overall bbox and the containing raster elements. It then iterates and when it finds two groups where the group bbox overlaps, it merges the two groups and recalculates the overall bbox using min/max.

Once it has iterated to the point that no more merges happen (and you will need to consider carefully how to make these iterations most efficient and yet conclusive), then each group is turned into a separate image / cut-object.

The 0.7 version added a margin to account for the acceleration / deceleration time - which is possible to calculate for the m2 nano but not for other boards, and my experience with 0.7 suggests that this optimisation is not worth the effort.

But you might want to add a small fixed distance margin around the bbox so that e.g. rasters which are a few pixels or a few tens of pixels apart are rastered together (mainly because they might well be part of the same graphic but also because the optimisation time-savings will be small.

There is a potential further optimization possible - whereby before you merge groups you iterate the elements inside them to see if their individual bboxes overlap. In the above diagram, the original algorithm would produce one raster image, and this enhanced optimisation would produce 2. Whether this enhanced optimisation will save time will depend on the gap etc. and you might need to consider counter examples i.e. to merge anyway when the group bboxes do overlap and the raster direction (e.g. usually X) of one element is entirely contained in the other even though they don't overlap (because in this situation the head is already sweeping over this area of the material - image a square inside a frame made entirely of small overlapping squares).

My gut reaction would be to have the basic algorithm automatic and the advanced algorithm selectable as an experimental option - and then solicit feedback on whether there are bad optimizations from the advanced usage.

jpirnay commented 1 year ago

Bear with me, I have never really had a close look at this subject, but judging from my own experience : I tend to have rasters as small as possible not as large as possible to overcome the white space move issue. You can easily combine two overlapping rasters that have a lot of white space in between them. So I fail to understand the benefits.

jpirnay commented 1 year ago

If you look to this example, then wouldn't you end up with the infamous picture frame scenario? grafik

Sophist-UK commented 1 year ago

The benefits are simple to demonstrate with two squares a long way apart where the head spends a LOT of time crossing empty areas and it would be reasonably obvious to be faster rastering the two squares separately: They are along way apart because there is a lot of stuff in the design that I omitted that means the rasters have to be a long distance apart and the rasters have to be in the X-direction (T2B) because the design has to be in that orientation to fit the bed.

Jens' diagram above certainly illustrates that this is a non-simple thing to optimise in a generalised way. In the above diagram, it would currently burn as a single image, and in the simple algorithm it would also still burn as a single image, and in the advanced algorithm it would still burn as a single image. But my gut reaction is that for this specific combination of elements each path should be burned as a separate image (but clearly you cannot use this as an algorithm because you cannot burn every individual element as a separate image if the paths actually overlap). But (depending on the relative heights of the top corner boxes and the top horizontal bar) it might be better to do the top 3 elements as one image, the bottom 3 as another and the two vertical bars as as individual items.

I guess you could fix this by changing the advanced algorithm to do a VectorMontonizer check whether any of the paths in the first group overlap with any of the paths in the second group, but then you are increasing the iterations substantially. If this is Numpy-able, then that might counter the performance hit.

However providing that the algorithm doesn't break a burn (by breaking actually overlapping elements into separate images), then if it optimises some cases and doesn't optimise others then it is still beneficial.

So the practical answer would be to start by reimplementing the bbox algorithm and optimise the majority of cases where the bboxes don't overlap, and leave those where they do (like the above diagram) unchanged. The code for the simple algorithm still exists in the Git history, so reimplementing it (and removing or simplifying the margin code) shouldn't be nearly as hard as writing it from scratch . The advanced algorithm would be only a minor tweak to the code - from memory it is handing the creation and merging of the groups which was the tricky bit.

Then at a later stage if there is any interest you could try the VectorMontonizer alternative and see what the performance might be and maybe give it as an advanced option for the user to choose. (Personally I would be completely happy with the bbox algorithm and I am not advocating that the VectorMontonizer is worth implementing.)

tatarize commented 1 year ago

Yeah, ultimately grouping objects into rasterable groups by their overlap seems like it would be suboptimal, but kind of low hanging fruit. I do see that the advantage is since the bounding boxes do not overlap it is the case that these sections would be correct. Since, if the bounding boxes overlap, we could have two objects that are expected to occult or run the raster a single time whereas putting each object in a different group would certainly have issues.

I'm not sure the small amount of buffer overlap accomplishes much of anything. It seems like we could classify raster-clusters by overlap alone without really caring too how deeply they overlap.

Also, there's some extra advantage in that we don't need to generate a lot of pixels that we don't use either. Since the two small box example will only generate the two small boxes of pixels and none of the pixels between them.

I also in the #2235 I tried to manually allow this for images which are notably not rasters. Basically in theory tool imagecut would give you a line drawing ability that would slice images into multiple pieces. Ultimately this would be needed to do things like create puzzle slices and maybe crop to clipImages or something. Since with some of the effect work we could probably denote something like a clipped image that produces just the keyhole of the image fairly dynamically.

The end point would be that users could slice up a raster with a line tool, or use a puzzle outline show that.

But, for easiest methods of implementing this roughly a better version considering 0.9.x wouldn't be that difficult. Though the original wasn't too bad it was just hard to follow and did things like reached deep into the lihuiyu driver for information that was absolutely coupling those devices.

    """
    A group consists of a list of elements and associated bboxes.

    This method takes a single group and breaks it into separate groups which are not overlapping
    by breaking into a list of groups each containing a single element and then combining them
    when any element in one group overlaps with any element in another.

    returns: list of groups
    """

The easiest thing would be to quickly calculate the overlapping bounding boxes with a numpy triu_indices and dstack. This basically gives you the triangle (upper) matrix of indexes and you find every pair of potential overlaps, and testing these as a unit, taking the four bounds and checking for the locations where:

        checks = np.dstack(
            (
                x0 > y0,
                x1 < y1,
                x2 > y2,
                x3 < y3,
            )
        ).all(axis=2)

is true, this gives us a list of index pairs that overlap in pre-sorted order, such that every first-index will occur before the second-index and we don't need to deal with the bottom triangle. So we get 1,4 but never 4,1, we then iterate over these index pairs. When we encounter a new number not in our dictionary we add a new group to our dictionary (and a list of groups we created), and assign that number and it's companion to the group (and also link the index key -> group). If one of our indexes has a dictionary lookup, we go ahead assign the other one to the group (if needed). This gives us a list of all the overlapping groups. We do need to sort them since we could have something like: 1,3, 3,6, 5,6. -> 1, 3, 6, 5 but that's trivial since the presented order is 1:1 with our indexes.

This would be a few orders of magnitude faster, but I don't think the original code would have taken that long to execute and really the time complexity isn't that different.

Then rather than our RasterOp basically being a big ball of things that get rastered as 1 image, our RasterOp would create n-groups which would raster as their individual images. I don't think solving this with numpy would change much at all. I could probably expose group_by_overlap to other uses but I don't really think it would matter that much. And I probably would allow RasterOp to contain regular Group nodes which would each render independently, and just run group_by_overlap if the optimization is enabled. This would, in theory expose some ability to force a particular rendering manually if you needed that for some reason.

VectorMonotonizer is mostly just Scanbeam now, and there was no really obvious method of using a scanline to solve this faster. You'd basically need to check the actives across the y-axis and actives across the x-axis too. And then find that overlaps occur in those places where there is an overlap of two bounds in the x and in the y. But, there's no obvious methodology for finding the intersections of those two groups. And the alternative is basically just a series of bounding box checks which can just be merged into a single rather trivial check.

The living hinge which we went over with a fine-tooth comb and made really really fast, does absolutely massive numbers of bounding box checks to see for which bounds overlap and thus should be checked intersections. Some fairly dense patterns are still realtime enough to play with them.

tatarize commented 1 year ago

Oh, actually to solve this with a scanbeam you'd identify any places where the x-values overlap where you have any active rectangles and then when another rectangle's start-x is added then we perform overlap check of just those overlapping in the x-space. This reduces our set of potential matches to what should probably be about log(n), but the individual operations needing to be performed here our current brute force method would be done all at one time and that would beat the scanline algorithm even though it technically has a better time complexity.

Sophist-UK commented 1 year ago

But, for easiest methods of implementing this roughly a better version considering 0.9.x wouldn't be that difficult. Though the original wasn't too bad it was just hard to follow and did things like reached deep into the lihuiyu driver for information that was absolutely coupling those devices.

My experience using 0.7 was that calculating the margin by reaching deeply into the driver was unnecessary. As I stated above, you can use a small fixed size margin (or the operation overscan value).

The easiest thing would be to quickly calculate the overlapping bounding boxes with a numpy triu_indices and dstack. ... This would be a few orders of magnitude faster, but I don't think the original code would have taken that long to execute and really the time complexity isn't that different.

This is real code that was live in a real version and had real use so we don't have to guess about the performance. It didn't take long to work in normal code - so I didn't think about Numpyfying it. But I don't have literally thousands of raster elements - and I can see that for a very large number of raster elements it might be noticeable. Up to you whether to Numpify it or not.

Then rather than our RasterOp basically being a big ball of things that get rastered as 1 image, our RasterOp would create n-groups which would raster as their individual images.

Yes - that is what it successfully did in 0.7. And the benefit is not only in speed - it also makes the Group cuts functionality work.

The living hinge which we went over with a fine-tooth comb and made really really fast, does absolutely massive numbers of bounding box checks to see for which bounds overlap and thus should be checked intersections. Some fairly dense patterns are still realtime enough to play with them.

Bounding box checks are simple numeric comparisons since the bbox is pre-calculated once, so it is fast (but because simple, also potentially easily numpifyable).

I would think that the bbox calculations themselves would be a good candidate for numpification (though if these are in svgelements, that would make Numpy a prerequisite for svgelements).

Re: Scanbeam - The majority of use cases will be rasters that are self contained and a (reasonably) long way apart, not Jens edge case example. The existing algorithm therefore will provide 80%+/90%+ of the benefits with 10% of the effort.

jpirnay commented 1 year ago

https://github.com/meerk40t/meerk40t/pull/2240

meerk40t / meerk40t

[0.8.x] Missing raster optimisation functionality #1937