Performance Optimization: Caching/Memoization

karl-nilsson commented 3 years ago

By hashing slices, I can recognize duplicates, thereby reducing work. No need to calculate shells/infill if the inputs (wires, slicer settings) are the same. This can be implemented with std::unordered_map<sse::Slice, std::pair<size_t, size_t>>, with the value being pointers to the start and end of gcode within the aggregate gcode string.

This also allows for more advanced gcode, specifically M97 subprogram.

karl-nilsson commented 2 years ago

related: #19

Idea: hash/cache individual components (infill patterns, offset loopsets), and use composite keys for full slices. Basically, memoization.

Either use separate caches for each component type, or a unified cache and type check on cache hit. std::unorded_map<std::variant, std::variant> https://www.techiedelight.com/use-struct-key-std-unordered_map-cpp/

component: key (hash) -> value infill: infill pattern + density + layer # -> cavc::Polyline offset: cavc::loopset + offset + count -> cavc::OffsetLoopSet slice: offset + infill + layer height -> trimmed polylines (infill), or std::string (gcode)

karl-nilsson commented 2 years ago

Also, need to benchmark caching to determine optimal tradeoffs (e.g. hashing a pline + offset may be more expensive than doing the calculation)

Finally, this solution needs to be thread-safe.

karl-nilsson commented 2 years ago

Can also cache results of wire->polyline conversion, and curve flattening

karl-nilsson commented 2 years ago

Ideally, the inputs/outputs would be position-independent, but that's not possible for all types of inputs. For example, a tilted cylinder would slice into a series of ellipses. With a naïve hash (which simply hashes the radii and coordinate system), each ellipse would hash differently, rendering the cache useless. A location-compensating hash could translate the shape to the origin before the hash operation, which would cause a cache hit for each slice, avoiding the flattening/offsetting operations. I doubt this would work for trimming infill plines, because that result is dependent on z position and xy position.

Of the 4 basic transformations (translate, rotate, mirror, scale), translate seems the easiest to implement: simply translate all points of interest (e.g. control point) such that the first point of interest is at the origin, before running the hash function.

Rotation may be achievable for entities which have their own coordinate system (e.g. ellipse).

Need to benchmark the different approaches.

karl-nilsson commented 2 years ago

Another idea: execution graph

Original design:

run each operation in a separate thread
calculate hash
cache hit: fetch result
cache miss: calculate result, and append to cache
upside: very simple implementation
downside: either allow cache to balloon in size, or implement LRU logic

New design:

calculate all hashes (easily done in parallel)
build an execution graph (tree, possibly DAG)
traverse graph
upside: bounded cache size
downside: unknown (yet bounded) cost of tree construction/traversal
Can do a multi-dimensional graph, with each dimension being a different type of operation (e.g. infill clip op depends on an infill generation op and an offset op (which depends on a flattening op))

karl-nilsson commented 2 years ago

This idea depends on the stability of OCCT's intersection operation. If the resulting faces (specifically wires) aren't identical, I'll have to use a fuzzy hash for matching. This is the first task necessary to assess the viability of this feature

The best-case scenario for this type of optimization is a prism with the Z axis as the center axis and a complex cross-section. This is likely with mechanical parts made with traditional CAD (and imported via STEP/IGES/brep). Also, tower support structures will work great with this strategy.

The worst-case scenario is organic (curved) surfaces and mesh models, as the chances of any two faces/wires/segments being identical is very low. Even if I can get the overhead of the cache very low, it may make sense to ignore it entirely for mesh files.

karl-nilsson commented 2 years ago

Can use abseil's hash functions (overload user classes) https://abseil.io/docs/cpp/guides/hash

karl-nilsson commented 2 years ago

n.b. #31 will complicate this a bit. Right now, the order of operations looks like this: Slice intersect -> wire to pline (curve flattening here) -> pline simplification -> pline offsetting -> infill trimming While it may be feasible to cache the pline simplification step as well, it's probably better to combine it with the wire to pline operation.

karl-nilsson commented 2 years ago

Relevant occt docs: https://dev.opencascade.org/doc/refman/html/class_topo_d_s___shape.html#aa01a265c68bdb9a35a762048712b19d7 https://dev.opencascade.org/doc/refman/html/class_topo_d_s___shape.html#ab997feaf56cedf1b034d8f19cb6e59c1

karl-nilsson commented 2 years ago

Unfortunately, it appears the easy route isn't possible. Specifically, BRepAlgoAPI_Common produces different TShapes, so the built-in comparisons (HashCode, IsPartner, IsSame, IsEqual) always return false.

A few possibilities remain:

project face/wire onto XY plane, then use result for hash/comparison
- benchmark overhead
manually hash the face/wire, discarding the z-value in the process
- wire: convert to a list of Geom2D_Curves
don't hash anything early in the pipeline (anything occt related), only polylines and afterwards.

karl-nilsson / sse

Performance Optimization: Caching/Memoization #18