i.e. can this tool efficiently utilize a cluster of machines (communicating over a network via something like MPI) when checking a "large" geometry. @makeclean I heard you're thinking about this

I should note that a number of existing choices have been made assuming this was a direction to be taken, and it's worth describing what these are:

step_to_brep

loading a STEP file with OCCT, given the current implementation, is relatively slow and all-or-nothing, while BREP files are much faster. e.g. a 200MiB STEP file takes ~2 minutes to open, writing the same geometry (~5000 solids) out as BREP takes ~7 seconds, and loading that BREP file ~3 seconds.

this step already flattens the shapes into a linear list because this structure is easier to schedule operations on than the hierarchy used by STEP. additionally, it might be worth splitting up a large input file into multiple flattened BREP files, either a range (e.g. solids 1000 to 2000) of a large STEP file, which would hopefully allow larger files to be checked by smaller-memory nodes

overlap_checker

this is currently the most computationally intensive step and (conveniently) the most amenable to parallelism. there are two main operations performed here:

computing an acceleration structure, indicating which shapes can possibly overlap (currently using orientated bounding boxes)
checking every pair of shapes that can overlap to see whether they do, and quantifying any intersections

both of these steps are currently parallelized over threads, but it's possible to also split this across nodes. given the speed of loading BREP files, it might even be worth performing this separately multiple times in a NUMA system to ensure that each socket has fast/local access to the geometry

imprint_solids and merge_solids

currently not parallelized, mostly because this functionality isn't heavily used yet and partially due to API limitations in OCCT

tagging @makeclean because john mentioned that you'd be interested in this

I should also mention that I've looked at the acceleration structure to see which objects potentially overlap to see whether it's worth doing something else (other than relying on the file system) to distribute the work of the checker.

the following image shows the linearised list of shapes (from iter_clite) showing which solids potentially intersect.

pairwise obb overlaps in original order

black dots indicate the pair had intersecting bounding boxes. assuming this file is representative, it seems as though most nodes would need access to a significant proportion of shapes to get their work done.

in an ideal world it might be possible to reorder the shapes into some optimal ordering to minimise the number of files loaded. the following image shows the above solids reordered, via hierarchical clustering:

reordered to increase diagonalisation

but it's unclear to me how spending the time to calculate this would be a win

ukaea / overlap_checker

support for running on a distributed memory system #26

step_to_brep

overlap_checker

imprint_solids and merge_solids