gumyr / build123d

A python CAD programming library
Apache License 2.0
521 stars 89 forks source link

sort_by using Axis can be extremely slow #275

Closed voneiden closed 1 year ago

voneiden commented 1 year ago

The problem

sort_by using Axis does a transformation of every shape that is to be sorted: https://github.com/gumyr/build123d/blob/66fe3b5e14edff8c8ce8f6ce0334ec6db9f1fe3d/src/build123d/topology.py#L3037-L3042

This transformation in turn performs a deep copy of the transformed object: https://github.com/gumyr/build123d/blob/66fe3b5e14edff8c8ce8f6ce0334ec6db9f1fe3d/src/build123d/topology.py#L2170-L2174

This deep copy is completely unnecessary overhead for the purposes of sorting, but normally it's not a significant overhead. However, there appears to be case(s?) where things get out of hand, such as sorting faces of a Compound of Solids.

MRE and demonstration of slowness

import time

from build123d import *

def gen_boxes(face_count):
    box_count = int(face_count / 6)
    boxes = [Box(1, 1, 1, mode=Mode.PRIVATE) for _ in range(box_count)]
    return Compound(children=boxes)

c = gen_boxes(10)
t = time.time()
print("Starting sort..")
c.faces().sort_by()
print("Sorting done, took", time.time() - t, "seconds")

This code snippet can be used to test performance. Table shows results with current dev

Face count Time taken by faces().sort_by()
~10 (=6) 0.0029 s
~100 (=96) 0.476 s
~1000 (=996) 77 s

That gets out of hand fast and this is with simple geometry.. in my case I got about 800 faces but it takes over 3 minutes to sort them!

The cause

Fundamentally the slowness comes from the __deepcopy__. With my brep this issue gets highlighted because each face has the topo_parent set to the parent compound, so for each face the whole compound is (deeply) copied.

Solutions

Copying anything seems to be unnecessary for the purpose of sorting as we're only interested in the Z value of the transformed shape and once we have it everything's going to be garbage collected anyway. Adding an extra kwarg to transform_shape such as skip_copy=False and setting it to True when sorting by Axis the MRE test results becomes way more reasonable:

Face count Time taken by faces().sort_by() Improvement
~10 (=6) 0.00071 s 4x
~100 (=96) 0.0084 s 56x
~1000 (=996) 0.0784 s 982x

I will open a PR with this solution, but of course feel free to close it if there's a better/cleaner way to resolve this.

voneiden commented 1 year ago

Actually no, I took a bit of a shortcut with my test solution. to_local_coords and the gang would also need to support skip_copy. That's getting slightly ugly if I litter it here and there. Any thoughts?

gumyr commented 1 year ago

Here is the result of a cProfile on the above sort:

         201610 function calls (175498 primitive calls) in 0.575 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1728/96    0.496    0.000    0.558    0.006 topology.py:2118(__deepcopy__)
     2016    0.015    0.000    0.017    0.000 topology.py:7695(downcast)
 23040/96    0.014    0.000    0.558    0.006 copy.py:128(deepcopy)
     1536    0.006    0.000    0.007    0.000 geometry.py:971(__init__)
     3360    0.004    0.000    0.004    0.000 {method 'items' of 'dict' objects}
  1632/96    0.004    0.000    0.318    0.003 copy.py:200(_deepcopy_list)
     6528    0.004    0.000    0.005    0.000 copy.py:242(_keep_alive)
    43008    0.003    0.000    0.003    0.000 {method 'get' of 'dict' objects}
      864    0.003    0.000    0.003    0.000 geometry.py:129(__init__)
       96    0.002    0.000    0.002    0.000 {built-in method OCP.BRepGProp.SurfaceProperties_s}
    21408    0.002    0.000    0.002    0.000 {built-in method builtins.setattr}
    36384    0.002    0.000    0.002    0.000 {built-in method builtins.id}
     2208    0.002    0.000    0.002    0.000 topology.py:7748(shapetype)
     1536    0.002    0.000    0.008    0.000 geometry.py:1048(__deepcopy__)
       96    0.001    0.000    0.008    0.000 geometry.py:1488(__init__)
       96    0.001    0.000    0.001    0.000 topology.py:1754(geom_type)
     1632    0.001    0.000    0.005    0.000 copy.py:226(_deepcopy_dict)
    10849    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
       96    0.001    0.000    0.002    0.000 geometry.py:1751(_calc_transforms)
       96    0.001    0.000    0.561    0.006 topology.py:2155(transform_shape)
     1632    0.001    0.000    0.001    0.000 {built-in method OCP.TopoDS.Compound_s}
    13248    0.001    0.000    0.001    0.000 copy.py:182(_deepcopy_atomic)
     7968    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
       96    0.001    0.000    0.002    0.000 topology.py:1532(cast)
      288    0.001    0.000    0.001    0.000 geometry.py:390(to_dir)
       96    0.001    0.000    0.574    0.006 topology.py:3123(<lambda>)
       96    0.000    0.000    0.005    0.000 topology.py:4520(center)
     5664    0.000    0.000    0.000    0.000 {built-in method builtins.len}
     3264    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
      288    0.000    0.000    0.001    0.000 geometry.py:275(normalized)
      192    0.000    0.000    0.001    0.000 topology.py:1205(__init__)
     3264    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
      192    0.000    0.000    0.000    0.000 geometry.py:386(to_pnt)
      384    0.000    0.000    0.000    0.000 {built-in method OCP.TopoDS.Face_s}
     1728    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x7372a0}
      384    0.000    0.000    0.000    0.000 geometry.py:197(Z)
      192    0.000    0.000    0.000    0.000 geometry.py:1234(__init__)
       96    0.000    0.000    0.008    0.000 geometry.py:504(to_plane)
      288    0.000    0.000    0.001    0.000 geometry.py:212(to_tuple)
       96    0.000    0.000    0.000    0.000 geometry.py:221(cross)
       96    0.000    0.000    0.561    0.006 geometry.py:1796(_to_from_local_coords)
     2304    0.000    0.000    0.000    0.000 geometry.py:207(wrapped)
      192    0.000    0.000    0.000    0.000 nodemixin.py:123(parent)
      288    0.000    0.000    0.000    0.000 geometry.py:177(X)
       96    0.000    0.000    0.002    0.000 geometry.py:1673(origin)
        1    0.000    0.000    0.000    0.000 topology.py:1953(_entities)
      288    0.000    0.000    0.000    0.000 geometry.py:187(Y)
       96    0.000    0.000    0.000    0.000 geometry.py:216(length)
       96    0.000    0.000    0.561    0.006 geometry.py:1837(to_local_coords)
        1    0.000    0.000    0.575    0.575 {built-in method builtins.exec}
      288    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        1    0.000    0.000    0.574    0.574 {built-in method builtins.sorted}
        1    0.000    0.000    0.575    0.575 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 topology.py:2026(<listcomp>)
       96    0.000    0.000    0.000    0.000 geometry.py:1668(origin)
       96    0.000    0.000    0.000    0.000 typing.py:1375(cast)
        1    0.000    0.000    0.001    0.001 topology.py:2024(faces)
        1    0.000    0.000    0.574    0.574 topology.py:3105(sort_by)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}

So yes, deepcopy is taking the bulk of the time. I haven't looked at this in detail but I will, it seems like there should be something that can be done to improvement performance. Thanks for raising the issue.

gumyr commented 1 year ago

Using transform_shape is functionally correct but probably not the best from a performance point of view. This should be replaced with a moved or located operation which doesn't change the object just the location attribute of the object. Need to work up a prototype of this...

gumyr commented 1 year ago

I've changes how sort_by works so that the shape need not be moved or transformed at all (see 2f0a43c00e9cbebb65a1f3e7be97d6522b042b2b). On my machine sorting the faces on 1000 boxes goes from 60.10s to 0.0438s (more than 1300 times faster). Hopefully I didn't mess anything up!

If everything is okay with this, I'll apply the same change to group_by.

gumyr commented 1 year ago

Applied the same optimization to group_by: 119.66301345825195 s vs. 0.08370232582092285 s. See e933edf059d5e0c90713e0bc2ebc3770d64c77a7

voneiden commented 1 year ago

Awesome, thanks!