Possibility to add points as a batch to speed up execution time

AlexandreSi commented 4 years ago

Hi there,

Thank you for your work on ezdxf, it is a highly reliable resource to me. I am using it to write dxf files from LiDAR point clouds, and it works very well!

I sometimes have to work with big point clouds (around 500k points), and writing the DXF file takes a long time. I looked into your code but couldn't find an obvious way to speed things up. Have you already thought about this (for example around the entity database, or whether multiprocessing is worth considering)? If you have a suggestion, I would be happy to open a PR for it!

Thank you, Alexandre

mozman commented 4 years ago

Look at https://ezdxf.mozman.at/docs/addons/r12writer.html# this addon writes only DXF R12 but fast.

AlexandreSi commented 4 years ago

Thank you for the tip, it divided the time needed by 10!

In the future, I will need to add linear dimensions. In the docs, you mention that the R12 format doesn't support dimensions. I tried to reopen the DXF and add the dimension, but it took as much time as without the r12writer and the dimension didn't show.

Do you see a way to efficiently add a lot of points and add a few linear dimensions?

Thank you again, Alexandre

mozman commented 4 years ago

The r12writer is fast because it creates no in-memory structures. This is not possible for dimensions: a dimension requires a block containing its graphical representation (lines, text and arrows), and such blocks cannot be written on the fly because the BLOCKS section has to be written before the ENTITIES section.

mozman commented 4 years ago

There is an AutoCAD-incompatible way: "friendly" CAD applications like BricsCAD open DXF files with dimensions that have no associated blocks, but this is not implemented yet. Adding this feature would not be a big effort, but these dimensions could only use basic features and would look very ugly, because they can only use the Standard dimstyle, and as said, AutoCAD cannot open these files.

mozman commented 4 years ago

Even AutoCAD would open these R12 files without blocks, but ugly is still true:

Edit: But only if fixed_tables=False, without predefined linetypes and text styles

AlexandreSi commented 4 years ago

Thanks for the clarification. My users use AutoCAD, so I won't go with the R12 files. For the "classic" format, have you already analyzed where the bottleneck is, to know whether we can speed up this step?

mozman commented 4 years ago

No, there is nothing I can do for you.

You can try pypy3, which speeds things up for long-running tasks, or you have to look for a solution in the C family (libredwg, libdxfrw, netDXF).

mozman commented 4 years ago

Attach points as XREF:

from pathlib import Path
import ezdxf
from ezdxf.addons import r12writer
import random
from ezdxf.tools.standards import setup_dimstyle

DIR = Path('~/Desktop/Outbox').expanduser()
XSIZE, YSIZE = (100, 100)
COUNT = 1000
POINTS = [(random.random() * XSIZE, random.random() * YSIZE) for _ in range(COUNT)]
XREF = 'points.dxf'
# scale 1:1000; text size H = 2.5mm on paper; 1 drawing unit = 1m
DIMSTYLE = 'USR_M_1000_H25_M'

with r12writer(str(DIR / XREF)) as r12:
    for point in POINTS:
        r12.add_point(point, layer='Points')

doc = ezdxf.new('R2000', setup=True)
doc.add_xref_def(filename='.\\' + XREF, name='points_xref')

msp = doc.modelspace()
msp.add_blockref(name='points_xref', insert=(0, 0))
setup_dimstyle(doc, fmt=DIMSTYLE, style=ezdxf.options.default_dimension_text_style)
msp.add_aligned_dim(p1=(0, 0), p2=(0, 100), distance=5, dimstyle=DIMSTYLE)
msp.add_aligned_dim(p1=(0, 100), p2=(100, 100), distance=5, dimstyle=DIMSTYLE)
doc.set_modelspace_vport(height=YSIZE * 1.25, center=(XSIZE / 2, YSIZE / 2))
doc.saveas(DIR / 'host.dxf')

This works with BricsCAD:

It does not work with TrueView (which does not show DXF xrefs at all), but it may work with the full AutoCAD version.

AlexandreSi commented 4 years ago

Thanks for your code samples! I was looking into the EntityDB class and I saw this piece of code:

    def next_handle(self) -> str:
        """ Returns next unique handle."""
        while True:
            handle = self.handles.next()
            if handle not in self._database:  # you can not trust $HANDSEED value
                return handle

It seems to me that the check in the database keys could be expensive given the number of points I am handling. Do you have info about this HANDSEED value and why we can't trust it?

mozman commented 4 years ago

No, this is not a bottleneck; it is a regular Python dict lookup.
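To illustrate why a dict membership check is cheap even with hundreds of thousands of keys, here is a minimal sketch. It does not use ezdxf: `database` is a hypothetical stand-in for the entity database, and this `next_handle` is a simplified function written only for the illustration, not the library's method.

```python
import timeit

# Hypothetical stand-in for the entity database:
# 500,000 hex handles ("1" .. "7A120") mapped to dummy values.
database = {f"{n:X}": None for n in range(1, 500_001)}


def next_handle(counter, db):
    """Return the first hex handle at or after `counter` that is not in `db`."""
    while True:
        handle = f"{counter:X}"
        if handle not in db:  # average O(1) hash lookup, independent of len(db)
            return handle
        counter += 1


# Time one million membership checks against the 500k-key dict.
elapsed = timeit.timeit(lambda: "7A120" in database, number=1_000_000)
print(f"1,000,000 lookups: {elapsed:.3f}s")
```

The measured time depends on the machine, but it should stay small compared to the runtimes discussed in this thread, because the cost of a hash lookup does not grow with the number of stored handles.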

mozman commented 4 years ago

Is the POINT entity the most used entity? Can you provide a stripped-down version of your DXF creation process with random data? I am not sure that adding the entities is the (only) bottleneck. I guess exporting the DXF file is also too slow for your application, so code profiling is very important.
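A stripped-down timing harness could look like the sketch below. It uses only the standard library; `add_points` and `export_points` are placeholder workloads invented for this example and would have to be replaced by the real entity creation and DXF export calls.

```python
import io
import random
import time


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, and return the result."""
    start = time.perf_counter()
    result = func(*args)
    print(f"Time to {label}: {time.perf_counter() - start:.2f}s")
    return result


# Placeholder workloads (assumptions for this sketch, not ezdxf calls).
def add_points(count):
    """Create `count` random 3D points."""
    return [(random.random(), random.random(), random.random()) for _ in range(count)]


def export_points(points):
    """Serialize the points as one text line per point."""
    stream = io.StringIO()
    for x, y, z in points:
        stream.write(f"{x} {y} {z}\n")
    return stream.getvalue()


points = timed("add points", add_points, 200_000)
text = timed("export", export_points, points)
```

Timing the two phases separately shows which one dominates before any optimization is attempted; `cProfile` from the standard library gives a per-function breakdown if more detail is needed.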

mozman commented 4 years ago

As I expected. Example for 200,000 3D POINT entities:

Time to add points: 6.51s
Time to export DXF: 13.97s
Overall runtime: 20.50s

Faster export routine for POINT:

Time to add points: 6.52s
Time to export DXF: 10.50s
Overall runtime: 17.03s

This optimization does not improve the overall time very much and adds complexity to the DXF export process. Sad but true, there is not much I can do for you yet. I plan to add Cython support, but not in the near future: only after the 1.0 release.

Slow (regular) export routine but using pypy3:

Time to add points: 1.71s
Time to export DXF: 3.10s
Overall runtime: 4.82s

This is an effortless optimization, and if pypy3 does not support all the Python libraries you need, you can try running pypy3 as a subprocess just for the DXF creation.
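The subprocess idea could be sketched as follows. This is only an outline under assumptions: `run_worker` is a hypothetical helper, the points are handed over through a temporary JSON file, and the worker script (for example a `write_dxf.py` that reads the JSON file and creates the DXF) is not shown and would have to be written separately.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def run_worker(interpreter, script, points):
    """Pass `points` to `script` via a temporary JSON file and run it with `interpreter`.

    Returns the text the worker wrote to stdout; raises CalledProcessError
    if the worker exits with a non-zero status.
    """
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "points.json"
        data_file.write_text(json.dumps(points))
        result = subprocess.run(
            [str(interpreter), str(script), str(data_file)],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


# Hypothetical usage, assuming pypy3 is on PATH and write_dxf.py exists:
# run_worker("pypy3", "write_dxf.py", [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)])
```

With this split, the main application (and its CPython-only dependencies) keeps running on the regular interpreter, and only the DXF creation runs under pypy3.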

AlexandreSi commented 4 years ago

Hi @mozman,

I conducted the same analysis a few days ago but couldn't find the time to present the results. I reached the same conclusion: writing the DXF accounts for 60-70% of the duration, and adding the points accounts for the rest.

Thank you for the results with pypy3, I can see the benefits are not negligible at all.

I may try it to check compatibility with my other dependencies (flask, numpy, scipy and others).

Thank you!

mozman / ezdxf

Possibility to add points as a batch to speed up execution time #135