Open ConnorKrammer opened 7 years ago
@ConnorKrammer The code for .add_paragraph()
does indeed iterate over the elements as it uses a general-purpose facility for adding XML elements in the order prescribed by the XML Schema.
This could be overridden with a more clever method on docx.oxml.document.CT_Body
having the name .add_p()
.
Part of the challenge is that the new <w:p>
element can't simply be appended, it must appear before an optional w:sectPr
element at the end of the w:body
element. But I expect some clever XPath work could improve the timings markedly. I'll be willing to consider a pull request to that end.
Recently, I used python-docx to combine several hundred blog posts into a single Word document for editing. While doing this, I was surprised to note that with every paragraph added using
Document.add_paragraph
, the time required to add the next paragraph increased.Fortunately, python-docx offers an alternate method of adding paragraphs:
Paragraph.insert_paragraph_before
. I used this to compare performance, and indeed, the difference is huge. I suspect that somewhere in the code foradd_paragraph
, every single paragraph is being needlessly iterated over—this has the whiff of Shlemiel the painter’s algorithm at work.Here's a test program I wrote in Python 3.4.2 and the resulting performance numbers:
Output: