sergey-tihon / Clippit

Fresh PowerTools for OpenXml
https://sergey-tihon.github.io/Clippit/
MIT License

perf: improved PublishSlides perf #61

Closed: f1nzer closed 1 year ago

f1nzer commented 1 year ago

After conducting some research, I have found that it is possible to further improve the performance of PublishSlides.

The following numbers are specific to the net6 branch. While I have not tested this code on the main branch, I believe it should work similarly (but should be tested anyway).

Part 0. Initial perf values

I ran the PublishSlides method on one of my local .pptx files (one that contains many SVG items).

Initial values are:

Part 1. FluentPresentationBuilder

I've changed these lines:

var slideDocument = slide.GetXDocument();
slide.RemoveAnnotations<XDocument>();
...
newSlide.PutXDocument(slideDocument);

to these

using (var sourceStream = slide.GetStream())
using (var targetStream = newSlide.GetStream(FileMode.Create, FileAccess.Write))
{
    sourceStream.CopyTo(targetStream);
}
var slideDocument = newSlide.GetXDocument();

There is no need to call the PutXDocument method, because the source stream can be piped directly into the target stream. This avoids the XmlWriter serialization work that PutXDocument would otherwise perform.
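To illustrate the difference outside of Clippit, here is a minimal sketch that uses only in-memory streams. `RoundTrip` and `DirectCopy` are hypothetical helper names, and the XML string is a made-up stand-in for a slide part. The sketch only contrasts the two code paths (parse plus re-serialize versus a byte copy); it is not the library's implementation.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical stand-in for the XML payload of a slide part.
var xml = "<p:sld xmlns:p='http://schemas.openxmlformats.org/presentationml/2006/main'><p:cSld/></p:sld>";
var sourceBytes = Encoding.UTF8.GetBytes(xml);

// The byte copy preserves the payload exactly; the round trip re-encodes it.
Console.WriteLine(DirectCopy(sourceBytes).SequenceEqual(sourceBytes));
Console.WriteLine(RoundTrip(sourceBytes).Length);

// Approach A: parse into an XDocument and serialize it again.
// This is roughly the kind of work a get/put XDocument pair implies.
static byte[] RoundTrip(byte[] source)
{
    using var input = new MemoryStream(source);
    var doc = XDocument.Load(input);
    using var output = new MemoryStream();
    doc.Save(output);
    return output.ToArray();
}

// Approach B: copy the bytes straight from source to target, no XML work.
static byte[] DirectCopy(byte[] source)
{
    using var input = new MemoryStream(source);
    using var output = new MemoryStream();
    input.CopyTo(output);
    return output.ToArray();
}
```

The direct copy does no parsing or serialization at all, which is why it is cheaper when the target only needs the same bytes as the source.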

New results after these changes:

Part 2. GetSlideTitle

The AppendSlides method uses the XDocument generated from the new slide everywhere, but GetSlideTitle uses only OpenXML methods. As a result, the slide is read and deserialized more than once: once for the XDocument and again for OpenXML.

The solution was to use the same XDocument-based approach for title extraction. The XDocument is already stored (cached) in the SlidePart, so this requires no additional I/O or CPU work.
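A rough sketch of what XDocument-based title extraction can look like is shown here. `GetTitleFromXDocument` is a hypothetical name, the sample slide XML is invented for the example, and the actual change in this PR may differ. The sketch assumes the title lives in a `p:sp` shape whose placeholder type is `title` or `ctrTitle`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace p = "http://schemas.openxmlformats.org/presentationml/2006/main";
XNamespace a = "http://schemas.openxmlformats.org/drawingml/2006/main";

// Invented sample slide with a single title placeholder split into two runs.
var slideXDoc = XDocument.Parse(
    $"<p:sld xmlns:p='{p}' xmlns:a='{a}'><p:cSld><p:spTree>" +
    "<p:sp><p:nvSpPr><p:cNvPr id='2' name='Title 1'/><p:cNvSpPr/>" +
    "<p:nvPr><p:ph type='title'/></p:nvPr></p:nvSpPr>" +
    "<p:txBody><a:bodyPr/><a:p><a:r><a:t>Quarterly </a:t></a:r>" +
    "<a:r><a:t>Review</a:t></a:r></a:p></p:txBody></p:sp>" +
    "</p:spTree></p:cSld></p:sld>");

Console.WriteLine(GetTitleFromXDocument(slideXDoc));

// Hypothetical helper: reads the title from an already-parsed slide document,
// so no second read or deserialization of the part is needed.
string GetTitleFromXDocument(XDocument doc)
{
    var titleShape = doc
        .Descendants(p + "sp")
        .FirstOrDefault(sp =>
        {
            var type = (string)sp.Element(p + "nvSpPr")
                ?.Element(p + "nvPr")
                ?.Element(p + "ph")
                ?.Attribute("type");
            return type == "title" || type == "ctrTitle";
        });

    // No title placeholder on this slide.
    if (titleShape == null) return string.Empty;

    // Concatenate the text of every run inside the title shape.
    return string.Concat(titleShape.Descendants(a + "t").Select(t => t.Value));
}
```

Because the helper takes the XDocument as input, a caller that already holds the cached document for a slide can reuse it instead of triggering another parse.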

New results after this change:

sergey-tihon commented 1 year ago

Thank you for this contribution!

Released as 2.0.0-theta-001. Be careful: after the update, the transitive dependency SixLabors.ImageSharp should be 2.1.3 (or <3.0.0).