After conducting some research, I have found that it is possible to further improve the performance of PublishSlides.
The following numbers are specific to the net6 branch. While I have not tested this code on the main branch, I believe it should work similarly (but should be tested anyway).
Part 0. Initial perf values
I ran the PublishSlides method on one of my local .pptx files (one that contains many SVG items).
Initial values are:
Total time: 130s
GC Time: 43.869s
Peak memory: 480 MB
Allocated: ~7.05 GB
Part 1. FluentPresentationBuilder
I've changed these lines:
var slideDocument = slide.GetXDocument();
slide.RemoveAnnotations<XDocument>();
...
newSlide.PutXDocument(slideDocument);
to these:
using (var sourceStream = slide.GetStream())
using (var targetStream = newSlide.GetStream(FileMode.Create, FileAccess.Write))
{
    sourceStream.CopyTo(targetStream);
}
var slideDocument = newSlide.GetXDocument();
There is no need to call the PutXDocument method, because we can pipe the source stream directly into the target stream. This avoids the unnecessary XmlWriter serialization work that PutXDocument would otherwise do.
New results after these changes:
Total time: 112s
GC Time: 38.329s
Peak memory: 365 MB
Allocated: ~6.74 GB
Part 2. GetSlideTitle
The AppendSlides method uses the XDocument generated from the new slide everywhere, but GetSlideTitle uses only OpenXML methods. As a result, the slide is read and deserialized multiple times (once for the XDocument and once for OpenXML).
The solution was to use the same XDocument-based approach for title extraction: the XDocument is already stored (cached) in the SlidePart as a feature, so no additional I/O or CPU work is needed.
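To illustrate the idea, here is a minimal sketch of reading a title from an already-loaded slide XDocument using only System.Xml.Linq. The class name SlideTitleSketch and the method GetTitleFromXDocument are hypothetical and are not the code from this change. The sketch assumes the title lives in a p:sp shape whose placeholder type is "title" or "ctrTitle", and it simply joins paragraphs with a space.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not the implementation in this PR.
public static class SlideTitleSketch
{
    private static readonly XNamespace P =
        "http://schemas.openxmlformats.org/presentationml/2006/main";
    private static readonly XNamespace A =
        "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Returns the text of the first title placeholder, or an empty string
    // when the slide has no title shape.
    public static string GetTitleFromXDocument(XDocument slideDocument)
    {
        // Assumption: the title is a p:sp whose p:ph type is "title" or "ctrTitle".
        var titleShape = slideDocument
            .Descendants(P + "sp")
            .FirstOrDefault(sp =>
            {
                var type = (string)sp
                    .Element(P + "nvSpPr")?
                    .Element(P + "nvPr")?
                    .Element(P + "ph")?
                    .Attribute("type");
                return type == "title" || type == "ctrTitle";
            });

        if (titleShape == null)
        {
            return string.Empty;
        }

        // Concatenate the a:t runs inside each a:p, then join the paragraphs.
        var paragraphs = titleShape
            .Descendants(A + "p")
            .Select(p => string.Concat(p.Descendants(A + "t").Select(t => t.Value)));

        return string.Join(" ", paragraphs);
    }
}
```

Because the helper takes the XDocument as its only input, a caller that already holds the cached document (for example, the slideDocument variable from Part 1) could pass it in without opening the part a second time.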
New results after this change: