statiqdev / Statiq.Framework

A flexible and extensible static content generation framework for .NET.
https://statiq.dev/framework
MIT License

DocumentFileProvider Performance #228

Open phil-scott-78 opened 2 years ago

phil-scott-78 commented 2 years ago

I was taking a look at perf and trying to speed things up a bit for our docs. When profiling, I noticed that a significant amount of time, if not the majority of the run, was spent repeatedly calculating a document tree.

Over 900s of the run time (thankfully in parallel) was spent building up this provider.


From my experiments, it seems most of the calls come from the filtering in ExecutionContext.cs:

_outputPages = new Lazy<FilteredDocumentList<IDocument>>(
    () => new FilteredDocumentList<IDocument>(
        Outputs
            .Where(x => !x.Destination.IsNullOrEmpty
                && Settings.GetPageFileExtensions().Any(e => x.Destination.Extension.Equals(e, NormalizedPath.DefaultComparisonType))),
        x => x.Destination,
        (docs, patterns) => docs.FilterDestinations(patterns)),
    LazyThreadSafetyMode.ExecutionAndPublication);

If I change this to create the file provider once, and add a handful of overloads to pass it down, I end up with this:

_outputPages = new Lazy<FilteredDocumentList<IDocument>>(
    () =>
    {
        // Build the file provider once and reuse it for every filter call below.
        DocumentFileProvider fileProvider = new DocumentFileProvider(Outputs, false);
        return new FilteredDocumentList<IDocument>(
            Outputs
                .Where(x => !x.Destination.IsNullOrEmpty
                            && Settings.GetPageFileExtensions().Any(e =>
                                x.Destination.Extension.Equals(e, NormalizedPath.DefaultComparisonType))),
            x => x.Destination,
            (docs, patterns) => docs.FilterDestinations(fileProvider, patterns));
    },
    LazyThreadSafetyMode.ExecutionAndPublication);


This also cuts the number of NormalizedPath instances created from 4 billion to only 400 million.

Everything still looks the same, but I'll be the first to admit that I have no idea what I'm doing. In fact, there is a good chance that I'm triggering some edge case with my doc configuration to begin with. But it feels like this is about n * n instances being built up, and when you point the doc builder at a larger instance that starts to add up quickly with a document for every field, property, method, etc.
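To make the n * n suspicion concrete, here is a minimal, self-contained sketch. It does not use any Statiq types: `PathIndex` is a hypothetical stand-in for an expensive structure built over every output document (the role `DocumentFileProvider` plays above), and `Demo` is illustrative only. It compares rebuilding that structure on every lookup with building it once and sharing it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an expensive index built over all output paths.
public sealed class PathIndex
{
    // Counts how many times an index has been constructed.
    public static int BuildCount;

    private readonly HashSet<string> _paths;

    public PathIndex(IEnumerable<string> paths)
    {
        BuildCount++;
        _paths = new HashSet<string>(paths, StringComparer.OrdinalIgnoreCase);
    }

    public bool Contains(string path) => _paths.Contains(path);
}

public static class Demo
{
    // Rebuilds the index for every lookup: n lookups over n documents
    // touch roughly n * n paths in total.
    public static int RebuildPerLookup(IReadOnlyList<string> docs)
    {
        int found = 0;
        foreach (string doc in docs)
        {
            if (new PathIndex(docs).Contains(doc))
            {
                found++;
            }
        }
        return found;
    }

    // Builds the index once and shares it across all lookups.
    public static int BuildOnce(IReadOnlyList<string> docs)
    {
        PathIndex index = new PathIndex(docs);
        int found = 0;
        foreach (string doc in docs)
        {
            if (index.Contains(doc))
            {
                found++;
            }
        }
        return found;
    }
}
```

Both methods return the same answer; only the number of index builds differs, and that difference grows with the number of documents.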

phil-scott-78 commented 2 years ago

Did some more thinking about this. The n * n should have been a clue about what was happening: the sidebar being built for each page was eating up a ton of resources.

I ended up creating a static instance of the sidebar data, which helped a ton. I'm also experimenting with something like an OutputPages.Cached() helper that wraps up a static instance for my razor pages too.

Here's the commit with the experiments - https://github.com/phil-scott-78/spectre.console/commit/a50c1eb4214f04f8913eeb9a07b378f3dd76f357
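For readers who do not want to open the commit, the static-instance idea can be sketched roughly as follows. This is not the code from the linked commit: `StaticCache<T>` and `GetOrBuild` are hypothetical names, and the sketch assumes the cached value (for example, the sidebar data) is the same for every page in a run.

```csharp
using System;

// Hypothetical helper: holds one shared instance of T for the whole process.
public static class StaticCache<T> where T : class
{
    private static readonly object Gate = new object();
    private static T _value;

    // Runs the factory on the first call only; later calls return the
    // same instance, even when pages are rendered in parallel.
    public static T GetOrBuild(Func<T> factory)
    {
        lock (Gate)
        {
            if (_value == null)
            {
                _value = factory();
            }
            return _value;
        }
    }
}
```

A page would then call something like `StaticCache<List<string>>.GetOrBuild(() => BuildSidebar())`, where `BuildSidebar` is whatever expensive computation was previously repeated per page. Because the value lives in a static field, it survives for the lifetime of the process.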

daveaglick commented 2 years ago

Yeah, I've run into this too - it's why I added the (as yet undocumented) https://github.com/statiqdev/Statiq.Framework/issues/205. I've also tried to cache as much as I can under the hood, but there are just a lot of places we don't have enough context for caching. That said, I'm sure there is a lot of work that could be done here.

Every time I get into a performance hunting mood I'm astonished at the number of paths and strings being created and passed around. Unfortunately my ability so far has been mostly "get it working fast enough to keep writing this other feature I want to work on" and not "dedicate some serious time and brain cells to speeding things up".

phil-scott-78 commented 2 years ago

Oh cool, I'll see if that helps. I think I mostly need the data cached, which is why I'm kind of digging my extension method: I'm also using it to highlight the current item, build the breadcrumbs, etc.

Been fun messing with the levers. Just realized that because I'm using a static, the cached values will persist across live reloads, so it's back to the cache factory to try to find a slightly more clever solution.
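One possible way around the live-reload problem is to tie the cached value to an identifier that changes on every run. The sketch below is only an illustration: `RunScopedCache<T>` is a hypothetical name, and it assumes the caller can supply a `Guid` that is unique per build or reload. Whether Statiq exposes a suitable per-execution identifier should be checked against its API.

```csharp
using System;

// Hypothetical helper: caches one value per run and rebuilds it
// when the supplied run identifier changes.
public static class RunScopedCache<T> where T : class
{
    private static readonly object Gate = new object();
    private static Guid _runId;
    private static T _value;

    // runId is assumed to differ between builds (including live reloads).
    public static T GetOrBuild(Guid runId, Func<T> factory)
    {
        lock (Gate)
        {
            if (_value == null || _runId != runId)
            {
                _value = factory();
                _runId = runId;
            }
            return _value;
        }
    }
}
```

Within a single run every page shares one instance; as soon as a new run identifier shows up, the stale value is discarded and rebuilt.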