statiqdev / Statiq.Web

Statiq Web is a flexible static site generator written in .NET.
https://statiq.dev/web

Regenerating only changed files when using wyam --watch #161

Open martinvobr opened 8 years ago

martinvobr commented 8 years ago

It looks like every time a single file is changed, the whole output directory is cleared and everything is regenerated. That's fine for small sites, but when editing a site with tens or hundreds of documents, it can take quite a long time between saving a document and being able to verify the result.

Would it be possible to regenerate only the changed documents? In other words, it would work as 'build' rather than 'rebuild all'.

How could this be done? A few thoughts (probably naive, as I don't know the internals of Wyam):

daveaglick commented 8 years ago

This is actually really hard to do because of the way modules pass data on from one to the next. The modules that end up outputting stuff to disk (or uploading, or whatever the "end result" is) are operating on some set of input documents, and we probably don't know the underlying source for them all. When one file changes, how do we know which modules need to be re-run to generate the desired output? And what if that one input file actually gets used in a later pipeline (for example, in the Wyam website I read all the Wyam source files to auto-generate the code documentation but then reuse those documents later for generating the top-level indexes)?

Your example of the Razor partial is a great look at why this is hard. The partial is specified by the Razor template and not directly in the module. Therefore the engine really doesn't have any idea which partials are included in which Razor pages. Even if we added code to the behind-the-scenes Razor generation code to track this, it's unclear if it would be enough. And then we'd have to do the same for Less CSS includes, etc.

This is one of the tradeoffs that I decided on early in the project in exchange for being highly generic. Many other static generators can be smarter about rebuilding because their scope is so narrow. That said, the current situation can always be improved. For one, caching was recently added and can still be beefed up. We still rebuild everything, but when a Razor page, for example, is encountered and its content is the same as the last time we saw it, we don't have to rebuild it, only execute it. There's also probably some stuff that could be done by having modules track the flow of inputs through the pipeline (see #41).
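
To make the caching idea concrete, here is a minimal Python sketch. It is not Wyam's actual implementation, and the names (CompileCache, compile_fn) are hypothetical. The expensive "build" step is cached under a hash of the page content, so an unchanged page only needs to be executed again. Python's string.Template stands in for a real template compiler.

```python
import hashlib
from string import Template


class CompileCache:
    """Caches the expensive 'compile' step, keyed by a hash of the source text."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # hypothetical expensive step (e.g. template compilation)
        self._compiled = {}            # content hash -> compiled artifact
        self.compile_count = 0         # number of real compilations performed

    def get(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._compiled:
            # Content not seen before: pay the compilation cost once.
            self._compiled[key] = self._compile_fn(source)
            self.compile_count += 1
        return self._compiled[key]


# string.Template stands in for a real template compiler here.
cache = CompileCache(Template)
page = "<h1>$title</h1>"

first = cache.get(page).substitute(title="Home")    # compiles, then executes
second = cache.get(page).substitute(title="About")  # cache hit: executes only
```

Keying on a content hash rather than a file timestamp means a file that is re-saved without changes still hits the cache.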

martinvobr commented 8 years ago

Hmm, I was afraid of something like this. Would it be possible, for example, to mark specific modules or specific pipelines as safe to skip during regeneration? Wyam cannot easily know which ones are safe, but the document developer could.

For example, I have a module which only copies PNG files from the input folder to the output folder, and another custom module which gets some data from an external system. I know for sure that such data does not need to be regenerated when certain input folders change.

Maybe instead of watching the whole input folder for every pipeline, let's have an (optional) set of sources to watch for each pipeline. Something similar to the filtered source control blocks on a CCNet build server. The syntax would be different, but the spirit could be similar. It would also make it possible to watch other locations, as in #41.

Here is part of a CCNet build config as an example:

<sourcecontrol type="filtered">
    <!-- watch only changes in some parts of source code -->
    <inclusionFilters>
        <!-- src -->
        <pathFilter><pattern>Src/Common*/**</pattern></pathFilter>
        <pathFilter><pattern>Src/Test*/**</pattern></pathFilter>
        <!-- data -->
        <pathFilter><pattern>Data/Common*/**</pattern></pathFilter>
    </inclusionFilters>
    <exclusionFilters>
    </exclusionFilters>
</sourcecontrol>
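
The inclusion/exclusion logic from the CCNet fragment above could look roughly like this in Python. This is only a sketch under assumptions: is_watched is a hypothetical helper, not part of Wyam or CCNet, and the standard library's fnmatchcase is used as a simplified stand-in for a real globber.

```python
from fnmatch import fnmatchcase


def is_watched(path, inclusion, exclusion=()):
    """Return True if path matches any inclusion pattern and no exclusion pattern.

    Note: fnmatchcase lets '*' match across '/' separators, which is looser
    than most glob implementations; a real watcher would likely use a proper globber.
    """
    included = any(fnmatchcase(path, pattern) for pattern in inclusion)
    excluded = any(fnmatchcase(path, pattern) for pattern in exclusion)
    return included and not excluded


# Patterns taken from the CCNet example above.
INCLUSION = ["Src/Common*/**", "Src/Test*/**", "Data/Common*/**"]
```

For instance, is_watched("Src/CommonLib/util/a.cs", INCLUSION) is true, while a change to "Docs/readme.md" would be ignored.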

daveaglick commented 8 years ago

Good idea - being able to override the default watching behavior with specific paths (especially w/ globbing support) would be really handy. I could even see making this a new type of class that can be configured from the config file (or directly in the engine):

Watchers
    .Clear()  // Remove the default watcher that watches the InputPath folder
    .Add(
        FileSystem("Src/Common*/**"),
        DatabaseTrigger("..."),
        // etc
    );

The relationship with #41 would need to be defined. Perhaps modules would also add watchers to the collection unless told not to. Or maybe the capability to define explicit watchers would supersede #41 and make it redundant. When creating a watcher maybe you can optionally specify which pipelines it triggers. Lots to think about here, but I think this might be onto something...
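
One way to picture watchers that optionally name the pipelines they trigger is the Python sketch below. Everything in it is hypothetical (the FileSystemWatcher class, the pipelines_to_run helper, and the pipeline names). It shows one possible design, not an existing Wyam API, and fnmatchcase is again a simplified stand-in for real globbing.

```python
from fnmatch import fnmatchcase


class FileSystemWatcher:
    """Hypothetical watcher: a glob pattern plus the pipelines it should trigger."""

    def __init__(self, pattern, pipelines=None):
        self.pattern = pattern
        self.pipelines = pipelines  # None means "trigger every pipeline"

    def matches(self, path):
        return fnmatchcase(path, self.pattern)


def pipelines_to_run(changed_path, watchers, all_pipelines):
    """Return the pipelines to re-run for one changed path, in all_pipelines order."""
    triggered = set()
    for watcher in watchers:
        if watcher.matches(changed_path):
            triggered.update(all_pipelines if watcher.pipelines is None else watcher.pipelines)
    return [name for name in all_pipelines if name in triggered]


# Hypothetical pipeline names and watcher configuration.
ALL_PIPELINES = ["Images", "Pages", "Api"]
WATCHERS = [
    FileSystemWatcher("images/*.png", pipelines=["Images"]),
    FileSystemWatcher("Src/Common*/**", pipelines=["Api", "Pages"]),
    FileSystemWatcher("theme/*"),  # no pipelines given: triggers everything
]
```

With this configuration, a change to "images/logo.png" would re-run only the "Images" pipeline, and a path that no watcher matches would trigger nothing.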

martinvobr commented 8 years ago

I like the Watchers config section. If a specific set of watchers can be assigned to a pipeline, that would be good enough for my needs.

If there is a minimalistic version with:

I could try to create and contribute a more advanced filesystem watcher, for example.