jbake-org / jbake

Java based open source static site/blog generator for developers & designers.
http://jbake.org
MIT License

[Feature request] Improve baking speed #391

Open shalugin opened 7 years ago

shalugin commented 7 years ago

I want something like incremental site generation. I've enabled the persistent content store (http://jbake.org/docs/2.5.1/#persistent_content_store), but generation could still be faster.

For example, we can do the following things:

jonbullock commented 7 years ago

Thanks for raising this @shalugin 👍

jmcgarr commented 5 years ago

I have also been looking into this feature, and I see a few ways to make this work. @jonbullock do you have an approach/design in mind already or do you mind if I fork and take a stab at it? I can see points of leverage being the CustomFSChangeListener, which currently does a bulk Oven.bake() rather than incrementally rerendering the content that changed.

Also noticed that the bake() performance could improve by ensuring assets have already been copied and only copying those that have changed.
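
The "only copy what changed" idea could look roughly like the sketch below. It is not JBake's actual asset code: `IncrementalAssetCopier` and `copyChanged` are hypothetical names, and the sketch assumes that comparing file size and last-modified time is a good enough change check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of JBake's API.
final class IncrementalAssetCopier {

    /** Copies only new or changed regular files and returns how many were copied. */
    static int copyChanged(Path sourceDir, Path destDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int copied = 0;
        for (Path source : files) {
            Path target = destDir.resolve(sourceDir.relativize(source).toString());
            if (isUpToDate(source, target)) {
                continue; // unchanged since the last bake, skip it
            }
            Files.createDirectories(target.getParent());
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            copied++;
        }
        return copied;
    }

    /** Assumption: same size and a target at least as new as the source means "unchanged". */
    static boolean isUpToDate(Path source, Path target) throws IOException {
        return Files.isRegularFile(target)
                && Files.size(source) == Files.size(target)
                && Files.getLastModifiedTime(target)
                        .compareTo(Files.getLastModifiedTime(source)) >= 0;
    }
}
```

This sketch does not remove files from the destination that were deleted from the source; that would need a second pass over the destination directory or some persisted state.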

ancho commented 5 years ago

Hello @jmcgarr, I see three phases where we could optimize performance.

Adding some sort of a cache to the asset management is one idea, to copy only files that have changed or remove files that have been deleted, as you said.

The rendering phase could be parallelized. It currently runs sequentially: renderer by renderer, document by document.

Overall it would be nice to extend the oven to bake single documents, too. Like heating a slice instead of the whole pizza.
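
To make the parallel rendering idea more concrete, here is a minimal sketch using a fixed thread pool. It is an illustration under assumptions, not JBake code: `ParallelRenderSketch` and `renderAll` are hypothetical names, the renderer is modelled as a plain `Function`, and it only works if the renderer (template engine, content store access) is safe to call from several threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch for illustration only; not part of JBake's API.
final class ParallelRenderSketch {

    /** Renders all documents on a thread pool and returns results in input order. */
    static <D, R> List<R> renderAll(List<D> documents, Function<D, R> renderer, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (D document : documents) {
                // Assumption: the renderer is thread-safe.
                Callable<R> task = () -> renderer.apply(document);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // rethrows a rendering failure, if any
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the output order stable, which matters if later steps (index, feed, sitemap) depend on it.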

At least from my side, feel free to go ahead. Any help is appreciated.

Maybe you can work in small pull requests for the different performance optimizations you have in mind and share your work as WIP pull requests.

That way we can discuss design decisions and breaking changes early, and it's easier to review.

jmcgarr commented 5 years ago

Sounds good @ancho! I agree with your three phases. I did more digging yesterday and found just as many opportunities for improving performance in all three phases. I also like your proposal of making small pull requests to improve performance over time.

In taking a glance at tracking/caching asset management information, I was questioning the approach of OrientDB. In short, my suspicion is that spinning up the in-memory db and querying it incurs some performance penalty. My gut is to serialize/deserialize a Java object to a cache directory that can hold the state in memory. Thoughts on this approach?
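
A minimal sketch of that serialization idea, using plain Java object serialization. The names here are hypothetical (`AssetStateCache`, the cache file location), and the state is assumed to be a simple map from relative asset path to last-modified timestamp; the real state would likely need more than that.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

// Hypothetical sketch for illustration only; not part of JBake's API.
final class AssetStateCache {

    /** Writes the state map to the cache file, creating the cache directory if needed. */
    static void save(Path cacheFile, HashMap<String, Long> state) throws IOException {
        Files.createDirectories(cacheFile.toAbsolutePath().getParent());
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(cacheFile))) {
            out.writeObject(state);
        }
    }

    /** Reads the state map back, or returns an empty map if no cache exists yet. */
    @SuppressWarnings("unchecked")
    static HashMap<String, Long> load(Path cacheFile) throws IOException, ClassNotFoundException {
        if (!Files.isRegularFile(cacheFile)) {
            return new HashMap<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(cacheFile))) {
            return (HashMap<String, Long>) in.readObject();
        }
    }
}
```

The state would be loaded once at the start of a bake, compared against the files on disk, and saved again at the end.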

I also noticed this issue was slated for jbake 3.0. Not sure if this changes the approach for accepting PR's?

ancho commented 5 years ago

> In taking a glance at tracking/caching asset management information, I was questioning the approach of OrientDB. In short, my suspicion is that spinning up the in-memory db and querying it incurs some performance penalty. My gut is to serialize/deserialize a Java object to a cache directory that can hold the state in memory. Thoughts on this approach?

Hmm...I think that's because the Oven's bake method closes the database and shuts down the OrientDB engine at the end.

That's only a problem when running in serve mode. So it basically comes down to deciding not to close and shut down the database, and letting the caller of JBake handle that.

I have no interest in reinventing the wheel. If it's still not fast enough once we stop shutting down and spinning up the database on every bake, we should consider changing the database.

Also...there is a newer version of OrientDB available (3.0.8), which is not compatible with JDK 7. So if we close #549 (pull request in the making), we can see what we get out of it.

> I also noticed this issue was slated for jbake 3.0. Not sure if this changes the approach for accepting PR's?

I think we can change that. As long as the changes do not introduce API changes that could break things, we shouldn't tie ourselves to a specific release version for performance optimizations or any other kind of improvement. But we should always think about other libraries that may already use JBake. I think that's why @jonbullock chose semantic versioning.

For me, a milestone on an issue is more of a suggestion for orientation.

jonbullock commented 5 years ago

Sorry for not commenting on this before now.

I'm more than happy for you to have a go at improving the speed @jmcgarr. All the ideas mentioned here (individual file baking, processing only assets that have changed, etc.) sound absolutely great to me.

@ancho is spot on about semver and the milestone. I try to give each issue/PR an initial milestone, but that's not set in stone. If we can make small backwards-compatible speed improvements, then I'm happy to get them out as soon as possible.

I'll comment on the mailing list too shortly.