prismicio / prismic-gatsby

Gatsby plugins for building websites using Prismic
https://prismic.io/docs/technologies/gatsby
Apache License 2.0

Support incremental builds #215

Closed angeloashmore closed 4 years ago

angeloashmore commented 4 years ago

The following is based on my initial understanding of the Gatsby cache/node system and how it relates to incremental builds.

Related links

Related Gatsby blog post https://www.gatsbyjs.org/blog/2020-04-22-announcing-incremental-builds/

Using incremental builds without Gatsby Cloud https://www.gatsbyjs.org/docs/page-build-optimizations-for-incremental-data-changes/

Initial thoughts

We currently fetch all documents and create nodes for them on each gatsby develop and gatsby build. This means the timestamp/fingerprint associated with each node resets on each fetch. Because the timestamp is fresh, Gatsby treats it like new data, causing a full build.

Prismic's API does not have a diff/delta API, but it does have ref and predicate-based querying. ref allows us to fetch the repo from a specific time (defaults to current/master) while predicates would allow us to query just for documents updated past a certain timestamp.

Here's a flow we could try:

  1. Fetch all documents using the master ref as we currently do.
  2. Get all cached Gatsby nodes using getNodes() and compare them to the fetched documents to determine which documents were deleted (i.e. missing from the fetched documents) or updated (i.e. last_publication_date is later than the timestamp of the most recent comparison).
  3. For each deleted document, call deleteNode.
  4. For each new or updated document, call createNode.
  5. For all other documents (i.e. not updated or deleted, existed before), call touchNode.
  6. Set the current timestamp in the cache as the most recent time nodes were compared. This is the timestamp used in step 2.

When the cache is nonexistent (i.e. first build/fetch, cleared cache), all documents will naturally be treated as new documents.
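
The comparison in steps 2 through 5 can be sketched as a pure function. This is only an illustration under assumptions, not the plugin's implementation: the `PrismicDoc` and `CachedNode` shapes, the `prismicId` field, and the `diffDocuments` name are hypothetical, and the result would still need to be passed to deleteNode, createNode, and touchNode.

```typescript
// Minimal shapes for illustration; real Prismic documents and Gatsby nodes carry more fields.
interface PrismicDoc {
  id: string;
  last_publication_date: string; // ISO 8601 timestamp
}

interface CachedNode {
  prismicId: string; // hypothetical field linking a node back to its document
}

interface NodeDiff {
  toCreate: PrismicDoc[]; // new or updated since the last comparison (step 4)
  toTouch: PrismicDoc[]; // unchanged, keep the cached node alive (step 5)
  toDelete: CachedNode[]; // cached but no longer in the repository (step 3)
}

function diffDocuments(
  fetched: PrismicDoc[],
  cached: CachedNode[],
  lastComparedAt: number | undefined, // epoch ms; undefined when the cache is empty
): NodeDiff {
  const fetchedIds = new Set(fetched.map((doc) => doc.id));
  const cachedIds = new Set(cached.map((node) => node.prismicId));
  const diff: NodeDiff = { toCreate: [], toTouch: [], toDelete: [] };

  for (const doc of fetched) {
    const isNew = !cachedIds.has(doc.id);
    const isUpdated =
      lastComparedAt === undefined ||
      Date.parse(doc.last_publication_date) > lastComparedAt;
    (isNew || isUpdated ? diff.toCreate : diff.toTouch).push(doc);
  }

  diff.toDelete = cached.filter((node) => !fetchedIds.has(node.prismicId));
  return diff;
}

// Example: "a" is unchanged, "b" was republished, "c" is new, "d" was deleted.
const exampleDiff = diffDocuments(
  [
    { id: "a", last_publication_date: "2020-04-01T00:00:00Z" },
    { id: "b", last_publication_date: "2020-05-02T00:00:00Z" },
    { id: "c", last_publication_date: "2020-05-03T00:00:00Z" },
  ],
  [{ prismicId: "a" }, { prismicId: "b" }, { prismicId: "d" }],
  Date.parse("2020-05-01T00:00:00Z"),
);
```

With an empty cache and no stored timestamp, every fetched document lands in `toCreate`, which matches the first-build case described above.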

Existing implementations

See gatsby-source-contentful's sourceNodes for an example of this technique.

https://github.com/gatsbyjs/gatsby/blob/3c7d6ee04882912695d672d94073eac63ee45c49/packages/gatsby-source-contentful/src/gatsby-node.js#L37

asyarb commented 4 years ago

Seems like a solid plan to me. I'm not too familiar with Prismic's API, but the line:

predicates would allow us to query just for documents updated past a certain timestamp

makes me wonder if we could alter step 1 of the proposed flow to only fetch documents whose last_publication_date is later than the current timestamp in the cache. If this is already what you meant, then feel free to disregard.

If we are able to only fetch needed documents, then this could(?) also eliminate downloading all images on builds too 🎊

angeloashmore commented 4 years ago

My initial thought was that we could just query documents past that timestamp, but that wouldn’t account for deleted documents. Contentful, for example, provides a list of changes since a particular timestamp, including document creation and deletion. Prismic, on the other hand, would only provide a list of published documents. So deleting a document that has not been updated since the last fetch would not be detected.

I think the only way to get a proper diff is to fetch all documents, unfortunately.
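
A small sketch of why a timestamp-filtered fetch cannot detect deletions. The predicate query is simulated with an in-memory filter rather than a real Prismic call, and `RepoDoc`, `publishedAfter`, and `absentFrom` are hypothetical names used only for illustration.

```typescript
interface RepoDoc {
  id: string;
  last_publication_date: string; // ISO 8601 timestamp
}

// Stand-in for a predicate query: returns only documents published after `since` (epoch ms).
function publishedAfter(repo: RepoDoc[], since: number): RepoDoc[] {
  return repo.filter((doc) => Date.parse(doc.last_publication_date) > since);
}

// Cached IDs that do not appear in a response.
function absentFrom(cachedIds: string[], response: RepoDoc[]): string[] {
  const seen = new Set(response.map((doc) => doc.id));
  return cachedIds.filter((id) => !seen.has(id));
}

const knownIds = ["a", "b", "d"]; // "d" was deleted in Prismic after the last fetch
const currentRepo: RepoDoc[] = [
  { id: "a", last_publication_date: "2020-04-01T00:00:00Z" }, // unchanged
  { id: "b", last_publication_date: "2020-05-02T00:00:00Z" }, // updated
];
const lastFetch = Date.parse("2020-05-01T00:00:00Z");

// Partial fetch: unchanged "a" and deleted "d" are both absent, so they cannot be told apart.
const absentFromPartial = absentFrom(knownIds, publishedAfter(currentRepo, lastFetch));

// Full fetch: only the deleted "d" is absent.
const absentFromFull = absentFrom(knownIds, currentRepo);
```

Only the full fetch isolates the deleted document, which is why the flow keeps step 1 as a fetch of everything.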

asyarb commented 4 years ago

Another random thought:

Does publishing a document in Prismic update the last_publication_date for other documents that reference it via content relationships? If this isn't the case, I'm not sure if comparing documents with last_publication_date and our last known timestamp is going to work.

angeloashmore commented 4 years ago

Good point. In the Contentful source plugin, there's a check for reverse relationships to mark them as updated. I'm guessing it does that for the reason you stated. We would have to do the same.

https://github.com/gatsbyjs/gatsby/blob/3c7d6ee04882912695d672d94073eac63ee45c49/packages/gatsby-source-contentful/src/gatsby-node.js#L198-L217
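
As a rough sketch of the same idea for Prismic, documents that link to an updated document could be marked as updated as well. This is not the Contentful plugin's code: the `LinkedDoc` shape, its `linkedIds` field, and `withReverseRelationships` are hypothetical, and the sketch assumes only one level of propagation is wanted.

```typescript
interface LinkedDoc {
  id: string;
  linkedIds: string[]; // hypothetical: IDs referenced via content relationships
}

// Returns the updated IDs plus every document that directly links to one of them.
// Propagation is one level deep only; transitive links are not followed.
function withReverseRelationships(
  docs: LinkedDoc[],
  updatedIds: Set<string>,
): Set<string> {
  const result = new Set(updatedIds);
  for (const doc of docs) {
    if (doc.linkedIds.some((id) => updatedIds.has(id))) {
      result.add(doc.id);
    }
  }
  return result;
}

// Example: "post" links to "author", and "page" links to "post".
const linkedDocs: LinkedDoc[] = [
  { id: "author", linkedIds: [] },
  { id: "post", linkedIds: ["author"] },
  { id: "page", linkedIds: ["post"] },
];

// Publishing "author" also marks "post" as updated, but not "page".
const expandedIds = withReverseRelationships(linkedDocs, new Set(["author"]));
```

Whether one level is enough depends on how deeply the site's queries follow linked documents; that is an open question here, not something the thread settles.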

angeloashmore commented 4 years ago

Turns out incremental builds already work.

Some of the modifications we were making in createPages on our sites were forcing page rebuilds, but in vanilla Gatsby, it works fine.

Gatsby officially supports incremental builds for only some of the CMSs supported by Gatsby Cloud, because those CMS APIs support delta fetches. With Prismic, we will always need to fetch all documents unless a delta API is released. Luckily, fetching all documents is very quick, typically < 1 second on Netlify.