Closed angeloashmore closed 4 years ago
Seems like a solid plan to me. I'm not too familiar with Prismic's API, but the line:
> predicates would allow us to query just for documents updated past a certain timestamp

makes me wonder if we could alter step 1 of the proposed flow to only fetch documents whose `last_publication_date` was past the current timestamp in the cache. If this is already what you meant, then feel free to disregard.
If we are able to only fetch needed documents, then this could(?) also eliminate downloading all images on builds too 🎊
My initial thought was that we could just query documents past that timestamp, but that wouldn’t account for deleted documents. Contentful, for example, provides a list of changes since a particular timestamp, including document creation and deletion. Prismic, on the other hand, would only provide a list of published documents. So deleting a document that has not been updated since the last fetch would not be detected.
I think the only way to get a proper diff is to fetch all documents, unfortunately.
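To make the deletion problem concrete, here is a minimal sketch with made-up data. The document ids, the dates, and the `missingIds` helper are all hypothetical; the only field taken from the discussion is `last_publication_date`, and the dates are assumed to be in a format `Date.parse` accepts.

```javascript
// Ids we cached during the previous build (hypothetical data).
const cachedIds = ["home", "about", "old-post"];

// Current state of the repository: "old-post" has been deleted and
// "about" has been republished since the last fetch.
const allDocs = [
  { id: "home", last_publication_date: "2020-04-01T00:00:00Z" },
  { id: "about", last_publication_date: "2020-04-25T00:00:00Z" },
];
const lastFetch = Date.parse("2020-04-22T00:00:00Z");

// Roughly what a timestamp predicate would give us: only documents
// published after the last fetch.
const publishedSince = allDocs.filter(
  (doc) => Date.parse(doc.last_publication_date) > lastFetch
);

// Hypothetical helper: which known ids are absent from a result set?
function missingIds(knownIds, docs) {
  const present = new Set(docs.map((doc) => doc.id));
  return knownIds.filter((id) => !present.has(id));
}

// Against the timestamp-filtered result, unchanged and deleted documents
// are indistinguishable: both are simply absent.
const fromDelta = missingIds(cachedIds, publishedSince); // ["home", "old-post"]

// Against the full fetch, only the deleted document is absent.
const fromFull = missingIds(cachedIds, allDocs); // ["old-post"]
```

With only the filtered result, "home" (untouched) and "old-post" (deleted) look the same, which is why a full fetch is needed to get a proper diff.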
Another random thought:
Does publishing a document in Prismic update the `last_publication_date` for other documents that reference it via content relationships? If this isn't the case, I'm not sure if comparing each document's `last_publication_date` against our last known timestamp is going to work.
Good point. In the Contentful source plugin, there's a check for reverse relationships to mark them as updated. I'm guessing it does that for the reason you stated. We would have to do the same.
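A rough sketch of what that could look like for us, assuming we can extract the ids each document links to. The `linkedIds` field and the `withReverseRelationships` name are hypothetical; in practice the links would have to be collected from each document's content relationship fields, and this is not how the Contentful plugin is actually written.

```javascript
// Given the ids of documents known to be updated, also mark every document
// that links to one of them. Only direct referrers are added (one level).
function withReverseRelationships(updatedIds, docs) {
  const updated = new Set(updatedIds);
  const result = new Set(updatedIds);
  for (const doc of docs) {
    // `linkedIds` is an assumed, pre-extracted list of linked document ids.
    if (doc.linkedIds.some((id) => updated.has(id))) {
      result.add(doc.id);
    }
  }
  return result;
}

// Usage with hypothetical documents: "post" links to "author",
// and "page" links to "post".
const docs = [
  { id: "author", linkedIds: [] },
  { id: "post", linkedIds: ["author"] },
  { id: "page", linkedIds: ["post"] },
];
const toRefresh = withReverseRelationships(["author"], docs);
// toRefresh contains "author" and "post", but not "page".
```

Whether one level is enough depends on how deeply linked data is resolved in queries; a transitive version would repeat the pass until no new ids are added.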
Turns out incremental builds already work.
Some of the modifications we were making in `createPages` on our sites were forcing page rebuilds, but in vanilla Gatsby, it works fine.
Gatsby officially supports incremental builds for some of the Gatsby Cloud supported CMSs only because the CMS APIs support delta fetches. With Prismic, we will always need to fetch all documents unless a delta API is released. Luckily, fetching all documents is very quick - typically < 1 second on Netlify.
The following is based on my initial understanding of the Gatsby cache/node system and how it relates to incremental builds.
Related links
Related Gatsby blog post https://www.gatsbyjs.org/blog/2020-04-22-announcing-incremental-builds/
Using incremental builds without Gatsby Cloud https://www.gatsbyjs.org/docs/page-build-optimizations-for-incremental-data-changes/
Initial thoughts
We currently fetch all documents and create nodes for them on each `gatsby develop` and `gatsby build`. This means the timestamp/fingerprint associated with each node resets on each fetch. Because the timestamp is fresh, Gatsby treats it like new data, causing a full build.

Prismic's API does not have a diff/delta API, but it does have `ref` and predicate-based querying. `ref` allows us to fetch the repo from a specific time (defaults to current/master) while predicates would allow us to query just for documents updated past a certain timestamp.

Here's a flow we could try:
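1. Fetch all documents from the `master` ref as we currently do.
2. Get the existing nodes with `getNodes()` and compare to the fetched documents to determine which documents were deleted (i.e. missing from the fetched documents) or updated (i.e. last_publication_date is greater than the most recent fetch).
3. Delete the nodes of deleted documents with `deleteNode`.
4. Create nodes for new and updated documents with `createNode`.
5. Keep the nodes of unchanged documents alive with `touchNode`.

When the cache is nonexistent (i.e. first build/fetch, cleared cache), all documents will naturally be treated as new documents.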
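The comparison step of the flow above could be sketched roughly as follows. This is only an illustration: the `prismicId` field on cached nodes, the `diffDocuments` and `applyDiff` names, and the `onDelete`/`onCreate`/`onTouch` callbacks are all hypothetical. The callbacks stand in for `deleteNode`, `createNode`, and `touchNode`, whose exact argument shapes depend on the Gatsby version, and dates are assumed to be parseable by `Date.parse`.

```javascript
// Classify fetched documents against the nodes left in the cache.
// fetchedDocs: [{ id, last_publication_date }]
// cachedNodes: [{ prismicId }] (assumed shape)
// lastFetchTime: epoch milliseconds of the most recent fetch (0 if none)
function diffDocuments(fetchedDocs, cachedNodes, lastFetchTime) {
  const cachedIds = new Set(cachedNodes.map((node) => node.prismicId));
  const fetchedIds = new Set(fetchedDocs.map((doc) => doc.id));

  const changed = []; // new, or published after the last fetch
  const unchanged = [];
  for (const doc of fetchedDocs) {
    const isNew = !cachedIds.has(doc.id);
    const isUpdated = Date.parse(doc.last_publication_date) > lastFetchTime;
    (isNew || isUpdated ? changed : unchanged).push(doc);
  }

  // Cached nodes whose document no longer exists upstream.
  const deleted = cachedNodes.filter((node) => !fetchedIds.has(node.prismicId));
  return { changed, unchanged, deleted };
}

// Hand each bucket to the matching callback (stand-ins for Gatsby actions).
function applyDiff(diff, { onDelete, onCreate, onTouch }) {
  for (const node of diff.deleted) onDelete(node);
  for (const doc of diff.changed) onCreate(doc);
  for (const doc of diff.unchanged) onTouch(doc);
}
```

With an empty cache, every fetched document lands in `changed`, which matches the first-build behavior described above.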
Existing implementations
See `gatsby-source-contentful`'s `sourceNodes` for an example of this technique: https://github.com/gatsbyjs/gatsby/blob/3c7d6ee04882912695d672d94073eac63ee45c49/packages/gatsby-source-contentful/src/gatsby-node.js#L37