nprapps / liveblog-standalone

NPR's liveblog rig 2.0

Do not rebuild if the document hasn't changed #53

Open thomaswilburn opened 4 years ago

thomaswilburn commented 4 years ago

This will help with caching, since we can then use ETag/If-Modified-Since headers to lower the cost of requesting the document.

The Docs API includes a top-level revisionId property we can check when running persistently, and then fail out if it matches the previous request.
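A minimal sketch of that check, assuming the Docs response has already been parsed into a dict. The `RevisionGate` class and its method name are made up for illustration and are not the rig's actual code; only the top-level `revisionId` property comes from the description above.

```python
from typing import Optional


class RevisionGate:
    """Remembers the last revisionId seen and reports whether to rebuild."""

    def __init__(self) -> None:
        self.last_revision: Optional[str] = None

    def should_build(self, document: dict) -> bool:
        # `document` stands in for the parsed JSON body of a Docs API response.
        revision = document.get("revisionId")
        if revision is not None and revision == self.last_revision:
            # Same revision as the previous request: skip the build.
            return False
        self.last_revision = revision
        return True


gate = RevisionGate()
first = gate.should_build({"revisionId": "rev-1"})    # nothing remembered yet
repeat = gate.should_build({"revisionId": "rev-1"})   # unchanged document
changed = gate.should_build({"revisionId": "rev-2"})  # edited document
```

The gate only holds state while the process stays alive, which is why this applies to persistent runs.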

thomaswilburn commented 4 years ago

The solution I checked in (remembering the revision ID we get from Docs and aborting a build if it hasn't changed) breaks scheduled posts, because the output has to change when a post's publish time arrives, even if the input stays the same.

It's difficult to do what the old rig does and check the file contents against S3, because our files include dates that update constantly. To do this correctly, we'd theoretically have to track not only whether the Docs revision had changed, but also whether the filtered list of published posts had changed since the last run. There also may be other complexities I'm not thinking of right now.
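To make the "track both" idea concrete, here is a rough sketch, not something that's checked in. It assumes each post carries a hypothetical `id` and `publish_at` field, and the dates are invented for the example. The build key combines the Docs revision with the list of posts that are published at build time, so a scheduled post going live changes the key even when the revision does not.

```python
from datetime import datetime, timezone
from typing import List, Tuple

BuildKey = Tuple[str, Tuple[str, ...]]


def build_key(revision_id: str, posts: List[dict], now: datetime) -> BuildKey:
    # Keep only posts whose (hypothetical) publish time has already passed.
    published = tuple(sorted(p["id"] for p in posts if p["publish_at"] <= now))
    return (revision_id, published)


# Illustrative data: post "b" is scheduled an hour after post "a".
posts = [
    {"id": "a", "publish_at": datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)},
    {"id": "b", "publish_at": datetime(2020, 1, 1, 13, 0, tzinfo=timezone.utc)},
]

# Same revision both times, but "b" goes live between the two runs.
before = build_key("rev-1", posts, datetime(2020, 1, 1, 12, 30, tzinfo=timezone.utc))
after = build_key("rev-1", posts, datetime(2020, 1, 1, 13, 30, tzinfo=timezone.utc))
needs_rebuild = before != after
```

A build would be skipped only when the whole key matches the previous run. This still leaves out anything else that depends on the clock, such as the dates written into the output files.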

I am a little concerned about how prone that approach is to tricky debugging problems. Running the math, if the user gets a fresh copy of the document (say, 50KB gzipped) per request, at six requests per minute that's 300KB/min, or ~18MB/hr of their bandwidth used. Realistically, due to CloudFront caching, about 1/3 to 1/2 of those requests will actually trigger a 304 (thanks to the new ETag code that's still checked in). I think 10MB/hr for realtime page updates, on a page that people are unlikely to leave open for more than a day, is pretty reasonable.
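The arithmetic above can be reproduced with a few lines. This is a back-of-the-envelope sketch: the six requests per minute is inferred from 300KB/min at 50KB per request, a 304 is treated as costing zero bytes (headers are ignored), and 1MB is taken as 1000KB.

```python
DOC_KB = 50           # assumed gzipped document size, per the estimate above
REQUESTS_PER_MIN = 6  # inferred: 300KB/min divided by 50KB per request


def hourly_mb(fraction_304: float) -> float:
    """Approximate MB/hr downloaded when a share of requests return a 304."""
    full_responses_per_hour = REQUESTS_PER_MIN * 60 * (1 - fraction_304)
    return full_responses_per_hour * DOC_KB / 1000


uncached = hourly_mb(0.0)    # every request returns the full document
best_case = hourly_mb(1 / 2)  # half of the requests return a 304
worst_case = hourly_mb(1 / 3)  # a third of the requests return a 304
```

With these inputs the uncached figure comes out at about 18MB/hr, and the two 304 scenarios land at roughly 9 and 12MB/hr, which brackets the 10MB/hr estimate.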

I'm leaving this open so we can revisit. But right now, I feel like the reliability of a simple, stateless build sequence wins out over the possible bandwidth savings of a more complex deployment.