Near real-time updates to crawled data

w3c / webref

Machine-readable references of terms defined in web browser specifications

MIT License

In a variety of contexts (CI in particular, but likely also for the data reused by spec authoring tools), it would be ideal if the content in webref reflected changes in the underlying documents in near real time.

One way we could enable this, at least partially, is to have spec repos trigger a webref update for a given spec whenever that spec's main source file is updated. This could typically be achieved with a webhook installed at the repo level or, more likely for scaling, at the org level.
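As a rough illustration of the webhook idea, here is a minimal sketch of the filtering step a receiver might perform on a GitHub `push` event payload. The `MAIN_SOURCE` mapping, the repo name `w3c/example-spec`, and the function name `spec_to_update` are hypothetical; only the payload fields (`ref`, `repository.full_name`, `repository.default_branch`, and the per-commit `added` / `modified` lists) come from GitHub's push event format. How the update itself would be triggered in webref is left open.

```python
from typing import Optional

# Hypothetical mapping from a repository to the main source file of its spec.
# In practice this information would have to come from the list of crawled specs.
MAIN_SOURCE = {
    "w3c/example-spec": "index.html",
}


def spec_to_update(payload: dict) -> Optional[str]:
    """Return the repo whose spec should be re-crawled, or None to ignore the push."""
    repo = payload.get("repository", {})
    full_name = repo.get("full_name")
    main_file = MAIN_SOURCE.get(full_name)
    if main_file is None:
        return None  # Not a repo we track.

    # Only react to pushes on the default branch.
    if payload.get("ref") != f"refs/heads/{repo.get('default_branch')}":
        return None

    # Trigger only when the main source file was added or modified.
    for commit in payload.get("commits", []):
        touched = commit.get("added", []) + commit.get("modified", [])
        if main_file in touched:
            return full_name
    return None
```

With an org-level webhook, the same function would simply see payloads from every repo in the org and ignore those not listed in the mapping.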

One issue is that if several updates are processed at the same time, they would likely trigger an error when pushing the results. This could be avoided either by changing the timing of how checkouts and crawls are organized, or by doing a full crawl each time (with HTTP caching optimizations to reduce the time and network impact).
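One possible reading of "a different timing" is to serialize the work so that only one crawl-and-push runs at a time, with requests that arrive in the meantime merged into the next run. The sketch below assumes a single long-running receiver process; the class name `UpdateCoalescer` and the `crawl_and_push` callback are hypothetical stand-ins for whatever actually performs the checkout, crawl, and push.

```python
import threading
from typing import Callable, List, Set


class UpdateCoalescer:
    """Run at most one crawl-and-push at a time, batching requests that arrive meanwhile."""

    def __init__(self, crawl_and_push: Callable[[List[str]], None]) -> None:
        self._crawl_and_push = crawl_and_push  # hypothetical: checkout, crawl, push
        self._pending: Set[str] = set()
        self._lock = threading.Lock()
        self._running = False

    def request(self, spec: str) -> None:
        """Record that `spec` needs an update; start a run if none is in progress."""
        with self._lock:
            self._pending.add(spec)
            if self._running:
                return  # The active run will pick this up in its next batch.
            self._running = True
        self._drain()

    def _drain(self) -> None:
        while True:
            with self._lock:
                if not self._pending:
                    self._running = False
                    return
                batch = sorted(self._pending)
                self._pending.clear()
            try:
                # Called outside the lock so new requests are not blocked.
                self._crawl_and_push(batch)
            except Exception:
                with self._lock:
                    self._running = False
                raise
```

Because pushes never overlap, the push conflict described above should not arise within a single process; separate processes or CI jobs would still need another mechanism, such as a job-level concurrency limit.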

Near real-time updates to crawled data #486