mitodl / ocw-studio

Open Source Courseware authoring tool
BSD 3-Clause "New" or "Revised" License

Alert when sites are stuck building (build timeout) #1917

Status: Open. ChristopherChudzicki opened this issue 1 year ago.

ChristopherChudzicki commented 1 year ago

We recently had a situation where:

  1. Course author published a site to draft.
  2. Concourse got stuck indefinitely while discovering inputs, and input discovery has no timeout. In this case, it was "Waiting for suitable version" of webpack-json. Manually running `fly check-resource` on the offending pipeline/resource got it unstuck.
  3. Course author alerted us.

Effectively, the site silently failed to publish and the Studio UI stayed stuck at "Pending". (Concourse did not report the build as failing, but it was never going to succeed.)

We should raise an alert in this scenario so that the failure is not silent.

ChristopherChudzicki commented 1 year ago

One idea: since Studio triggers the build and site builds are fast, Studio could raise an alert if a site has not finished building within X minutes.
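A minimal sketch of that idea, assuming Studio records when it triggered each build and what status it last saw. `SiteBuild`, `find_stuck_builds`, `alert_on_stuck_builds`, the status strings, and the 30-minute `BUILD_TIMEOUT` are all hypothetical placeholders, not existing ocw-studio code. The `alert` callback stands in for whatever alerting channel would actually be used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional

# Hypothetical value for the "X minutes" threshold; pick based on real build times.
BUILD_TIMEOUT = timedelta(minutes=30)


@dataclass
class SiteBuild:
    """Hypothetical record of a build that Studio triggered."""
    site_name: str
    status: str  # assumed values, e.g. "pending", "succeeded", "errored"
    triggered_at: datetime  # timezone-aware UTC timestamp


def find_stuck_builds(
    builds: Iterable[SiteBuild],
    now: Optional[datetime] = None,
    timeout: timedelta = BUILD_TIMEOUT,
) -> List[SiteBuild]:
    """Return builds that are still pending more than `timeout` after being triggered."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [
        build
        for build in builds
        if build.status == "pending" and now - build.triggered_at > timeout
    ]


def alert_on_stuck_builds(
    builds: Iterable[SiteBuild],
    alert: Callable[[str], None],
    now: Optional[datetime] = None,
) -> int:
    """Send one alert per stuck build and return how many were sent."""
    stuck = find_stuck_builds(builds, now=now)
    for build in stuck:
        alert(
            f"Site {build.site_name!r} has been pending since "
            f"{build.triggered_at.isoformat()}; the build may be stuck"
        )
    return len(stuck)
```

A check like this could run periodically (for example as a scheduled background task) rather than per build. A real version would probably also record that an alert was already sent for a given build, so that a build stuck for hours does not trigger an alert on every run.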