nextstrain / status

Nextstrain status pages
https://nextstrain.github.io/status/
1 stars 0 forks source link

Reflect overall success/failure in favicon indicator #29

Closed tsibley closed 2 weeks ago

tsibley commented 3 weeks ago

Useful for seeing the state of things at a glance. Inspired by GitHub's similar behaviour.

For now, the "success" indicator doesn't mean nothing's amiss (e.g. that recent runs have completed, that nothing's been cancelled, etc.), just that nothing's failed. We may want to make this more nuanced in the future, but that requires additional favicon states that I'm not inclined to produce right now.

favicon.svg is now a (Inkscape) source file that's no longer copied as-is but instead used to produce multiple build/favicon-*.svg outputs by selective inclusion of layers. The sparse index page gets a "?" indicator in the favicon to suggest "status" without suggesting "all's well" or "something's wrong". There's probably some better indicator here, but it seems good enough for now.

I suspect this will make us want to clean up old lingering failed runs from the status page… ;)

Checklist

tsibley commented 2 weeks ago

CI failure because:

bash: line 1: inkscape: command not found
Traceback (most recent call last):
  File "/home/runner/work/status/status/./favicon-[layer].svg.py", line 6, in <module>
    doc = ET.parse(stdin)
  File "/usr/lib/python3.10/xml/etree/ElementTree.py", line 1222, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.10/xml/etree/ElementTree.py", line 580, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
make: *** [Makefile:18: build/favicon-success.svg] Error 1
make: *** Deleting file 'build/favicon-success.svg'

…which is a whoops! on my part. I didn't mean to have the generation actually be part of the regular build, but as a way to update committed files. But I forgot that part! Will fix.

joverlee521 commented 2 weeks ago

I suspect this will make us want to clean up old lingering failed runs from the status page… ;)

Would it make sense to limit the overall success/failure to the last ~24-48 hours? I've ignored the old lingering failures if the next automated run completed successfully. Seems like a waste to re-run workflows just for the ✅

tsibley commented 2 weeks ago

Fixed up, but uh, what now GitHub?

image

tsibley commented 2 weeks ago

Would it make sense to limit the overall success/failure to the last ~24-48 hours? I've ignored the old lingering failures if the next automated run completed successfully. Seems like a waste to re-run workflows just for the ✅

It's only considering the latest run for each workflow. The issue is we have workflows where the latest run is failed and quite old, because it hasn't been re-run since (somewhat by design).

tsibley commented 2 weeks ago

Oh, maybe you meant to only consider the latest run for each workflow, iff it's within the last 24-48 hours?

joverlee521 commented 2 weeks ago

Oh, maybe you meant to only consider the latest run for each workflow, iff it's within the last 24-48 hours?

Yup! For example, there's an mpox ingest failure from Sun Nov 03 that I've ignored because the subsequent days are successful

Screenshot 2024-11-12 at 11 28 04 AM

tsibley commented 2 weeks ago

Right, but that's already ignored by the "last run" consideration. It doesn't require a 24-48 hour recency condition. I'm talking about things like this:

image

tsibley commented 2 weeks ago

Maybe it's not clear that

https://github.com/nextstrain/status/blob/1a895066c65a428bc4256f76463104794d7c7636/pathogen-workflows.html.js#L32-L37

is only considering the last run of each workflow in each repo?

tsibley commented 2 weeks ago

Like, maybe I should have written:

const someFailure = 
  runsByRepoAndWorkflow 
    .flatMap(([, workflows]) => 
      workflows.map(([, [lastRun, ]]) => 
        lastRun.conclusion ?? lastRun.status)) 
    .some(x => x === "failure"); 

or

const someFailure = 
  runsByRepoAndWorkflow 
    .flatMap(([, workflows]) => 
      workflows.map(([, runs]) => 
        runs[0].conclusion ?? runs[0].status)) 
    .some(x => x === "failure"); 
joverlee521 commented 2 weeks ago

Right, but that's already ignored by the "last run" consideration.

Ah gotcha, yeah, totally missed this was only considering the last run!

However, the lassa example is an interesting case...I'm pretty sure it was just a manual test run that was later fixed for the automated run.

tsibley commented 2 weeks ago

Yeah. The lassa example will also fall off the 90 day cliff

https://github.com/nextstrain/status/blob/72803ecc7e0c77214dd26ac59fc68567ce1c5a01/pathogen-workflows.sql#L128-L129

on 24 Dec.

I will say that in general, I find the Ingest / Phylo / Ingest to Phylo pattern a bit weird (though I understand why it's that way currently).

tsibley commented 2 weeks ago

Like, maybe I should have written…

Repushed with the second version to hopefully make it clearer.

joverlee521 commented 2 weeks ago

Since merging there's been two workflow runs with the inkscape error, but other runs completed successfully.

https://github.com/nextstrain/status/actions/runs/11821865530/job/32937840090:

< favicon.svg \
  inkscape --pipe --export-plain-svg --export-area-page --export-filename - \
| ./favicon-'[layer]'.svg.py success \
> favicon-success.svg
bash: line 1: inkscape: command not found
Traceback (most recent call last):
  File "/home/runner/work/status/status/./favicon-[layer].svg.py", line 7, in <module>
    doc = ET.parse(stdin)
  File "/usr/lib/python3.10/xml/etree/ElementTree.py", line 1222, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.10/xml/etree/ElementTree.py", line 580, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
make: *** [Makefile:18: favicon-success.svg] Error 1
make: *** Deleting file 'favicon-success.svg'

https://github.com/nextstrain/status/actions/runs/11822605189/job/32940011673:

< favicon.svg \
  inkscape --pipe --export-plain-svg --export-area-page --export-filename - \
| ./favicon-'[layer]'.svg.py failure \
> favicon-failure.svg
bash: line 1: inkscape: command not found
Traceback (most recent call last):
  File "/home/runner/work/status/status/./favicon-[layer].svg.py", line 7, in <module>
    doc = ET.parse(stdin)
  File "/usr/lib/python3.10/xml/etree/ElementTree.py", line 1222, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.10/xml/etree/ElementTree.py", line 580, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
make: *** [Makefile:18: favicon-failure.svg] Error 1
make: *** Deleting file 'favicon-failure.svg'
joverlee521 commented 2 weeks ago

There's been a mix of successful and failure runs...why does make sometimes recognize the committed favicon files but will run the favicon-%.svg rule other times?

joverlee521 commented 2 weeks ago

why does make sometimes recognize the committed favicon files but will run the favicon-%.svg rule other times?

Hmm, is there a chance that this is due to the timing/order in which the files are downloaded from the repository during actions/checkout? If favicon.svg was downloaded one second after the favicon-*.svg files, then make sees the favicon-*.svg file as outdated and runs the rule.

Should we update the make command to ensure that the favicon-%.svg rule never runs during CI?

make -o favicon.svg -o 'favicon-[layer].svg.py'
tsibley commented 2 weeks ago

Ugh. Maybe? Good spotting.

tsibley commented 2 weeks ago

https://github.com/nextstrain/status/pull/30