Open corneliusroemer opened 3 months ago
Maybe it's just log output to GitHub that's stuck instead? I didn't check locally or on aws, just in GitHub actions logs.
Logs are further ahead on AWS.
Maybe behind in GH Action logs due to Python output buffering?
Can we somehow fix the output buffering? I love being able to see status in the Gh logs but it's useless if it's behind.
~I think this is something we would have to fix in the Nextstrain CLI: https://github.com/nextstrain/cli/issues/393~
Nvm, I'm connected to the same job locally and my logs are in sync with the AWS Cloudwatch logs. This is something specific to the GH Action workflow.
Ah, this is an issue in the GH Action runners: https://github.com/actions/runner/issues/1031
There is slow down because they have to scan every line to mask any potential secrets in the logs.
Wowee.
From https://github.com/actions/runner/issues/1031:
Once a log message gets up to about 6k characters it looks like the runner starts to take a few seconds to process. At 100k characters this can be 10 minutes or so
The log linked in the input description has 2583 lines and all are under 1500 characters long, most under 400. This doesn't seem that bad. Maybe something else is happening. Likely nothing we can do here if this is specific to GH Actions?
Is the issue that stdout is block-buffered since GitHub Actions redirects it to a non-interactive destination? If Python's default block size is high on the runners—it can vary machine to machine, IIRC—it might take a bit to fill the block up before it's flushed. We could try running under PYTHONUNBUFFERED=1
in the GitHub workflow to diagnose this. If it helps, then we could switch explicitly to line-buffering in Nextstrain CLI; line-buffering is a nice middle ground for this sort of output, between fully unbuffered and block-buffered.
I just noticed that ingest is seemingly stuck for at least 2 hours without doing anything? It hasn't yet started any curation. It's not fetching from Genbank nor from GISAID, yet nothing is happening. This is very strange.
@joverlee521 do you have an idea what is going on. I ran these with fetching deselected
https://github.com/nextstrain/ncov-ingest/actions/runs/10513067722/job/29127832018