nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and GenBank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.

GitHub Actions log is >2hr behind actual AWS run #474

Open corneliusroemer opened 3 weeks ago

corneliusroemer commented 3 weeks ago

I just noticed that ingest has seemingly been stuck for at least 2 hours without doing anything. It hasn't yet started any curation, and it isn't fetching from GenBank or GISAID, so nothing at all is happening. This is very strange.

@joverlee521 do you have an idea what is going on? I ran these with fetching deselected:

https://github.com/nextstrain/ncov-ingest/actions/runs/10513067722/job/29127832018

corneliusroemer commented 3 weeks ago

Maybe it's just the log output to GitHub that's stuck instead? I didn't check locally or on AWS, only in the GitHub Actions logs.

joverlee521 commented 3 weeks ago

The logs are further along on AWS.

Maybe they're behind in the GH Actions logs due to Python output buffering?
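
For context, here's a minimal sketch (not ncov-ingest code) of the behavior being suspected: when stdout is redirected to a pipe, as in CI, CPython block-buffers it, so short status lines can sit invisible for a long time.

```python
import sys
import time

# When stdout is a pipe rather than a TTY, CPython block-buffers it
# (typically 8 KiB), so these lines may not appear until the buffer
# fills or the process exits.
for i in range(5):
    print(f"status update {i}")  # sits in the buffer
    time.sleep(1)
sys.stdout.flush()  # forces everything out at once
```

Running this as `python demo.py | cat` shows all five lines arriving together at the end, rather than one per second.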

corneliusroemer commented 3 weeks ago

Can we somehow fix the output buffering? I love being able to see the status in the GH logs, but it's useless if it's behind.

joverlee521 commented 3 weeks ago

~I think this is something we would have to fix in the Nextstrain CLI: https://github.com/nextstrain/cli/issues/393~

Nvm, I'm connected to the same job locally and my logs are in sync with the AWS CloudWatch logs. This is something specific to the GH Actions workflow.

joverlee521 commented 2 weeks ago

Ah, this is an issue in the GH Action runners: https://github.com/actions/runner/issues/1031

There is a slowdown because they have to scan every line to mask any potential secrets in the logs.

tsibley commented 2 weeks ago

Wowee.

victorlin commented 2 weeks ago

From https://github.com/actions/runner/issues/1031:

> Once a log message gets up to about 6k characters it looks like the runner starts to take a few seconds to process. At 100k characters this can be 10 minutes or so

The log linked in the issue description has 2583 lines, all under 1500 characters long and most under 400. That doesn't seem bad enough to explain this, so maybe something else is happening. Likely nothing we can do here if this is specific to GH Actions?
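
For reference, a quick way to reproduce these numbers (a sketch; `job.log` is a placeholder for a locally downloaded copy of the raw job log):

```python
# Measure line count and line lengths in a downloaded job log.
with open("job.log", encoding="utf-8", errors="replace") as f:
    lengths = [len(line.rstrip("\n")) for line in f]

print(f"{len(lengths)} lines")
print(f"longest line: {max(lengths)} chars")
print(f"lines under 400 chars: {sum(n < 400 for n in lengths)}")
```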

tsibley commented 2 weeks ago

Is the issue that stdout is block-buffered since GitHub Actions redirects it to a non-interactive destination? If Python's default block size is high on the runners (it can vary machine to machine, IIRC), it might take a while to fill a block before it's flushed. We could try running under PYTHONUNBUFFERED=1 in the GitHub workflow to diagnose this. If it helps, then we could switch Nextstrain CLI explicitly to line-buffering; line-buffering is a nice middle ground for this sort of output, between fully unbuffered and block-buffered.
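
A minimal sketch of that two-step idea (illustrative only, not the actual Nextstrain CLI change): set PYTHONUNBUFFERED=1 in the workflow's `env:` block to diagnose, and if that removes the lag, have the CLI opt into line-buffering itself:

```python
import sys

# Sketch of explicit line-buffering (Python >= 3.7). When output isn't a
# terminal, flush after every newline so each status line reaches the log
# promptly, without the per-write syscall cost of fully unbuffered output.
if not sys.stdout.isatty():
    sys.stdout.reconfigure(line_buffering=True)
    sys.stderr.reconfigure(line_buffering=True)
```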