Closed by joverlee521 3 months ago
Hmm. Downloading the Snakemake state locally may fix this problem, but it can/will cause other problems. I don't know if it'd be ok if we scope it down to not all of .snakemake/ but just .snakemake/metadata/ as you suggest in 1.
What's the effect of the warnings? Is there useful information missing from the report? Or is it just a matter of wanting to suppress the noise from Snakemake?
The generated report does not include any runtime info:
Ah, looking more closely at the contents of .snakemake/metadata/, I do think we want to start downloading it by default. It's mostly info used to determine if Snakemake needs to re-run rules based on inputs/outputs, and thus is akin to the file mtimes which we already preserve on download.
1. Include .snakemake/metadata in the downloads from AWS Batch so that users can generate the Snakemake report locally.
We should do this, per above. I'll open a PR.
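To make the narrowed scope concrete, here is a minimal sketch of a download filter that still skips Snakemake state in general but lets .snakemake/metadata/ through. The names (`should_download`, `EXCLUDED_DIRS`, `INCLUDED_EXCEPTIONS`) are hypothetical and are not taken from the Nextstrain CLI's s3.py. It also assumes paths are relative to the top of the build directory.

```python
from pathlib import PurePosixPath

# Hypothetical names for illustration only; not the Nextstrain CLI's actual code.
# Paths are assumed to be relative to the top of the build directory.
EXCLUDED_DIRS = (PurePosixPath(".snakemake"),)
INCLUDED_EXCEPTIONS = (PurePosixPath(".snakemake/metadata"),)


def _is_within(path: PurePosixPath, parent: PurePosixPath) -> bool:
    """Return True if `path` is `parent` itself or nested anywhere below it."""
    return path == parent or parent in path.parents


def should_download(key: str) -> bool:
    """Decide whether a relative build path should be downloaded."""
    path = PurePosixPath(key)
    # Exceptions win over exclusions, so metadata survives the .snakemake/ skip.
    if any(_is_within(path, keep) for keep in INCLUDED_EXCEPTIONS):
        return True
    return not any(_is_within(path, skip) for skip in EXCLUDED_DIRS)
```

Checking the exceptions first keeps the rule easy to read: everything under .snakemake/ is skipped unless it sits under the one directory that is explicitly allowed.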
2. Automatically generate the Snakemake report within the AWS Batch job so that users can download the rendered report
[2] definitely seems like the nicer option and maybe should be applied across all runtimes for nextstrain build?
We could also do this as well, but it requires a little more consideration about how/where/when. Would you open it as a separate issue if you'd like to see it?
This will also need a new docker-base image, as the same exclusions of .snakemake/ are recapitulated there:
Hmm, maybe this doesn't need to be built into the Nextstrain CLI. It could just be a separate step in the pathogen-repo-build workflow so we have reports for our automated pathogens.
Totally.
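As a rough illustration of what such a separate report step might run, here is a sketch that wraps the `nextstrain build . --report` invocation mentioned in this issue. The helper names (`report_command`, `generate_report`) are hypothetical. Passing a file name after `--report` and the `report.html` default are assumptions based on Snakemake's `--report [FILE]` option. The up-front check for .snakemake/metadata reflects the observation in this thread that a report generated without that state lacks runtime info.

```python
import subprocess
from pathlib import Path


def report_command(build_dir: str, report_file: str = "report.html") -> list[str]:
    """Build the argv for generating a Snakemake report via the Nextstrain CLI.

    The report file argument is an assumption; the issue only shows a bare
    `nextstrain build . --report`.
    """
    return ["nextstrain", "build", build_dir, "--report", report_file]


def generate_report(build_dir: str, report_file: str = "report.html") -> None:
    """Run the report step after the workflow itself has finished."""
    metadata = Path(build_dir) / ".snakemake" / "metadata"
    if not metadata.is_dir():
        # Without this state the report would be missing runtime info.
        raise FileNotFoundError(f"no Snakemake metadata found at {metadata}")
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(report_command(build_dir, report_file), check=True)
```

Keeping the command construction separate from the subprocess call makes the step easy to inspect or log before anything is executed.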
Context
Snakemake has removed the --stats option in v8, so I'm looking into the --report option for long-term workflow stats.

The Snakemake report must be generated after the workflow has finished. I thought this would be as simple as attaching/downloading an old AWS Batch job then running nextstrain build . --report.

When I did this for ncov-ingest, I saw a bunch of warnings along the lines of:
I then realized we are explicitly excluding Snakemake state in the downloads from AWS Batch:
https://github.com/nextstrain/cli/blob/8ed779c9741da868341ca4518e8eff83ffba8e60/nextstrain/cli/runner/aws_batch/s3.py#L113-L124
Possible solutions
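1. Include .snakemake/metadata in the downloads from AWS Batch so that users can generate the Snakemake report locally.
2. Automatically generate the Snakemake report within the AWS Batch job so that users can download the rendered report.

[2] definitely seems like the nicer option and maybe should be applied across all runtimes for nextstrain build?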