Closed corneliusroemer closed 8 months ago
Ingest runs slowly, partly because it writes around 20 GB to disk instead of streaming the data directly into `tsv-filter` here: https://github.com/nextstrain/forecasts-ncov/blob/d051c57cdea7b174e6fcff9e890283a431a67879/ingest/rules/sequence_counts.smk#L6-L33
The two rules could be merged into a single rule that filters on the fly.
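To illustrate what "filtering on the fly" means, here is a minimal Python sketch. It is not the project's code, and in the real pipeline this job is done by `tsv-filter`. It reads a TSV from any stream one line at a time and keeps only the header plus the rows whose column matches a value, so the unfiltered file never has to exist on disk. The function names and the `country`/`USA` values in the test data are hypothetical.

```python
from typing import Iterable, Iterator, TextIO


def filter_tsv(lines: Iterable[str], column: str, value: str) -> Iterator[str]:
    """Yield the header, then only the rows whose `column` equals `value`.

    Lines are consumed one at a time, so memory use stays constant no matter
    how large the input stream is.
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return  # empty input: nothing to emit
    names = header.rstrip("\n").split("\t")
    idx = names.index(column)  # raises ValueError if the column is missing
    yield header
    for line in it:
        fields = line.rstrip("\n").split("\t")
        if idx < len(fields) and fields[idx] == value:
            yield line


def main(stream_in: TextIO, stream_out: TextIO, column: str, value: str) -> None:
    """Copy matching rows from one stream to another with no intermediate file."""
    for line in filter_tsv(stream_in, column, value):
        stream_out.write(line)
```

If `main` were wired to `sys.stdin` and `sys.stdout`, it would sit at the end of a shell pipeline the same way `tsv-filter` does. The disk then only ever sees the filtered output.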
This will require updating the vendored `download-from-s3` script so it can stream to stdout, which can then be piped into `tsv-filter`.
Could it already work by writing to `/dev/stdout`? A quick look suggests it might.
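The piping idea can be sketched in Python with `subprocess`. The producer's stdout is connected directly to the consumer's stdin, so data flows through a pipe instead of a large intermediate file. This is a minimal sketch under stated assumptions. The two commands in `demo()` are hypothetical stand-ins for `download-from-s3` (writing to `/dev/stdout`) and `tsv-filter`, and their real arguments are not shown here.

```python
import os
import subprocess
import sys
import tempfile
from typing import Sequence


def run_pipeline(producer: Sequence[str], consumer: Sequence[str], out_path: str) -> None:
    """Run the equivalent of `producer | consumer > out_path` with no temp file."""
    with open(out_path, "wb") as out:
        prod = subprocess.Popen(producer, stdout=subprocess.PIPE)
        cons = subprocess.Popen(consumer, stdin=prod.stdout, stdout=out)
        # Close the parent's copy of the pipe so the producer receives SIGPIPE
        # if the consumer exits early (idiom from the subprocess docs).
        prod.stdout.close()
        cons_rc = cons.wait()
        prod_rc = prod.wait()
    # Treat a failure in either stage as an error, similar to `set -o pipefail`.
    for cmd, rc in ((producer, prod_rc), (consumer, cons_rc)):
        if rc != 0:
            raise subprocess.CalledProcessError(rc, list(cmd))


def demo() -> str:
    """Run the pipeline with stand-in Python commands and return the output."""
    # Hypothetical stand-ins: one process prints a tiny TSV, the other keeps
    # the header plus rows whose second column is "USA".
    producer = [sys.executable, "-c", "print('strain\\tcountry\\nA\\tUSA\\nB\\tUK')"]
    consumer = [
        sys.executable, "-c",
        "import sys\n"
        "for i, line in enumerate(sys.stdin):\n"
        "    if i == 0 or line.split('\\t')[1].strip() == 'USA':\n"
        "        sys.stdout.write(line)\n",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "filtered.tsv")
        run_pipeline(producer, consumer, path)
        with open(path) as f:
            return f.read()
```

In a Snakemake rule the same effect would come from a single `shell:` command that pipes one tool into the other. The point of the sketch is only that nothing between the two stages touches the disk.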