Open joverlee521 opened 2 weeks ago
A question that came up as I was working on #240: Do we need to uncompress/compress the COG UK metadata during the workflow?
The `transform_genbank_metadata` rule uses the gzipped COG UK metadata file directly. I do not see any other rule consuming the uncompressed COG UK metadata as input, so it seems we are decompressing and recompressing only so that we can keep a zstd-compressed copy on AWS S3.
It's not clear how many resources these jobs actually use, since we don't have benchmark files (yet!). I'll revisit this question once we have more data from workflow runs.
Ah, this might also be a result of our `upload-to-s3` and `download-from-s3` scripts not having an option to skip compression during transfer.
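To make the "skip compression" idea concrete, here is a minimal sketch of how a transfer step could decide whether a file needs to be transcoded at all. It is not taken from the existing scripts: `MAGIC_BYTES`, `detect_compression`, and `needs_recompression` are hypothetical names, and the check relies only on the well-known leading magic bytes of each format.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Optional

# Leading "magic" bytes that identify common compressed formats.
# Hypothetical helper table, not part of upload-to-s3 or download-from-s3.
MAGIC_BYTES = {
    "gzip": b"\x1f\x8b",
    "zstd": b"\x28\xb5\x2f\xfd",
    "xz": b"\xfd7zXZ\x00",
}


def detect_compression(path: Path) -> Optional[str]:
    """Return the compression format of `path`, or None if it looks uncompressed."""
    with open(path, "rb") as handle:
        header = handle.read(6)
    for name, magic in MAGIC_BYTES.items():
        if header.startswith(magic):
            return name
    return None


def needs_recompression(path: Path, target: Optional[str]) -> bool:
    """True when the file must be transcoded before transfer, False when it can be copied as-is."""
    return detect_compression(path) != target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "metadata.tsv.gz"
        with gzip.open(sample, "wb") as out:
            out.write(b"strain\tdate\n")
        print(detect_compression(sample))
        print(needs_recompression(sample, "gzip"))
        print(needs_recompression(sample, "zstd"))
```

With a check like this, a gzipped input whose destination is also gzip could be copied straight through, and only a mismatch (for example gzip locally, zstd on S3) would trigger the decompress and recompress work.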