nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and Genbank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
MIT License
35 stars 20 forks source link

Benchmark in every rule is annoying - extra 2 lines that should not be necessary #472

Open corneliusroemer opened 3 weeks ago

corneliusroemer commented 3 weeks ago

I've just been reviewing recent changes in ncov-ingest while investigating #471

I noticed that every rule now has gotten 2 new lines benchmark: "benchmarks/rulename.txt". I'm all for benchmarking, i.e. getting runtime/memory/cpu usage information - but it seems this way of doing things is suboptimal - there's almost no intrinsic information here, one could easily write a few-line macro that adds the 2 lines to every rule based on each rule's rule name.

I'm not saying we should do this in ncov-ingest, really, snakemake should not require these verbose directives but offer a single CLI option that auto-generates the benchmark files.

But I noticed it here so thought I'll raise it.

The immediate harm of the 2 lines is that the diff I'm reviewing has a lot of noise (it's like reformatting). For now I'd be in favour of not adding benchmarks to other workflows until there's a less intrusive/verbose way of doing so.

genehack commented 2 weeks ago

This feels more like something that belongs in the snakemake repo? (Which contains, it seems, a relevant open PR...)

Until snakemake supports global benchmarking, if we want benchmarks, we need to add benchmark rules. 🤷

tsibley commented 2 weeks ago

The immediate harm of the 2 lines is that the diff I'm reviewing has a lot of noise (it's like reformatting).

Hmm, how are those lines adding noise to the diff? Can you give an example?