Open DanInci opened 1 month ago
Two quick comments (proper code review will follow):
- Type hints and docstrings would really help code readability
- Using pickle is somewhat controversial (mainly for security reasons, but also because pickled objects are not guaranteed to be compatible across versions/systems). What is the advantage of using it here? Why not start with `benchmark.yaml` as input and instantiate the benchmark/node class from there?
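As a rough sketch of the suggestion above, the benchmark object could be rebuilt from the parsed `benchmark.yaml` content instead of being unpickled. The `Benchmark` and `BenchmarkNode` classes, their fields, and the `name`/`nodes`/`stage`/`module` keys are hypothetical stand-ins, not the PR's actual classes or schema. The input dict stands for what a YAML loader such as PyYAML's `yaml.safe_load` would return.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class BenchmarkNode:
    """Hypothetical node: one module of one stage in the benchmark."""
    stage_id: str
    module_id: str


@dataclass
class Benchmark:
    """Hypothetical benchmark built from parsed benchmark.yaml content."""
    name: str
    nodes: List[BenchmarkNode] = field(default_factory=list)

    @classmethod
    def from_dict(cls, config: Dict[str, Any]) -> "Benchmark":
        # Validate up front so a malformed file fails with a clear message.
        if "name" not in config:
            raise ValueError("benchmark.yaml is missing the required key 'name'")
        nodes = [
            BenchmarkNode(stage_id=entry["stage"], module_id=entry["module"])
            for entry in config.get("nodes", [])
        ]
        return cls(name=config["name"], nodes=nodes)


# In practice `config` would be loaded from benchmark.yaml (assumed schema).
config = {"name": "demo", "nodes": [{"stage": "data", "module": "D1"}]}
benchmark = Benchmark.from_dict(config)
```

Because the object is always reconstructed from plain data, nothing executable is deserialized, and the YAML file stays the single source of truth.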
Thanks a lot for this very extensive PR.
I added some questions/suggestions. They are mainly about reducing duplication to keep the code simple and clean. As more general comments:
- Why not start from the `benchmark.yaml` file and initialize the `WorkflowEngine` from there? This way we could also skip reassigning the `benchmark.yaml` file as a `Benchmark`/`Converter` attribute.
- Raise errors with a clear error message instead of using `assert`.
- Type hints can also describe return values, e.g. `def func(arg: argtype) -> returntype:`. I think this would make the code much easier to read.
- There are quite a few places where the code looks brittle and will probably fail upon small changes, e.g. in the data schema or the path description. I tried to point them out, but probably missed some. In general it is good to use, for example, named attributes instead of positional calls. I would also try to avoid string manipulation as much as possible.
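To make the points in this list concrete, here is a minimal sketch that combines an explicit exception instead of `assert`, a return type hint, keyword arguments, and `pathlib` in place of string manipulation. The function `format_output_path` and its parameters are hypothetical and not taken from the PR.

```python
from pathlib import Path


def format_output_path(out_dir: Path, stage_id: str, module_id: str) -> Path:
    """Build the output directory for one module of one stage."""
    # An explicit exception still fires under `python -O`,
    # which strips assert statements.
    if not stage_id or not module_id:
        raise ValueError(
            "stage_id and module_id must be non-empty, "
            f"got stage_id={stage_id!r}, module_id={module_id!r}"
        )
    # Joining with pathlib avoids manual string concatenation and splitting.
    return out_dir / stage_id / module_id


# Keyword arguments keep the call correct even if the parameter order changes.
path = format_output_path(out_dir=Path("out"), stage_id="data", module_id="D1")
```

The error message names the offending values, so a failure points directly at the bad input rather than at a bare `AssertionError`.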
`benchmark: Benchmark` or `node: BenchmarkNode`. The alternative allows direct use of the module without doing the proper validations. In this case I'd argue it is better to reflect the desired responsibility through the interface. The need to access the benchmark YAML is just an implementation detail of Snakemake; with other workflow languages, we might not need access to it.
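One way to read this argument is as an interface whose signatures accept already-validated objects rather than a YAML path. The sketch below is only an assumption about what that could look like: the method names `run_workflow` and `run_node_workflow`, the placeholder `Benchmark`/`BenchmarkNode` classes, and the `DryRunEngine` are hypothetical and not taken from the PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class BenchmarkNode:
    """Placeholder for a validated benchmark node."""
    node_id: str


@dataclass
class Benchmark:
    """Placeholder for a validated benchmark."""
    name: str
    nodes: List[BenchmarkNode] = field(default_factory=list)


class WorkflowEngine(ABC):
    """Engine-agnostic interface: callers pass validated objects, not file paths."""

    @abstractmethod
    def run_workflow(self, benchmark: Benchmark) -> bool:
        """Run every node of the benchmark; return True on success."""

    @abstractmethod
    def run_node_workflow(self, node: BenchmarkNode) -> bool:
        """Run a single node; return True on success."""


class DryRunEngine(WorkflowEngine):
    """Toy engine that only records which nodes it would execute."""

    def __init__(self) -> None:
        self.executed: List[str] = []

    def run_node_workflow(self, node: BenchmarkNode) -> bool:
        self.executed.append(node.node_id)
        return True

    def run_workflow(self, benchmark: Benchmark) -> bool:
        return all(self.run_node_workflow(node=n) for n in benchmark.nodes)


engine = DryRunEngine()
demo = Benchmark(name="demo", nodes=[BenchmarkNode("a"), BenchmarkNode("b")])
ok = engine.run_workflow(benchmark=demo)
```

With this shape, how a given engine obtains the YAML (if it needs it at all) stays inside that engine's implementation.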
Refactor the workflow engine code so that the `Snakemake` file is generated on the fly. The following generic interface is provided for workflow execution: