rotary-genomics / rotary

Assembly/annotation workflow for Nanopore-based microbial genome data containing circular DNA elements
BSD 3-Clause "New" or "Revised" License
2 stars 1 forks source link

Simplify Python code within certain snakemake rules #120

Open jmtsuji opened 5 months ago

jmtsuji commented 5 months ago

Some snakemake rules (e.g., rule rule summarize_contigs_by_coverage in polish.smk) currently include long Python code blocks that contain functions etc. Would it be better to simplify the code included in the Snakefile, e.g., by calling a separate Python script, for such code?

jmtsuji commented 5 months ago

Previous discussion for rule summarize_contigs_by_coverage in #114:

@jmtsuji

Thanks for improving the function definition here. How do you think we should handle such functions? Might it be better to move functions like this into a different part of the snakefile (or to a different file in the repo) rather than keep it nested inside the rule?

@LeeBergstrand

@jmtsuji, do you have to move imports inside the run block as if it's a script? I suspect everything outside the run block can't be seen inside unless it is a particular variable created by Snakemake. That is my suspicion, but I am unsure; it could be injecting it differently than expected. If we wanted to move it, I would move it to utils.py (most of utils.py would be moved to its library, but we can have two util files) or a dedicated coverage so as not to clutter up to code.

jmtsuji commented 5 months ago

@LeeBergstrand Regarding your comment above, I think you are probably right... I also suspect that everything outside the code block can't be seen inside. We will probably need to import the code via an import command or just write a Python script that we can call for the rule.

jmtsuji commented 5 months ago

Given that we don't have very many rules with long Python code blocks at the moment, I think it's not a high priority to try to move around the code at this point. However, it might be something we could consider in the future, especially if we create more rules like summarize_contigs_by_coverage that have complex Python code. OK to table this issue for now? Or do you think it could be worthwhile to explore simplifying the code in the snakefile for this rule?

LeeBergstrand commented 5 months ago

Given that we don't have very many rules with long Python code blocks at the moment, I think it's not a high priority to try to move around the code at this point. However, it might be something we could consider in the future, especially if we create more rules like summarize_contigs_by_coverage that have complex Python code. OK to table this issue for now? Or do you think it could be worthwhile to explore simplifying the code in the snakefile for this rule?

I would table the issue for now. I would probably move the code to utils.py but after we pull the the sample object code out into its own library.

jmtsuji commented 5 months ago

OK, sounds good -- let's table this for now.

jmtsuji commented 5 months ago

Update: as discussed in #126 , the logic for rule summarize_contigs_by_coverage is getting complex enough that it might make sense to move the code into its own script in future:

From #126:

In the long term, I think it might make sense to move the code in summarize_contigs_by_coverage into a Python utility script -- this would make it easier to write/run tests on some of the logic being performed in this code.