rotary-genomics / rotary

Assembly/annotation workflow for Nanopore-based microbial genome data containing circular DNA elements
BSD 3-Clause "New" or "Revised" License

Specify more temp files #92

Closed: jmtsuji closed this issue 6 months ago.

jmtsuji commented 9 months ago

Rotary generates many files that seem extraneous. We should declare more of these as outputs of Rotary's rules and wrap them in temp(), so that Snakemake deletes them once no remaining job needs them.

LeeBergstrand commented 8 months ago

[Image: temp_files, a Snakemake --filegraph diagram of the workflow]

@jmtsuji I used Snakemake's --filegraph option to make this image. Files already marked temp() are highlighted in red, and my suggested additions are highlighted in green. Thoughts? Do you have any other suggestions?

LeeBergstrand commented 8 months ago

I was also thinking we could temp() the outputs of all the polishing steps. There are a few rules there where we kept diagnostic .txt output files. I think we could add a rule that compiles these into TSVs, and then we would be able to temp() their entire directories.
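A minimal Python sketch of the compile-to-TSV idea, assuming (hypothetically) that each diagnostic .txt file holds one `key: value` pair per line; the real polishing outputs may be formatted differently. The function names `parse_diagnostic_file` and `compile_diagnostics` are illustrative only and are not part of Rotary. Each input file becomes one TSV row, and keys missing from a file are filled with "NA".

```python
import csv
from pathlib import Path


def parse_diagnostic_file(path):
    """Parse one diagnostic text file into a dict.

    Hypothetical format: one 'key: value' pair per line.
    Lines without a colon are skipped.
    """
    record = {}
    for line in Path(path).read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # keep only lines that actually contain a colon
            record[key.strip()] = value.strip()
    return record


def compile_diagnostics(txt_paths, tsv_path):
    """Merge several diagnostic files into one TSV, one row per input file.

    Returns the list of column names written to the header.
    """
    rows = []
    for path in txt_paths:
        row = {"source_file": Path(path).name}
        row.update(parse_diagnostic_file(path))
        rows.append(row)

    # Column order: source_file first, then every key in first-seen order.
    fieldnames = ["source_file"]
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(tsv_path, "w", newline="") as handle:
        # restval fills columns that a given file did not report.
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", restval="NA"
        )
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

Once a summary like this is written by its own rule, the per-step .txt files would no longer need to be kept, which is what would let their directories be marked temp().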

LeeBergstrand commented 8 months ago

Here are the directory sizes:

25M ./assembly/end_repair
14M ./assembly/flye/20-repeat
5.5M ./assembly/flye/00-assembly
17M ./assembly/flye/30-contigger
13M ./assembly/flye/40-polishing
5.7M ./assembly/flye/10-consensus
66M ./assembly/flye
90M ./assembly
189M ./qc/long
143M ./qc/short
331M ./qc
9.5M ./polish/polypolish/input
239M ./polish/polypolish
16M ./polish/polca
422M ./polish/medaka
16M ./polish/cov_filter
691M ./polish
2.3M ./annotation/checkm/protein_files
2.7M ./annotation/checkm
3.1M ./annotation/eggnog
310M ./annotation/coverage
6.7M ./annotation/dfast/ddbj
6.9M ./annotation/dfast/genbank
73M ./annotation/dfast
388M ./annotation
5.6M ./circularize/combine
9.5M ./circularize/polypolish/input
239M ./circularize/polypolish
12M ./circularize/circlator
5.6M ./circularize/filter
261M ./circularize
1.8G .
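The listing above appears to be `du -h` output (an assumption based on its format). A small Python sketch like the one below could rank the top-level directories by their share of the total, to help pick temp() candidates. It assumes binary units (K, M, G, T as powers of 1024, as GNU `du -h` uses) and paths without spaces; because `du -h` rounds each size, the shares are approximate. The helper names are hypothetical.

```python
# Binary multipliers used by human-readable du output (assumed GNU-style).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert a human-readable size such as '5.5M' or '1.8G' to bytes."""
    unit = size[-1].upper()
    if unit in UNITS:
        return float(size[:-1]) * UNITS[unit]
    return float(size)  # no suffix: already in bytes


def parse_du(text):
    """Map each path to its size in bytes.

    Works on one-entry-per-line or whitespace-flattened du output,
    assuming no path contains spaces.
    """
    tokens = text.split()
    return {path: to_bytes(size) for size, path in zip(tokens[0::2], tokens[1::2])}


def top_level_shares(sizes):
    """Return (directory, fraction of total) pairs, largest first.

    Top-level directories look like './polish' (exactly one slash);
    the grand total is the entry for '.'.
    """
    total = sizes["."]
    shares = {path: b / total for path, b in sizes.items() if path.count("/") == 1}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
```

On the numbers in this thread, ./polish comes out on top at roughly 37% of the 1.8G total (691M out of about 1843M).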

jmtsuji commented 8 months ago

Thanks for the helpful diagram and proposed changes! Is it okay if I wait to review this until after the holidays, or would you like input on this sooner?

LeeBergstrand commented 8 months ago

> Thanks for the helpful diagram and proposed changes! Is it okay if I wait to review this until after the holidays, or would you like input sooner?

After the holidays is fine. I had planned to run some large datasets over the break, but that appears to have been delayed. In the worst case, I will build and use a branch with the temp() changes.

LeeBergstrand commented 8 months ago

@jmtsuji Created a branch for this: https://github.com/rotary-genomics/rotary/tree/more_temp_files

LeeBergstrand commented 8 months ago

@jmtsuji I might take this approach. https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#shadow-rules

I used it for the hot-fix for https://github.com/rotary-genomics/rotary/pull/105.

LeeBergstrand commented 8 months ago

> @jmtsuji I might take this approach. https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#shadow-rules
>
> I used it for the hot-fix for #105.

So, this doesn't work the way I thought it would. I still need to run some tests, but it appears to move all the files back out of the shadow directory, even those not listed in the rule's output directive.
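For the tests mentioned in this comment, one option is to list, after a rule finishes, every file in its output directory that was not declared as an output. Below is a minimal Python sketch of that check; `undeclared_files` is a hypothetical helper, and the declared paths are assumed to be given relative to the output directory (for example, copied by hand from the rule's output directive).

```python
from pathlib import Path


def undeclared_files(output_dir, declared_outputs):
    """Return files under output_dir that are not among declared_outputs.

    declared_outputs: paths relative to output_dir (hypothetical input,
    e.g. copied from the rule's output directive).
    """
    root = Path(output_dir)
    declared = {Path(p) for p in declared_outputs}
    # Collect every regular file below the directory, as relative paths.
    found = {p.relative_to(root) for p in root.rglob("*") if p.is_file()}
    # as_posix() keeps the '/' separator regardless of platform.
    return sorted(p.as_posix() for p in found - declared)
```

If the list is non-empty after a shadow rule completes, files beyond the declared outputs made it back out of the shadow directory.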

jmtsuji commented 6 months ago

@LeeBergstrand Do you think we can close this issue now that #114 has been merged?

LeeBergstrand commented 6 months ago

Yes!