xuechunxu / DiTing

DiTing: A pipeline to infer and compare biogeochemical pathways in metagenomic data
GNU General Public License v3.0

implement a `--continue` option? #4

Open housw opened 3 years ago

housw commented 3 years ago

Hi Xuechun,

Could you please implement a --continue option to avoid recomputation after failed jobs? It seems the current version simply overwrites everything. This feature would be especially useful when we have paired metagenomes (MG) and metatranscriptomes (MT), since we could skip certain steps and save computation time.

Thanks, Shengwei

xuechunxu commented 3 years ago

Hi Shengwei,

Thanks for your suggestion; this is a useful and pressing need. We were already planning to implement a --continue option, and it is listed in our CHANGELOG.md. By the way, how would you like the --continue parameter to work when you have paired MG and MT? Could you describe it in more detail? In other words, under what circumstances would you want to stop DiTing and then continue it?

Cheers, Chunxu

housw commented 3 years ago

Hi Chunxu,

Sorry for misspelling your first name in my last post. I'm glad to hear you're planning to implement this option. Yes, it would be great if the script could automatically detect existing pre-computed files and skip the corresponding steps, such as the prodigal or the KEGG_annotation step. With snakemake this is easy to implement, but I'm not sure how hard it would be for you to adopt. Alternatively, you could write a progress log and, with the --continue option, simply resume from the last unfinished step.
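To make the two ideas above concrete, here is a minimal sketch that combines them: a step is skipped only when it is recorded in a progress log and all of its output files exist and are non-empty. Everything here is an assumption for illustration, not DiTing's actual code: the function `run_step`, its parameters, the JSON log format, and the file names are all hypothetical.

```python
import json
from pathlib import Path


def run_step(name, outputs, action, log_path, resume=False):
    """Run one pipeline step, or skip it when resuming and it already finished.

    Hypothetical helper: a step counts as finished only if it is recorded in
    the progress log AND every output file exists and is non-empty.
    """
    log_path = Path(log_path)
    # The progress log is a JSON list of the names of completed steps.
    done = set(json.loads(log_path.read_text())) if log_path.exists() else set()
    outputs = [Path(p) for p in outputs]

    outputs_ok = all(p.exists() and p.stat().st_size > 0 for p in outputs)
    if resume and name in done and outputs_ok:
        return "skipped"

    action()  # the real work, e.g. launching prodigal
    # Record the step only after the action returned without raising,
    # so a crashed job is never marked as complete.
    done.add(name)
    log_path.write_text(json.dumps(sorted(done)))
    return "ran"
```

Checking the log as well as the files matters because a failed job can leave a partially written output behind; the file alone would then look like a finished step. A call might look like `run_step("prodigal", ["out/genes.faa"], run_prodigal, "out/progress.json", resume=True)`, where `run_prodigal` stands for whatever function launches that step.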

For paired MG and MT, I would like to run DiTing on the MG reads and assemblies first, then reuse the MG assemblies and annotations (including KEGG_annotations) for the MT reads. In this case, most of the time-consuming steps can be skipped. Alternatively, we could have a lookup YAML file that maps reads from MG and MT libraries to the same assemblies, so we can run them all together in one command. What do you think?
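As a rough sketch of the lookup idea, the mapping below has the shape a YAML parser such as PyYAML's `yaml.safe_load` would return for a file that lists, per assembly, the read libraries mapped to it. The layout, the keys (`assembly`, `libraries`), the file paths, the job labels (`annotate`, `quantify`), and the function `plan_jobs` are all hypothetical and only illustrate how annotation could be scheduled once and shared by the MG and MT reads.

```python
# Hypothetical lookup: one assembly, with MG and MT libraries mapped to it.
lookup = {
    "sampleA": {
        "assembly": "assemblies/sampleA.fa",
        "libraries": {
            "MG": ["reads/sampleA_MG_1.fq", "reads/sampleA_MG_2.fq"],
            "MT": ["reads/sampleA_MT_1.fq", "reads/sampleA_MT_2.fq"],
        },
    },
}


def plan_jobs(mapping):
    """Return the list of jobs implied by the lookup, in execution order."""
    jobs = []
    for name, entry in mapping.items():
        # Gene prediction and KEGG annotation depend only on the assembly,
        # so they are scheduled once per assembly.
        jobs.append(("annotate", name, entry["assembly"]))
        # Each read library then reuses that annotation in its own step.
        for library, reads in entry["libraries"].items():
            jobs.append(("quantify", name, library, tuple(reads)))
    return jobs


jobs = plan_jobs(lookup)
```

With this layout, adding an MT library to an assembly adds one per-library job and no extra annotation job.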

Cheers, Shengwei