Closed jmtsuji closed 3 months ago
To implement the --no_func_anno version of DFAST, we might need to either add conditions to the run_dfast rule or make a second rule called something like run_dfast_simple. Not sure if the --no_func_anno version of DFAST would still need a DB dir... we would need to test this.
I just tested running DFAST in --no_func_anno mode. It does not need a DB dir and finishes in ~10 seconds with 4 threads. To me, this supports that having the functional_annotations_DFAST param above might be a helpful feature, if we decide to implement the optional annotation sub-rules like I've proposed.
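As a rough illustration of how that toggle could reach the DFAST call, here is a minimal Python sketch. The function name `build_dfast_command` and its parameters are hypothetical and not part of rotary. Only `--no_func_anno` comes from the test above; the other options (`--genome`, `--out`, `--cpu`, `--dbroot`) are my understanding of the DFAST CLI and should be checked against `dfast --help` before use. The sketch also assumes the full annotation mode still needs a DB dir, as the current run_dfast rule does.

```python
from typing import List, Optional


def build_dfast_command(genome: str, out_dir: str, threads: int,
                        functional_annotation: bool,
                        db_dir: Optional[str] = None) -> List[str]:
    """Assemble a DFAST command line as a list of arguments (hypothetical helper)."""
    cmd = ["dfast", "--genome", genome, "--out", out_dir, "--cpu", str(threads)]
    if functional_annotation:
        # Assumption: full functional annotation needs the reference DB dir.
        if db_dir is None:
            raise ValueError("functional annotation requires a DB dir")
        cmd += ["--dbroot", db_dir]
    else:
        # Gene/ORF calls only; no DB dir needed according to the test above.
        cmd.append("--no_func_anno")
    return cmd
```

With this shape, a single run_dfast rule could cover both modes, and a separate run_dfast_simple rule might not be needed.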
@LeeBergstrand What would you think about adding a section to the config file where the desired annotations can be specified? Something like:
# Annotation sub-rules for rotary to run (comment out or delete lines to skip annotation types)
annotations:
- functional_annotations_DFAST
- functional_annotation_EggNOG_mapper
- taxonomy_assignment_GTDBTk
- quality_check_CheckM2 # Must be set for the automated quality check to run
- read_coverage
It might be easier to specify things as a list like we did for contamination_references_ncbi_accessions:
# Select Annotations: DFAST_Func (light annotation), EggNOG (heavy annotation including KEGG),
# GTDBTk (taxonomy), CheckM2 (genome quality), coverage (read coverage statistics)
annotations: ['DFAST_Func', 'EggNOG', 'GTDBTk', 'CheckM2', 'coverage']
Would that make it easier to add new annotations, and more accessible for people, since they would not need to remember the rule names?
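To make the short-name idea concrete, here is a small Python sketch of how the list could be validated and translated back to sub-rule names. The mapping and the function name `resolve_annotations` are assumptions for illustration; the pairing of each short name with a rule name is my guess from the two lists above, not something rotary defines.

```python
# Hypothetical mapping from user-facing short names to annotation sub-rule names.
ANNOTATION_RULES = {
    "DFAST_Func": "functional_annotations_DFAST",
    "EggNOG": "functional_annotation_EggNOG_mapper",
    "GTDBTk": "taxonomy_assignment_GTDBTk",
    "CheckM2": "quality_check_CheckM2",
    "coverage": "read_coverage",
}


def resolve_annotations(selected):
    """Return the sub-rule names for the selected short names, in the given order."""
    unknown = [name for name in selected if name not in ANNOTATION_RULES]
    if unknown:
        # Fail early so a typo in the config does not silently skip an annotation.
        raise ValueError(f"Unknown annotation(s) in config: {', '.join(unknown)}")
    return [ANNOTATION_RULES[name] for name in selected]
```

Adding a new annotation would then mean adding one entry to the mapping, and a misspelled name in the config fails with a clear message rather than being ignored.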
@LeeBergstrand Let me know your thoughts about this optional annotation sub-rule idea -- thanks!
We should always provide gene/ORF calls for users and make the DFAST functional annotations optional.
@LeeBergstrand Thanks for the feedback!
It might be easier to specify things as a list like we did for contamination_references_ncbi_accessions:
# Select Annotations: DFAST_Func (light annotation), EggNOG (heavy annotation including KEGG),
# GTDBTk (taxonomy), CheckM2 (genome quality), coverage (read coverage statistics)
annotations: ['DFAST_Func', 'EggNOG', 'GTDBTk', 'CheckM2', 'coverage']
I like this approach 👍 The simplified annotation names you suggested are also great.
We should always provide gene/ORF calls for users and make the DFAST functional annotations optional.
OK, sounds good!
I'll plan to make a PR for this in the near-ish future. I'm working on code for the rotate/stitch modules at the moment, so I might not get to this PR right away.
Addressed in https://github.com/rotary-genomics/rotary/pull/154
Currently, unless the --until snakemake flag is used, rotary will run all steps in the annotation module. At the moment, these steps include:
Some of these steps are quite time (or memory) consuming or need a lot of disk space for DB download, and these factors can make testing difficult. Also, some users might just want a finished genome without all the detailed annotations.
@LeeBergstrand What would you think about adding a section to the config file where the desired annotations can be specified? Something like:
Regarding DFAST: I think we should always provide gene/ORF calls for users, but I think functional annotation by DFAST is something we could potentially make optional. I checked, and it is possible to run DFAST with a --no_func_anno flag to skip functional annotation. Thus, I am thinking that the functional_annotations_DFAST entry could be used to toggle DFAST's --no_func_anno flag.
I think it might be fairly easy to implement this optional annotation concept. Most of the options could be implemented just by adding some conditionals to the summarize_annotation rule. To implement the --no_func_anno version of DFAST, we might need to either add conditions to the run_dfast rule or make a second rule called something like run_dfast_simple. Not sure if the --no_func_anno version of DFAST would still need a DB dir... we would need to test this.
One benefit of adding optional annotation is that we would not need to worry as much about supporting --until circularize as a common use case. I am using --until circularize at the moment to bypass the annotation module.
@LeeBergstrand Let me know your thoughts about this optional annotation sub-rule idea -- thanks!
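For discussion, the conditionals on the summarize_annotation rule could look roughly like the Python sketch below, written in the style of a Snakemake input function. Everything here is hypothetical: the function name `summary_inputs`, the config keys, and the file paths are placeholders and do not reflect rotary's actual outputs.

```python
# Placeholder output paths for each optional annotation (not rotary's real paths).
OPTIONAL_OUTPUTS = {
    "EggNOG": "annotation/eggnog/eggnog.emapper.annotations",
    "GTDBTk": "annotation/gtdbtk/gtdbtk.summary.tsv",
    "CheckM2": "annotation/checkm2/quality_report.tsv",
    "coverage": "annotation/coverage/coverage.tsv",
}

# Gene/ORF calls are always produced, with or without functional annotation.
GENE_CALLS = "annotation/dfast/genome.gff"


def summary_inputs(config):
    """List the files the summary step should wait for, given the config."""
    selected = set(config.get("annotations", []))
    inputs = [GENE_CALLS]
    for name, path in OPTIONAL_OUTPUTS.items():
        if name in selected:
            inputs.append(path)
    return inputs
```

Because Snakemake only runs rules whose outputs are requested, leaving an annotation out of the list would skip its rule (and its DB download) without needing --until.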