Closed: d4straub closed this 1 year ago.
nf-core lint overall result: Failed :x:
Posted for pipeline commit b1500e1
- ✅ 148 tests passed
- ❔ 3 tests were ignored
- ❗ 2 tests had warnings
- ❌ 5 tests failed
Thanks for your review!
Kraken2 has been reported to perform well on amplicon data, e.g. https://doi.org/10.1186/s40168-020-00900-2 and https://www.nature.com/articles/s41598-023-40799-x. In line with the pipeline's approach of inferring ASVs, I ran Kraken2 on the ASVs instead of on the raw reads. Kraken2 supports SILVA, RDP, and Greengenes, but also whole-genome databases. Of the latter, I also added the "standard" database, which is huge but includes the typical microbes. The "standard" database could potentially allow analyzing any amplicon, e.g. rarely used genes that have no proper gene-specific database. Additionally, a custom (i.e. any) Kraken2 database can be used, even one that is not listed in `ref_databases.config`. I added four parameters:
- `--kraken2_ref_taxonomy`
- `--kraken2_ref_tax_custom`
- `--kraken2_assign_taxlevels`
- `--kraken2_confidence`
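For context on `--kraken2_confidence` (assumed here to be passed straight to Kraken2's `--confidence` option): the Kraken2 manual scores a label as C/Q, where C is the number of k-mers mapped to taxa in the clade rooted at that label and Q is the number of k-mers queried. A label scoring below the threshold is moved up the tree until it meets the threshold, and the sequence is unclassified if no label does. The initial label is the leaf of the highest-scoring root-to-leaf path, with ties resolved by their LCA. Below is a minimal Python sketch of that rule on a hypothetical toy taxonomy (`family_F`, `genus_A`, `genus_B`) with made-up k-mer counts. It is not Kraken2's implementation and ignores details such as minimizers and ambiguous bases.

```python
from collections import Counter
from typing import Optional

# Hypothetical toy taxonomy (child -> parent); "root" is the only node
# without a parent. Names are illustrative, not real taxa.
PARENT = {
    "family_F": "root",
    "genus_A": "family_F",
    "genus_B": "family_F",
}


def lineage(taxon: str) -> list:
    """Path from `taxon` up to the root, inclusive."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(taxa: list) -> str:
    """Lowest common ancestor of one or more taxa."""
    common = set(lineage(taxa[0]))
    for t in taxa[1:]:
        common &= set(lineage(t))
    # The deepest shared ancestor is the first shared node on any lineage.
    return next(a for a in lineage(taxa[0]) if a in common)


def classify(hits: Counter, unmapped: int = 0, confidence: float = 0.0) -> Optional[str]:
    """Kraken2-style label with a confidence threshold (sketch only).

    hits: number of k-mers whose LCA is each taxon.
    unmapped: queried k-mers without a database hit (they still count towards Q).
    Returns the label, or None if the sequence is unclassified.
    """
    mapped = sum(hits.values())
    if mapped == 0:
        return None
    q = mapped + unmapped  # Q: all queried k-mers

    # Score of a root-to-leaf path = k-mers mapped to any taxon on that path.
    def path_score(t: str) -> int:
        return sum(hits[a] for a in lineage(t))

    best = max(path_score(t) for t in hits)
    label = lca([t for t in hits if path_score(t) == best])  # ties -> LCA

    # Confidence check: C/Q, with C = k-mers in the clade rooted at `label`.
    while label is not None:
        c = sum(n for t, n in hits.items() if label in lineage(t))
        if c / q >= confidence:
            return label
        label = PARENT.get(label)  # move one rank up; None above the root
    return None  # no label met the threshold -> unclassified


# 2 k-mers on genus_A and 1 on genus_B, as in the conflicting-annotation case.
hits = Counter({"genus_A": 2, "genus_B": 1})
print(classify(hits, confidence=0.0))
print(classify(hits, confidence=0.7))
```

Under this sketch, the default threshold of 0 labels the 2-vs-1 conflict as `genus_A`. At 0.7 the genus score (2/3) falls short, so the label moves up to `family_F`, whose clade covers all three k-mers.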
I looked into using confidence thresholds for taxonomic assignments, which are essentially k-mer match ratios used in the LCA calculation; see https://github.com/DerrickWood/kraken2/blob/v2.1.3/docs/MANUAL.markdown#confidence-scoring. There is also an interesting discussion about confidence scores in https://github.com/DerrickWood/kraken2/issues/265. However, my impression is that there is no consensus, and all of this refers to shotgun metagenomics. In the amplicon sequencing paper (https://doi.org/10.1186/s40168-020-00900-2), I couldn't find any mention of a confidence score, but Bracken was used after Kraken2. So for now I added a parameter for the confidence threshold but kept its default at 0, which is also Kraken2's default. I am fine with a single k-mer producing, e.g., a species instead of a genus annotation (i.e. a low confidence score at species level), because I expect that only very rarely is a k-mer truly specific. However, conflicting annotations (let's say 2 k-mers on genus A and 1 k-mer on genus B) would be good to capture, reporting the family instead of genus A in this example. I'm not sure whether that currently works as expected. There is a tool that calculates confidence scores for Kraken2, Conifer (https://github.com/Ivarz/Conifer), which is also available in Bioconda. Potentially interesting.

PR checklist
- `nf-core lint`
- `nextflow run . -profile test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).