Closed patriciatran closed 1 year ago
I think this has to be done in a few steps:
1. Dereplicate at 95%. This is currently a bit annoying with CoverM because it only clusters genomes, so you have to specify each contig as a separate genome.
2. Find contigs where ≥75% of the contig length is covered ≥1x by reads recruited at ≥90% average nucleotide identity. Use `--min-covered-fraction 75 --min-read-percent-identity 90`. Maybe also use `--min-read-percent-identity 95` here; it's not 100% clear.
3. Do the final mapping with `--min-read-percent-identity 95` against the contigs that pass step 2.
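As a rough illustration of the glue around these steps, here is a minimal Python sketch, not anything CoverM ships. The first helper writes each contig to its own FASTA file so that a genome-level clustering tool can treat every contig as a separate genome. The second keeps the contigs whose covered fraction is at least 0.75. The function names, the `.fna` suffix, and the `contig` / `covered_fraction` column names are all hypothetical; rename them to match the header of whatever coverage table you actually produce. It also assumes that "covered fraction" means the fraction of bases with depth ≥1x.

```python
import csv
import io
from pathlib import Path


def split_fasta_per_contig(fasta_text, out_dir):
    """Write each FASTA record to its own file (one 'genome' per contig).

    Assumes contig IDs (first word after '>') are safe to use as file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    for line in fasta_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            if handle is not None:
                handle.close()
            contig_id = line[1:].split()[0]
            path = out_dir / f"{contig_id}.fna"  # hypothetical naming scheme
            handle = path.open("w")
            paths.append(path)
        if handle is not None:
            handle.write(line + "\n")
    if handle is not None:
        handle.close()
    return paths


def contigs_passing_step2(table_text, min_covered_fraction=0.75):
    """Return contig IDs whose covered fraction meets the threshold.

    Expects a tab-separated table with hypothetical columns
    'contig' and 'covered_fraction' (a value between 0 and 1).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [
        row["contig"]
        for row in reader
        if float(row["covered_fraction"]) >= min_covered_fraction
    ]
```

The list returned by `contigs_passing_step2` can then be used to subset the contig FASTA before the final mapping in step 3.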
HTH, ben
Hello,
I am trying to follow the methods in this paper. It looks like they used BamM, but the BamM page says to refer to CoverM instead. I am trying to find the settings for this section specifically, to reproduce them with my dataset:
Would the settings under `coverm contig` be correct?
- ≥10 Kbp: `--min-read-aligned-length 10000`
- ≥95% global identity: `--min-read-percent-identity 95`
- ≥75% of the contig length covered: `--min-read-aligned-percent 75`
- ≥1x by reads recruited: ??
- ≥90% average nucleotide identity: ??
I was wondering whether the appropriate steps would be to run `coverm cluster -ani 90` first and then do `coverm contig`.
Thanks for letting me know if you have any tips.
Best,
Patricia