wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

How to specify filename formula for CoverM contigs? #84

Closed jolespin closed 1 year ago

jolespin commented 2 years ago

I have a pipeline I'm working on where all the sample names are subdirectories. The structure looks like this:

coverage_output/sample_1/intermediate/bowtie2_output/mapped.sorted.bam
coverage_output/sample_2/intermediate/bowtie2_output/mapped.sorted.bam
...

So when I run CoverM it removes all the file paths and thinks they are all samples called mapped.sorted. Is there a naming formula that can be specified anywhere?

Also, is there a way to specify "noIntraDepthVariance" when mode is metabat?

wwood commented 2 years ago

Hi,

I'm afraid not. Perhaps the easiest way is to make a folder of symlinks, with something along these lines (untested):

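mkdir bams_symlinked
cd bams_symlinked
# List each sample's BAM, reduce the path to the sample name, then symlink it as <sample>.bam
ls ../coverage_output/*/intermediate/bowtie2_output/mapped.sorted.bam | sed 's=/intermediate.*==; s=\.\./coverage_output/==' | parallel ln -s ../coverage_output/{}/intermediate/bowtie2_output/mapped.sorted.bam {}.bam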
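If GNU parallel is not available, the same symlink idea can be sketched as a plain shell loop. This is only an illustration under the directory layout described in the question; the function name `symlink_bams` and the `bams_symlinked` destination are hypothetical, and it has only been checked against empty placeholder files, not real BAMs.

```shell
# Hypothetical helper: symlink_bams SRC_DIR DEST_DIR
# Assumes the layout SRC_DIR/<sample>/intermediate/bowtie2_output/mapped.sorted.bam
# and creates DEST_DIR/<sample>.bam symlinks pointing at each BAM.
symlink_bams() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    for bam in "$src_dir"/*/intermediate/bowtie2_output/mapped.sorted.bam; do
        # Skip the unexpanded pattern when nothing matches
        [ -e "$bam" ] || continue
        # Strip the leading SRC_DIR/ and everything after the sample directory
        sample=${bam#"$src_dir"/}
        sample=${sample%%/*}
        # Use an absolute target so the link works from any directory
        target="$(cd "$(dirname "$bam")" && pwd)/$(basename "$bam")"
        ln -s "$target" "$dest_dir/$sample.bam"
    done
}
```

Calling `symlink_bams coverage_output bams_symlinked` and then passing `bams_symlinked/*.bam` to CoverM should give each sample a distinct name derived from its directory.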

> Also, is there a way to specify "noIntraDepthVariance" when mode is metabat?

Not directly, but (apart from filtering the output file yourself) you can recreate the output data by specifying -m length mean and --min-read-percent-identity 0.97001. However, you cannot currently include secondary alignments as metabat does, unless you use a dev version (i.e. the main branch) and specify --include-secondary. It would probably be easier to post-process the output when run with -m metabat.
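The post-processing route could look roughly like the sketch below. It assumes the `-m metabat` output is a tab-separated table with a header row in which each variance column name ends in `-var` (as in the depth files MetaBAT itself consumes); check the header of your own output before relying on this. The function name `drop_variance_columns` is hypothetical.

```shell
# Hypothetical helper: drop_variance_columns DEPTH_FILE
# Prints the table without any column whose header ends in "-var".
drop_variance_columns() {
    awk 'BEGIN { FS = OFS = "\t" }
    NR == 1 {
        # Record the indices of the columns to keep, based on the header
        n = 0
        for (i = 1; i <= NF; i++)
            if ($i !~ /-var$/) keep[++n] = i
    }
    {
        # Rebuild every line (header included) from the kept columns only
        line = $(keep[1])
        for (j = 2; j <= n; j++) line = line OFS $(keep[j])
        print line
    }' "$1"
}
```

For example, `drop_variance_columns coverm_metabat.tsv > depth_no_variance.tsv`, where both file names are placeholders.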

HTH, ben