metaGmetapop / metapop

A pipeline for the macro- and micro-diversity analyses and visualization of metagenomic-derived populations
MIT License

Is it required to filter the bam files? #14

Open ChaoLab opened 2 years ago

ChaoLab commented 2 years ago

I want to ask whether the variant callers within MetaPop can be restricted to reads from certain regions or chromosomes, so that there is no need to filter the BAM file. I ask because I used extra references when doing the mapping.

metaGmetapop commented 2 years ago

The references in the BAM files need to match the input reference genomes. The BAM files do need to be filtered for read depth so there is enough coverage to explore microdiversity, but you can choose to prevent the filtering of contigs that don't have enough horizontal coverage (coverage across the span of the genome) using --min_len and --min_cov.

ChaoLab commented 2 years ago

What if the input reference genomes are only a subset of the references in the bam files? Will this be acceptable for MetaPop?

metaGmetapop commented 2 years ago

You can alter the header of the BAM files to match the input reference genomes, or filter the reference genomes down to just those in the BAM files.
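For the second option (trimming a reference FASTA to a chosen set of sequence names), a minimal sketch in plain Python might look like the following. The function name `filter_fasta` and the demo records are hypothetical, and the sketch assumes the sequence name is the first whitespace-delimited word of each FASTA header line, which is how aligners such as Bowtie 2 usually name references in the BAM header. It is an illustration, not part of MetaPop.

```python
from typing import Iterable, Iterator, Set


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield only the FASTA records whose ID is in `keep`.

    The ID is taken to be the first word after '>' (an assumption
    about how the reference names were derived).
    """
    writing = False
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            # Start or stop copying depending on this record's ID.
            writing = bool(parts) and parts[0] in keep
        if writing:
            yield line


if __name__ == "__main__":
    # Hypothetical two-record FASTA; only contig_1 is kept.
    demo = [">contig_1 sample A\n", "ACGT\n", ">contig_2\n", "GGCC\n"]
    print("".join(filter_fasta(demo, {"contig_1"})), end="")
```

With pysam installed, the set of names could be taken from the `references` attribute of an opened `pysam.AlignmentFile`, and the filtered lines written to a new FASTA file.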

ChaoLab commented 2 years ago

Many thanks! Here is an update: using pysam to remove both the unwanted header entries and the reads mapped to unwanted assemblies solves this issue. The 'unwanted assemblies' here are the ones that I had previously included for competitive mapping with Bowtie 2.
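The pysam approach described above could be sketched roughly as follows. This is not the script actually used in this thread; the function names `filter_header_dict` and `filter_bam` are hypothetical. The sketch assumes that reads whose mate maps to a removed reference should also be dropped (their mate fields would otherwise point at a sequence missing from the new header), and it rebuilds each kept read against the new header so that reference indices stay consistent.

```python
from typing import Dict, Set


def filter_header_dict(header: Dict, keep: Set[str]) -> Dict:
    """Return a copy of a SAM header dict with only the @SQ entries in `keep`."""
    new_header = dict(header)
    new_header["SQ"] = [sq for sq in header.get("SQ", []) if sq["SN"] in keep]
    return new_header


def filter_bam(in_path: str, out_path: str, keep: Set[str]) -> int:
    """Write reads mapped to references in `keep` to `out_path`; return the count."""
    import pysam  # imported here so the header helper works without pysam

    written = 0
    with pysam.AlignmentFile(in_path, "rb") as src:
        new_header = pysam.AlignmentHeader.from_dict(
            filter_header_dict(src.header.to_dict(), keep)
        )
        with pysam.AlignmentFile(out_path, "wb", header=new_header) as dst:
            for read in src:
                # Skip unmapped reads and reads on unwanted references.
                if read.is_unmapped or read.reference_name not in keep:
                    continue
                # Assumption: also skip reads whose mate sits on a removed reference.
                mate_ref = read.next_reference_name
                if mate_ref is not None and mate_ref not in keep:
                    continue
                # Re-create the read against the new header so its
                # reference index matches the trimmed @SQ list.
                record = read.to_dict()
                dst.write(pysam.AlignedSegment.from_dict(record, new_header))
                written += 1
    return written
```

A call such as `filter_bam("mapped.bam", "filtered.bam", wanted_names)` (file names hypothetical) would then produce a BAM whose header lists only the wanted references. Whether to drop or repair pairs that span a removed reference is a design choice; dropping is simply the most conservative option here.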