nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

Host removal on assembly contigs #437

Open prototaxites opened 1 year ago

prototaxites commented 1 year ago

Description of feature

When running the pipeline with host removal, I often find that a sizeable proportion of the resulting bins are still classified by CAT as the host (in my case, Quercus robur), suggesting that some proportion of host reads in the fastq files are not being removed. This may depend on the overall quality of the host genome assembly provided, as well as on the bowtie2 tuning parameters that can optionally be set.

It might be useful to add an optional post-assembly host removal step that finds and removes contigs aligning to the host genome, to catch sequence assembled from host reads that passed the first host filter. The filtered contigs could then be passed to the binning stage as required.
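To make the idea concrete, here is a rough sketch of what such a filter could look like. It is not how the pipeline does anything today. It assumes the assembled contigs have already been aligned to the host genome with some aligner that writes PAF (minimap2, for example), and it drops contigs whose host-covered fraction reaches a cutoff. The function names and the 0.5 cutoff are made up for illustration and would need tuning.

```python
from collections import defaultdict


def host_covered_fraction(paf_lines):
    """Return {contig: fraction of the contig covered by host alignments}.

    Expects PAF records with the contig as the query: column 1 is the query
    name, column 2 its length, columns 3 and 4 the query start and end.
    """
    spans_by_contig = defaultdict(list)
    lengths = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # skip blank or malformed lines
            continue
        name, qlen = fields[0], int(fields[1])
        lengths[name] = qlen
        spans_by_contig[name].append((int(fields[2]), int(fields[3])))

    fractions = {}
    for name, spans in spans_by_contig.items():
        # Merge overlapping alignments so no base is counted twice.
        covered = 0
        cur_start, cur_end = None, None
        for start, end in sorted(spans):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start
        fractions[name] = covered / lengths[name] if lengths[name] else 0.0
    return fractions


def filter_contigs(fasta_lines, fractions, max_host_fraction=0.5):
    """Yield FASTA lines for contigs below the (hypothetical) host cutoff."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            # Contigs with no host alignment at all are always kept.
            keep = fractions.get(name, 0.0) < max_host_fraction
        if keep:
            yield line


# Toy example: contig_1 is mostly host, contig_2 barely, contig_3 not at all.
paf = [
    "contig_1\t1000\t0\t400\t+\thost_chr1\t50000\t10\t410\t390\t400\t60",
    "contig_1\t1000\t300\t900\t+\thost_chr1\t50000\t310\t910\t580\t600\t60",
    "contig_2\t1000\t0\t100\t-\thost_chr2\t40000\t5\t105\t95\t100\t20",
]
fasta = [">contig_1\n", "ACGT\n", ">contig_2\n", "GGCC\n", ">contig_3\n", "TTAA\n"]

fractions = host_covered_fraction(paf)
kept = list(filter_contigs(fasta, fractions))
```

Filtering on covered fraction rather than on any hit at all is a deliberate choice here: short conserved regions shared between host and microbes would otherwise throw out genuine microbial contigs.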

d4straub commented 1 year ago

> suggesting that some proportion of host reads in the fastq files are not being removed

https://github.com/nf-core/mag/issues/381