Description of feature
When running the pipeline with host removal, I often find that a noticeable proportion of the resulting bins are still classified by CAT as the host (in my case, Quercus robur), suggesting that some host reads in the FASTQ files are not being removed. This might depend on the quality and completeness of the host genome assembly provided, as well as on the Bowtie2 tuning parameters that can optionally be set.
It might be useful to add an optional post-assembly host removal step that finds and removes contigs aligning to the host genome, to catch host sequence assembled from reads that passed the first host filter. The filtered contigs could then be passed to the binning stage as required.
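
To make the idea concrete, here is a rough sketch of what such a contig-level filter could look like. It is not how the pipeline works today, only an illustration. It assumes the assembled contigs have already been aligned to the host genome with an aligner that writes PAF (minimap2 is one option). The function names and the `min_cov` and `min_identity` thresholds are hypothetical and would need tuning.

```python
from collections import defaultdict


def host_contigs(paf_lines, min_cov=0.5, min_identity=0.9):
    """Return contig names whose host-aligned fraction is >= min_cov.

    paf_lines: iterable of PAF records (contigs as query, host as target).
    min_cov / min_identity: hypothetical thresholds, not pipeline defaults.
    """
    intervals = defaultdict(list)
    lengths = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed records
        name, qlen = fields[0], int(fields[1])
        qstart, qend = int(fields[2]), int(fields[3])
        matches, block_len = int(fields[9]), int(fields[10])
        # Ignore low-identity hits, which may be conserved non-host sequence.
        if block_len == 0 or matches / block_len < min_identity:
            continue
        lengths[name] = qlen
        intervals[name].append((qstart, qend))

    flagged = set()
    for name, ivs in intervals.items():
        # Merge overlapping alignments so no contig base is counted twice.
        covered = 0
        cur_start, cur_end = None, None
        for start, end in sorted(ivs):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start
        if covered / lengths[name] >= min_cov:
            flagged.add(name)
    return flagged


def filter_fasta(fasta_lines, drop):
    """Yield FASTA lines, skipping records whose ID is in `drop`."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            keep = line[1:].split()[0] not in drop
        if keep:
            yield line
```

The contigs flagged by `host_contigs` would be dropped with `filter_fasta` before binning. Filtering on the merged aligned fraction rather than on any single hit is meant to avoid discarding microbial contigs that share only a short conserved region with the host.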