nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License
Enable more options to vary the BowTie2 alignment parameters during host removal and alignment #360

Closed alexhbnr closed 1 year ago

alexhbnr commented 1 year ago

Description of feature

Ancient DNA shows a characteristic substitution pattern relative to reference genomes at the terminal bases of reads, i.e. an increase of C>T substitutions at the 5' end and G>A substitutions at the 3' end. These substitutions directly affect how well sequencing data align against a reference genome or the de novo assembled contigs. During alignment, BowTie2 tries to infer the location of a read using seeds. By default, no mismatches are allowed in the seeds (-N 0), so some reads might not be aligned correctly to the reference when one of these aDNA-specific substitutions causes a mismatch in the seed.

If the BowTie2 alignment parameters could be customised more freely, one could set -N to 1 for ancient DNA and so avoid a low number of aligned reads, whether against the host genome or against the assembled contigs.
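
To illustrate why a single damaged base can matter, here is a minimal Python sketch of seed matching with a mismatch budget. It is a toy model only: the brute-force scan, the 8-base seed length, the sequences, and the function names are all assumptions made for illustration and do not reflect how BowTie2 searches its index internally.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def seed_hits(read: str, reference: str, seed_len: int, max_mismatches: int) -> list[int]:
    """Return reference offsets where the read's first seed matches
    with at most `max_mismatches` mismatches (toy stand-in for -N)."""
    seed = read[:seed_len]
    return [
        i
        for i in range(len(reference) - seed_len + 1)
        if hamming(seed, reference[i:i + seed_len]) <= max_mismatches
    ]


# Hypothetical reference and a read drawn from offset 9 of it.
reference = "GATTACAGGCTCACCGTTAGCATCGGA"
undamaged = reference[9:25]          # starts with C
damaged = "T" + undamaged[1:]        # simulated C>T damage at the 5' end

exact = seed_hits(damaged, reference, seed_len=8, max_mismatches=0)
relaxed = seed_hits(damaged, reference, seed_len=8, max_mismatches=1)
print(exact)    # no seed hit when mismatches are forbidden
print(relaxed)  # the true origin is recovered with one mismatch allowed
```

In this toy case the damaged read finds no seed hit with a budget of 0 and finds its true origin with a budget of 1, which is the effect the request above is aiming for with -N 1.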

jfy133 commented 1 year ago

Looking through the documentation, actually this is already possible via:

https://nf-co.re/mag/2.2.1/parameters#bowtie2_mode

The name 'mode' is a bit misleading, as it suggests that it corresponds to the fixed 'presets' in Bowtie2, but in fact it's a free-form string in which you can put whatever you want.

I guess the question then is: should I restrict this to the presets and/or make -N an explicit option, or just leave what already exists? What do you think @alexhbnr ?

alexhbnr commented 1 year ago

No, I was also under the impression that it was only for selecting the BowTie2 preset. If we can also append -N 1 for ancient samples, there is no need to add an explicit option, I think.

jfy133 commented 1 year ago

OK, actually this only applies to aligning reads back to the assembly (this is what gets set in the 'ancient DNA mode'). However, it's not used in the PhiX/host removal. Should I apply it to these others too?

Should these be independent (I guess you don't need a mismatch for PhiX removal...?), or is a blanket setting OK? And should I associate this with the 'ancient DNA mode', as with the assembly alignment?

        ext.args = params.bowtie2_mode ? params.bowtie2_mode : params.ancient_dna ? '--very-sensitive -N 1' : ''

That is how the setting for the assembly alignment currently looks.
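
For readers less used to nested ternaries, the precedence in that line can be sketched in Python. This is only a rough equivalent under stated assumptions: the function name `resolve_bowtie2_args` is hypothetical, and `None` or an empty string stands in for an unset `params.bowtie2_mode`.

```python
from typing import Optional


def resolve_bowtie2_args(bowtie2_mode: Optional[str], ancient_dna: bool) -> str:
    """Mirror the ext.args ternary (hypothetical helper, not pipeline code).

    1. An explicit bowtie2_mode string always wins.
    2. Otherwise, ancient DNA mode supplies '--very-sensitive -N 1'.
    3. Otherwise, no extra arguments are passed.
    """
    if bowtie2_mode:  # None and '' both count as unset here
        return bowtie2_mode
    if ancient_dna:
        return "--very-sensitive -N 1"
    return ""


print(resolve_bowtie2_args("--sensitive", ancient_dna=True))
print(resolve_bowtie2_args(None, ancient_dna=True))
print(repr(resolve_bowtie2_args(None, ancient_dna=False)))
```

Note that an explicit mode replaces the ancient DNA default rather than being combined with it, so anyone setting both would need to include -N 1 in the string themselves.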

alexhbnr commented 1 year ago

You are right, for phiX removal we wouldn't need this adaptation. For host DNA removal, it is a matter of preference. In most cases only a low proportion of the data is host DNA that has to be removed, so it doesn't really matter whether we enable -N 1 there. So let's keep it as it currently is and just allow enabling -N 1 for the alignment back to the assembled contigs.

jfy133 commented 1 year ago

Thanks!