Open roblanf opened 4 years ago
Dear colleagues,
First of all, thanks a lot for this amazing package! I would like to ask for some help to improve the assmebly of consensus DNA sequences. I'm working with DNA sequences from the COI gene (mtDNA) and I'm usnig the function 'SangerAlignment' to get the consensus sequences. Well, I would like to start by asking what is the difference between the M1 and M2 options for the TrimmingMethod argument? Does it works with Phred values?
Second, I'm getting a large number of ambiguous bases when I try to assembly my COI sequences, which becomes indesirable because it is a mtDNA marker and "heterozygous" sites are unexpected. So, I would like to know how can I control (avoid) such ambigous bases? I tried to set the signalRatioCutoff argument to several values (0.99, 0.5, 0.1) but nothing changes. It would be nice if there is a way to set the function 'SangerAlignment' to retrieve only the peak with the greatest height (quality?) instead of put an ambigous base at that position. Finally, I wonder if someone can explain some details of the following arguments: M1TrimmingCutoff, M2CutoffQualityScore, M2SlidingWindowSize, baseNumPerRow, heightPerRow.
Thanks in advance!
Best regards.
Hi @foersterst!
M1 and M2 methods: M1 is the modified Mott's trimming algorithm that you can also find in Phred/Phrap and Biopython. M2 is like trimmomatic's sliding window method. In general I'd recommend trying both and seeing what gives more sensible trimming for your data, using the Shiny app to visualise what the trimming methods are trimming. And yes, both use phred scores.
On the ambiguous bases: this should be controllable when you change the signalRatioCutoff. If nothing is changing and it looks like it should be according to the chromatogram, then please post a new issue with (ideally) a reproducible example. E.g. an ab1 file, and a couple of commands with different thresholds, along with the output you get, and the output you expect. More generally on this point, I would worry a lot if you are seuqencing mtDNA and your chromatograms are showing evidence of heterzygosity - this is likely to indicate contamination of some sort.
On the argument details, this is an omission in the docs that we will need to fix. I'll open a new issue for that.
Rob
An idea raised in https://github.com/roblanf/sangeranalyseR/issues/46, to build consensus sequences while taking account of the phred scores. It's certainly possible.
Right now we use DECIPHER's the
ConsensusSequence
function fromDECIPHER
, but we could build our own function for this and/or look around for a function that uses phred scores.