neufeld / pandaseq

PAired-eND Assembler for DNA sequences
GNU General Public License v3.0

Question pertaining to primers sequences #42

Closed pmb0010 closed 10 years ago

pmb0010 commented 10 years ago

Is it possible within PandaSeq to adjust for errors that may be found within the primers? I am attempting to overlap forward and reverse amplicon reads from an Illumina HiSeq 2500 run, and a large portion of my reads are getting thrown out because no forward primer was found. I suspect there are base-call errors in some of my primer sequences, and I would like to know how to account for those errors within your program.

Thanks

Pamela

apmasell commented 10 years ago

The -t threshold is also applied to errors in the primers.

pmb0010 commented 10 years ago

Is there any documentation on how the "score" (-t) is calculated or what it relates to (besides what it says on the manual page)? How is the quality calculated? Basically, what does a threshold of 0.6 really mean?

Pamela Brannock Postdoctoral Fellow Auburn University Auburn, AL 36849




apmasell commented 10 years ago

The score is a probability. The primer is matched base by base, where P(match|match) = the probability of correctness as computed from the PHRED score, and P(match|mismatch) = 1 - the probability of correctness as computed from the PHRED score. For the primer, this means t ^ (length of primer) = product(P(match)) over all bases in the primer; in other words, t acts as a per-base (geometric mean) probability.

Other details about the scoring are found in the paper.
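
To make the rule above concrete, here is a minimal Python sketch of the primer check as described in this thread. It is not PANDAseq's actual code. The function names are hypothetical, and it assumes the primer sits at the start of the read, contains no degenerate bases, and is accepted when product(P(match)) is at least t ^ (length of primer). The comparison is done in log space to avoid underflow.

```python
import math

def phred_to_p_correct(q):
    """Probability that a base call with PHRED score q is correct: 1 - 10^(-q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)

def primer_log_score(primer, read, quals):
    """Sum of log P(match) over the primer positions (illustrative only)."""
    total = 0.0
    for expected, called, q in zip(primer, read, quals):
        p = phred_to_p_correct(q)
        # Observed match: probability the call is correct.
        # Observed mismatch: 1 - that probability, as stated in the thread.
        p_match = p if expected == called else 1.0 - p
        if p_match <= 0.0:
            return float("-inf")  # e.g. a matching base with PHRED 0
        total += math.log(p_match)
    return total

def primer_passes(primer, read, quals, t=0.6):
    """Assumed acceptance rule: product(P(match)) >= t ** len(primer)."""
    return primer_log_score(primer, read, quals) >= len(primer) * math.log(t)

if __name__ == "__main__":
    primer = "ACGT"
    print(primer_passes(primer, "ACGT", [30, 30, 30, 30]))  # perfect match
    print(primer_passes(primer, "ACGA", [30, 30, 30, 30]))  # confident mismatch
    print(primer_passes(primer, "ACGA", [30, 30, 30, 2]))   # low-quality mismatch
```

Under this reading, a mismatch at a high-quality base costs far more than one at a low-quality base, and a longer primer can absorb more errors at the same -t because the cutoff t ^ (length of primer) shrinks with length.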