labsquare / CutePeaks

CutePeaks is a standalone Sanger trace viewer steered by a modern and user-friendly UI.
https://labsquare.github.io/CutePeaks/
GNU General Public License v3.0

makeBaseCalls Ab1 file #50

Open dridk opened 6 years ago

dridk commented 6 years ago

Some AB1 files don't provide base calls computed from the raw trace. That means the following fields are empty and I cannot display the trace: PBAS, PLOC, PCON, DATA.9-14. We need to compute the base calls if they are missing.

@see AB1 specification http://www6.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf

@see R implementation https://www.bioconductor.org/packages/devel/bioc/vignettes/sangerseqR/inst/doc/sangerseq_walkthrough.pdf

dridk commented 6 years ago

How to compute the Phred score: https://www.dnastar.com/seqman_pro_help/index.html#!Documents/qualityscorecalculations.htm

dridk commented 6 years ago

https://tools.thermofisher.com/content/sfs/brochures/seq-quantification-app-note.pdf

dridk commented 6 years ago

http://www.insilicase.com/Web/PhredScores.aspx

dridk commented 6 years ago

https://www.ncbi.nlm.nih.gov/pubmed/9521921

dridk commented 6 years ago

http://www.phrap.org/phredphrapconsed.html

dridk commented 6 years ago
  1. Algorithms.

    Phred uses simple Fourier methods to examine the four base traces in the region surrounding each point in the data set in order to predict a series of evenly spaced predicted locations. That is, it determines where the peaks would be centered if there were no compressions, dropouts, or other factors shifting the peaks from their "true" locations.
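To make the first step more concrete, here is a rough sketch of how a peak spacing could be estimated with a naive discrete Fourier scan. This is not phred's actual code: the function names `dominant_period` and `predicted_locations`, the period range, and the idea of summing the four traces into one signal beforehand are all assumptions made for illustration.

```python
import cmath
import math

def dominant_period(signal, min_period=4, max_period=30):
    """Return the period (in samples) with the strongest Fourier component.

    Hypothetical helper: scans integer periods only, which is far cruder
    than what a real base caller would do.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    best_period, best_power = None, -1.0
    for period in range(min_period, max_period + 1):
        freq = 1.0 / period
        # Single DFT coefficient evaluated at this frequency
        coeff = sum(x * cmath.exp(-2j * math.pi * freq * t)
                    for t, x in enumerate(centered))
        power = abs(coeff)
        if power > best_power:
            best_period, best_power = period, power
    return best_period

def predicted_locations(start, period, length):
    """Evenly spaced predicted peak positions, ignoring any local drift."""
    return list(range(start, length, period))

# Synthetic signal with one peak every 12 samples
signal = [1.0 + math.cos(2 * math.pi * t / 12) for t in range(240)]
period = dominant_period(signal)
locations = predicted_locations(0, period, len(signal))
```

In a real trace the spacing drifts along the read, so the estimate would have to be made per window rather than once for the whole file.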

    Next phred examines each trace to find the centers of the actual, or observed, peaks and the areas of these peaks relative to their neighbors. The peaks are detected independently along each of the four traces so many peaks overlap. A dynamic programming algorithm is used to match the observed peaks detected in the second step with the predicted peak locations found in the first step.
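The second and third steps could be prototyped along these lines. It is only a sketch under strong assumptions: peaks are taken as plain local maxima (no peak areas), the matching cost is the absolute distance between positions, and `skip_cost` is an arbitrary penalty for leaving a predicted or observed peak unmatched. None of these choices come from the phred documentation.

```python
def find_peaks(trace, min_height=0):
    """Indices of local maxima above min_height in one trace."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i] > min_height and trace[i - 1] < trace[i] >= trace[i + 1]]

def match_peaks(predicted, observed, skip_cost=5.0):
    """Align two sorted position lists with dynamic programming.

    Returns (predicted_index, observed_index) pairs. Unmatched entries on
    either side are charged skip_cost (a hypothetical penalty).
    """
    n, m = len(predicted), len(observed)
    # cost[i][j]: best cost aligning the first i predicted and first j observed
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * skip_cost
    for j in range(1, m + 1):
        cost[0][j] = j * skip_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(predicted[i - 1] - observed[j - 1]),
                cost[i - 1][j] + skip_cost,   # predicted location left empty
                cost[i][j - 1] + skip_cost,   # observed peak left unused
            )
    # Walk back through the table to recover the matched pairs
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + abs(predicted[i - 1] - observed[j - 1]):
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif cost[i][j] == cost[i - 1][j] + skip_cost:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs

peaks = find_peaks([0, 1, 3, 1, 0, 2, 5, 2, 0])
pairs = match_peaks([10, 22, 34, 46], [11, 21, 47])
```

Here the predicted location at 34 has no nearby observed peak, so it stays unmatched. For base calling, `find_peaks` would be run on each of the four traces and the observed peaks merged (with their base letter) before matching.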

    Phred evaluates the trace surrounding each called base using four or five quality value parameters to quantify the trace quality. It uses a quality value lookup table to assign the corresponding quality value. The quality value is related to the base call error probability by the formula

    QV = - 10 * log_10( P_e )

    where P_e is the probability that the base call is an error.
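For reference, the formula translates directly into two small helpers. The function names are made up for this example. Note that, according to the text above, phred itself gets the quality value from a lookup table built on trace parameters; the formula only describes how that value relates to the error probability.

```python
import math

def error_prob_to_qv(p_error):
    """QV = -10 * log10(P_e), for 0 < p_error <= 1."""
    return -10.0 * math.log10(p_error)

def qv_to_error_prob(qv):
    """Inverse relation: P_e = 10 ** (-QV / 10)."""
    return 10.0 ** (-qv / 10.0)

qv = error_prob_to_qv(0.001)   # error probability of 1 in 1000
p = qv_to_error_prob(20)       # quality value of 20
```

So a quality value of 30 corresponds to about one wrong call per 1000 bases, and 20 to about one per 100.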

    Phred uses data from a chemistry parameter file called 'phredpar.dat' in order to identify dye primer data. For dye primer data, phred identifies loop/stem sequence motifs that tend to result in CC and GG merged peak compressions. It reduces the quality values of potential merged peaks and splits those peaks that have certain trace characteristics indicative of merged CC and GG peaks. In addition, the chemistry and dye information are passed to phrap.