mevers / NCI_ChIP-seq

Systematic optimization of parameters for ChIP-Seq peak calling algorithms using simulated short-read sequencing data

Simulation of actual Illumina read qualities in ChIPsim #1

Open skurscheid opened 7 years ago

skurscheid commented 7 years ago

We could use a Perl/Python/awk/R script to sample Illumina base call quality strings from existing FASTQ data, build up an empirical distribution of read qualities, and use this as the pool from which to sample random quality scores...
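A rough sketch of that idea in Python, just to make it concrete. The function names (`read_quality_strings`, `sample_qualities`) are made up for illustration, and it assumes plain four-line FASTQ records with no wrapped sequence or quality lines:

```python
import random

def read_quality_strings(fastq_lines):
    """Collect the quality string (4th line) of every FASTQ record.

    Assumes strict four-line records; wrapped FASTQ is not handled.
    """
    pool = []
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            pool.append(line.rstrip("\n"))
    return pool

def sample_qualities(pool, n, read_length, rng):
    """Draw n quality strings (with replacement) from the empirical pool.

    Strings shorter than read_length are skipped; longer ones are
    truncated to read_length.
    """
    candidates = [q for q in pool if len(q) >= read_length]
    if not candidates:
        raise ValueError("no quality string is long enough")
    return [rng.choice(candidates)[:read_length] for _ in range(n)]

# Tiny in-memory example; a real run would iterate over an open FASTQ file.
example_fastq = [
    "@read1", "ACGTACGT", "+", "IIIIHHGG",
    "@read2", "TTGACCAA", "+", "IIHGFEDC",
]
pool = read_quality_strings(example_fastq)
sampled = sample_qualities(pool, n=5, read_length=6, rng=random.Random(1))
```

Sampling whole strings (rather than individual scores) keeps the typical drop in quality towards the 3' end of a read, which independent per-base sampling would lose.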

skurscheid commented 7 years ago

(just using this to keep track of thoughts/ideas for the project, so that it is all in one place)

mevers commented 7 years ago

Contrary to what I had thought, ChIPsim does allow an error model to be included when simulating reads. Specifically, per-nucleotide read qualities can be simulated by sampling Phred quality scores uniformly at random. ChIPsim then uses an error model to perform nucleotide substitutions based on the simulated quality and a set of substitution probabilities. This yields a set of reads with both errors and qualities.
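For reference, the general scheme can be sketched in Python as below. This is not ChIPsim's code (ChIPsim is an R package), only an illustration under stated assumptions: the quality range 2 to 40, the Phred+33 encoding, and substitution to one of the other three bases with equal probability are all my choices, and `simulate_read` is a hypothetical name.

```python
import random

BASES = "ACGT"

def phred_to_error_prob(q):
    """Standard Phred definition: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def simulate_read(seq, rng, q_min=2, q_max=40):
    """Attach uniformly sampled qualities to seq and introduce errors.

    Each base gets a Phred score drawn uniformly from [q_min, q_max];
    the base is then substituted with probability 10^(-Q/10).
    Assumption: the replacement is chosen uniformly from the other
    three bases, which stands in for a real substitution matrix.
    """
    quals = [rng.randint(q_min, q_max) for _ in seq]
    read = []
    for base, q in zip(seq, quals):
        if rng.random() < phred_to_error_prob(q):
            base = rng.choice([b for b in BASES if b != base])
        read.append(base)
    # Encode qualities as a Phred+33 ASCII string, as in FASTQ.
    qual_string = "".join(chr(q + 33) for q in quals)
    return "".join(read), qual_string

read, qual = simulate_read("ACGTACGTACGT", random.Random(42))
```

Swapping the uniform `rng.randint` draw for qualities taken from an empirical pool would combine this with the FASTQ-sampling idea above in the thread.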