yukiteruono / pbsim2

PBSIM2: a simulator for long read sequencers with a novel generative model of quality scores
GNU General Public License v2.0

Examples or replication information for nanopore simulation? #13

Closed by adamnovak 1 year ago

adamnovak commented 1 year ago

The pbsim2 paper gives results for simulated R9.5 nanopore reads generated with pbsim2. I would like to replicate this, but I am struggling to find the necessary information on how exactly pbsim2 was run for the paper.

The pbsim2 repo includes an R95.model file, and the program help suggests running with --difference-ratio 23:31:46 for generating nanopore reads, but that is not enough input to actually run the program in HMM mode; it looks like at least the --length-mean, --length-sd, and --accuracy-mean parameters are also needed (in addition to the reference FASTA).

What values for these parameters were used for the paper? Presumably not the defaults, which look appropriate only for PacBio reads?

Are these values only applicable to the R95.model and that chemistry? If so, what values for the other parameters are supposed to go with the other model files that are included?

yukiteruono commented 1 year ago

Thank you for using PBSIM2.

The command to replicate the simulation in the paper is:

pbsim --depth 21.599 --difference-ratio 23:31:46 --hmm_model R95.model --length-mean 5650 --length-sd 7247 --accuracy-mean 0.83 R.sphaeroides's genome

Here "R.sphaeroides's genome" stands for the R. sphaeroides reference FASTA. The parameters are the real-data values in Table S3 of the paper.
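For scripted replication, the command above could be assembled as an argument list before being handed to a process runner. The following is only a minimal sketch: the function name build_pbsim_command and the file name R_sphaeroides.fasta are hypothetical, and only the flags and values quoted in this reply are used. It prints the command rather than running pbsim.

```python
import shlex


def build_pbsim_command(reference_fasta, model="R95.model", depth=21.599,
                        difference_ratio="23:31:46", length_mean=5650,
                        length_sd=7247, accuracy_mean=0.83):
    """Return the pbsim argument list for the R95 run quoted in this thread.

    The defaults are the values given in the reply; reference_fasta is a
    path supplied by the caller (hypothetical in the example below).
    """
    return [
        "pbsim",
        "--depth", str(depth),
        "--difference-ratio", difference_ratio,
        "--hmm_model", model,
        "--length-mean", str(length_mean),
        "--length-sd", str(length_sd),
        "--accuracy-mean", str(accuracy_mean),
        reference_fasta,
    ]


# Hypothetical reference path; replace with the real R. sphaeroides FASTA.
cmd = build_pbsim_command("R_sphaeroides.fasta")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once pbsim is on the PATH and the model file is in the working directory.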

The difference ratios are as follows. These were computed from alignments between the real reads and their reference genomes.

S.cerevisiae P4C2: 6:58:36
H.sapiens P5C3: 10:52:38
C.elegans P6C4: 6:54:40
E.coli O127:H6 R94: 20:28:52
R.sphaeroides R95: 23:31:46
E.coli K12 MG1655 R103: 40:26:34
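The ratios listed above can be kept in a small lookup table so that the --difference-ratio value matches the chemistry being simulated. This is a sketch only: the names DIFFERENCE_RATIOS and parse_ratio are hypothetical, the keys are the chemistry labels from this reply rather than model file names, and the order of the three components is not stated in this thread, so the code treats them as an opaque triple. The sanity check relies on the fact that each ratio listed here sums to 100.

```python
# Difference ratios as listed in this thread, keyed by chemistry label.
DIFFERENCE_RATIOS = {
    "P4C2": "6:58:36",    # S.cerevisiae
    "P5C3": "10:52:38",   # H.sapiens
    "P6C4": "6:54:40",    # C.elegans
    "R94": "20:28:52",    # E.coli O127:H6
    "R95": "23:31:46",    # R.sphaeroides
    "R103": "40:26:34",   # E.coli K12 MG1655
}


def parse_ratio(ratio):
    """Split an 'a:b:c' string into a tuple of three integers."""
    parts = tuple(int(p) for p in ratio.split(":"))
    if len(parts) != 3:
        raise ValueError(f"expected three components, got {ratio!r}")
    return parts


# Sanity check: every ratio in the table sums to 100.
for chemistry, ratio in DIFFERENCE_RATIOS.items():
    assert sum(parse_ratio(ratio)) == 100, chemistry

print(DIFFERENCE_RATIOS["R95"])
```

Only the ratios come from this thread; the matching read length and accuracy values for the other chemistries would still need to be taken from the paper's Table S3.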

adamnovak commented 1 year ago

Thanks! That is very helpful!