yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

Introducing faux heterozygosity #4

Open · casparbein opened this issue 1 year ago

casparbein commented 1 year ago

Hi,

I have been using pbsim3 to simulate HiFi reads and reassemble them to get acquainted with long-read assembly. One issue I ran into is that the simulated data are relatively clean. I used the error model, which introduces random errors into the simulated reads based on real PacBio reads, right? Is there also a way to simulate a given degree of heterozygosity with pbsim3? Purely homozygous simulated reads are rather easy to assemble, so simulated reads with some degree of faux heterozygosity might be closer to real data.

Thanks in advance!

yukiteruono commented 1 year ago

Thank you for using PBSIM. PBSIM3 cannot simulate reads from polyploid genomes. To simulate heterozygosity in a diploid, first introduce mutations into the reference genome sequence to generate two haploid genomes, then generate reads from each haploid genome with PBSIM3. I use an in-house tool to randomly introduce mutations into the genome, or to introduce known variants.
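The first step of that workflow (turning one reference into two haplotypes) can be sketched in a few lines of standard-library Python. This is not the in-house tool mentioned above; it is a minimal sketch under stated assumptions: only single-nucleotide substitutions are introduced (no indels or structural variants), every A/C/G/T site mutates independently with the same probability `rate`, and all function names (`introduce_snps`, `make_diploid`, `simulate_diploid_fasta`) are hypothetical.

```python
import random
from typing import Dict, List, Tuple

BASES = "ACGT"


def read_fasta(path: str) -> Dict[str, str]:
    """Parse a (multi-)FASTA file into {record_name: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]  # keep the ID, drop the description
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def write_fasta(path: str, records: Dict[str, str], width: int = 80) -> None:
    """Write {name: sequence} as FASTA with fixed-width sequence lines."""
    with open(path, "w") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def introduce_snps(seq: str, rate: float,
                   rng: random.Random) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Substitute each A/C/G/T site with probability `rate`.

    Non-ACGT characters (e.g. N) are never mutated, and soft-masked
    (lowercase) bases stay lowercase. Returns the mutated sequence and a
    list of (0-based position, ref base, alt base) changes.
    """
    out = list(seq)
    changes: List[Tuple[int, str, str]] = []
    for i, base in enumerate(seq):
        ref = base.upper()
        if ref in BASES and rng.random() < rate:
            alt = rng.choice([b for b in BASES if b != ref])
            out[i] = alt if base.isupper() else alt.lower()
            changes.append((i, ref, alt))
    return "".join(out), changes


def make_diploid(reference: Dict[str, str], rate: float, seed: int = 0):
    """Haplotype 1 = unchanged reference; haplotype 2 = reference + random SNPs.

    Every introduced SNP is therefore heterozygous, so `rate` is roughly
    the per-site heterozygosity of the simulated diploid.
    """
    rng = random.Random(seed)
    hap1 = dict(reference)
    hap2: Dict[str, str] = {}
    variants: Dict[str, List[Tuple[int, str, str]]] = {}
    for name, seq in reference.items():
        hap2[name], variants[name] = introduce_snps(seq, rate, rng)
    return hap1, hap2, variants


def simulate_diploid_fasta(ref_path: str, rate: float, out_prefix: str,
                           seed: int = 0) -> int:
    """Write <out_prefix>1.fasta and <out_prefix>2.fasta; return the SNP count."""
    hap1, hap2, variants = make_diploid(read_fasta(ref_path), rate, seed)
    write_fasta(f"{out_prefix}1.fasta", hap1)
    write_fasta(f"{out_prefix}2.fasta", hap2)
    return sum(len(v) for v in variants.values())
```

For example, `simulate_diploid_fasta("reference.fasta", 0.001, "hap")` would aim for roughly one heterozygous SNP per kilobase. PBSIM3 is then run separately on `hap1.fasta` and `hap2.fasta` (check the PBSIM3 documentation for the exact options), typically at half the desired total depth each, and the two read sets are pooled before assembly. Because only substitutions are used, both haplotypes share the reference coordinates, so the returned `variants` can serve as a truth set. The per-base Python loop is simple but slow on large genomes. Applying known variants, the other option mentioned above, is usually done with an existing VCF-based tool instead.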