yjx1217 / simuG

simuG: a general-purpose genome simulator
MIT License

Simulating heterozygous SNPs? #13

Open alexis-catherine opened 1 year ago

alexis-catherine commented 1 year ago

Does this tool only simulate homozygous SNPs or is there an easy way to specify some proportion as homozygous and some proportion heterozygous?

yjx1217 commented 1 year ago

Hi, Alexis-Catherine,

simuG introduces variants into a haploid copy of the specified genome. You can then simulate reads from the variant-carrying genome that simuG outputs, using any read simulator. So you can use simuG to simulate two different haploid variant-carrying genomes, with some variants shared between the two (i.e., homozygous variants) and the remaining variants differing between them (i.e., heterozygous variants). Here is a potential protocol:

1) Based on your ref_genome, simulate random variants with simuG to generate sim_genome1 and sim_vcf1.
2) Take the output VCF file sim_vcf1 generated by simuG in step (1) and modify it, controlling which variants to keep (these will be homozygous variants) and which variants to change (these will be heterozygous variants). Let's call this modified VCF file sim_vcf2.
3) Simulate the specified variants with simuG on the same ref_genome, using sim_vcf2 as the input, to generate sim_genome2.
4) Now you will have sim_genome1 and sim_genome2, which share some homozygous variants relative to ref_genome but also carry some heterozygous variants that segregate between sim_genome1 and sim_genome2.
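As a rough illustration of step 2, here is a minimal Python sketch that builds sim_vcf2 from sim_vcf1 by keeping a random subset of the records. It is not part of simuG. The function names, the file names, and the `het_fraction` parameter are all hypothetical. It also assumes the simplest way to "change" a variant, which is to drop it, so that sim_genome2 carries the reference allele at that site and the variant is heterozygous across the two genomes.

```python
import random


def split_variants(vcf_lines, het_fraction, seed=0):
    """Split VCF lines into (lines to keep for sim_vcf2, dropped records).

    Header lines (starting with '#') are always kept. A random
    `het_fraction` of the variant records is dropped; those variants
    will be present only in sim_genome1 (heterozygous). The remaining
    records are shared by both genomes (homozygous).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    header = [ln for ln in vcf_lines if ln.startswith("#")]
    records = [ln for ln in vcf_lines if ln.strip() and not ln.startswith("#")]

    n_het = round(len(records) * het_fraction)
    het_idx = set(rng.sample(range(len(records)), n_het))

    hom = [r for i, r in enumerate(records) if i not in het_idx]
    het = [r for i, r in enumerate(records) if i in het_idx]
    return header + hom, het


def write_sim_vcf2(in_path, out_path, het_fraction, seed=0):
    """Read sim_vcf1 from `in_path` and write the filtered sim_vcf2 to `out_path`."""
    with open(in_path) as fh:
        lines = fh.read().splitlines()
    kept, het = split_variants(lines, het_fraction, seed)
    with open(out_path, "w") as out:
        out.write("\n".join(kept) + "\n")
    return len(het)  # number of variants made heterozygous


# Example call (hypothetical file names):
# write_sim_vcf2("sim_vcf1.vcf", "sim_vcf2.vcf", het_fraction=0.3)
```

The resulting file would then be the sim_vcf2 input for step 3. If you would rather give sim_genome2 a different allele at a heterozygous site than revert it to the reference, you would edit the ALT column of those records rather than dropping them.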

Hope this helps~

Best, Jia-Xing