nh13 / DWGSIM

Whole Genome Simulator for Next-Generation Sequencing
GNU General Public License v2.0

`-F` flag confusion #51

Closed (multimeric closed this issue 5 years ago)

multimeric commented 5 years ago

I'm having trouble getting my head around how the `-F` flag works in DWGSIM. The help text says:

> -F FLOAT frequency of given mutation to simulate low fequency somatic mutations [0.5000]
> NB: freqeuncy F refers to the first strand of mutation, therefore mutations on the second strand occour with a frequency of 1-F

The first line makes sense to me: the flag sets the proportion of reads that will carry the mutation. It defaults to 0.5, presumably to simulate a heterozygous germline variant. If you set it lower than 0.5, it could simulate a somatic variant that is present in a smaller fraction of the cell population. Is this correct so far?

What I don't understand is the NB. What do you mean by "first strand of mutation"? Are you referring to the two homologues here (or to cell populations, in the case of tumour cells)?

Secondly, why does the mutation on the second strand occur with probability 1-F? I would have thought these mutation events should be independent of each other. Assuming a germline sample, if there's a 20% chance of a given mutation being added to the first homologue, why would I want there to be an 80% chance of it occurring on the other homologue?

nh13 commented 5 years ago

@TMiguelT there's room for confusion here. Think about simulating reads from a diploid individual. There's a probability F of getting the base from the maternal chromosome, and a probability 1-F of getting it from the paternal chromosome. Does that make sense?
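
To make the diploid picture concrete, here is a minimal Python sketch of one way to read that explanation. It is an illustration only and is not taken from DWGSIM's source: the function names, the example bases, and the assumption that each read independently picks its chromosome of origin are all hypothetical.

```python
import random


def sample_read_bases(f, n_reads, hap1_base="T", hap2_base="C", seed=0):
    """Draw the base seen by each read at one heterozygous site.

    Assumed model (hypothetical, not DWGSIM's code): every read comes from
    the first chromosome with probability f and from the second chromosome
    with probability 1 - f, independently of the other reads.
    """
    rng = random.Random(seed)
    return [hap1_base if rng.random() < f else hap2_base for _ in range(n_reads)]


def allele_fraction(bases, base):
    """Fraction of reads that show the given base."""
    return bases.count(base) / len(bases)


if __name__ == "__main__":
    for f in (0.5, 0.2):
        bases = sample_read_bases(f, n_reads=10_000)
        print(
            f"F={f}: first-chromosome allele fraction "
            f"{allele_fraction(bases, 'T'):.3f}, second-chromosome allele "
            f"fraction {allele_fraction(bases, 'C'):.3f}"
        )
```

Under this reading, F and 1-F are not two independent mutation events. They are the two complementary outcomes of a single choice per read, which is why they always sum to 1: lowering F below 0.5 shifts reads away from the first chromosome's allele and toward the second's.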