isaacovercast / easySFS

Effective selection of population size projection for construction of the site frequency spectrum. Convert VCF to dadi/fastsimcoal style SFS for demographic analysis

Mutation rate for ipyrad output #37

Closed dylanHco closed 3 years ago

dylanHco commented 3 years ago

Hi Isaac,

I know this is not an issue with the easySFS program itself, and thank you for creating it; it makes preparing data for dadi much easier. I thought about posting this to the ipyrad discussion, but I felt it might be better suited here. Please delete it if you think it does not belong. I will also post it to the dadi help forum, but the answers to this question do not seem to be clear-cut, and I know you have a lot of experience with ipyrad.

A popular use of dadi is estimating time since divergence. To do that you must calculate Nref = theta/(4μL), where μ is the mutation rate per generation of the species in question and L is the approximate size of the data set. In the best-case scenario you are using a full genome and have an idea of how large it is, but in many cases people are using only some fraction of the genome. The other kicker is that most (plant?) species do not have estimates of mutations per generation. I was wondering whether there is some way to estimate this from ipyrad data without a reference genome. In the end, I think the mutation rate should be scaled to the amount of data actually used to call the SNPs.
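For anyone following along, the arithmetic in the formula above can be sketched in a few lines of Python. This is only an illustration: the function name `nref_from_theta` and all numeric values are made up for the example, not taken from any real dataset or from dadi itself, and `seq_length` stands in for L.

```python
def nref_from_theta(theta: float, mu: float, seq_length: float) -> float:
    """Return Nref = theta / (4 * mu * L).

    theta      -- theta estimated by the demographic model fit
    mu         -- assumed mutation rate per site per generation
    seq_length -- L, the number of sites the SFS was built from
    """
    if mu <= 0 or seq_length <= 0:
        raise ValueError("mu and seq_length must be positive")
    return theta / (4.0 * mu * seq_length)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
theta = 1000.0
mu = 1e-8
seq_length = 1_000_000

nref = nref_from_theta(theta, mu, seq_length)
print(f"Nref is roughly {nref:.0f}")
```

Note that Nref depends on the product μL, so any error in either the assumed mutation rate or the assumed sequence length carries straight through to the estimate.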

Any suggestions/ideas? Please feel free to delete as I know this is not an issue with your program, but thought many others might also be struggling with this.

-Dylan

isaacovercast commented 3 years ago

Hi Dylan,

It is not possible to estimate the mutation rate from a typical RADSeq dataset, which is largely why it is so difficult to get a reasonable estimate of this value. For one thing, the mutation rate varies across the genome, so an "average" genome-wide mutation rate is only a rough proxy anyway. The easiest way to handle this is to use a 'reasonable' mutation rate for whatever organism you're interested in and account for the uncertainty in that rate to get upper and lower bounds on your divergence time estimates.
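A minimal sketch of the bounding approach described above, in Python. It assumes the usual dadi convention that times are reported in units of 2*Nref generations, and that Nref = theta/(4μL). The function name `divergence_time_bounds`, the parameter names, and every number below are hypothetical and only serve to illustrate the calculation.

```python
def divergence_time_bounds(theta, t_dadi, seq_length, mu_low, mu_high,
                           gen_time_years=1.0):
    """Return (lower, upper) divergence time in years.

    theta          -- theta from the model fit
    t_dadi         -- divergence time in units of 2*Nref generations (assumed)
    seq_length     -- L, number of sites the SFS was built from
    mu_low/mu_high -- plausible range for the per-site, per-generation rate
    gen_time_years -- assumed generation time in years
    """
    if not 0 < mu_low <= mu_high:
        raise ValueError("need 0 < mu_low <= mu_high")
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")

    def years(mu):
        nref = theta / (4.0 * mu * seq_length)
        return 2.0 * nref * t_dadi * gen_time_years

    # A higher mutation rate implies a smaller Nref and a more recent split,
    # so mu_high gives the lower bound and mu_low the upper bound.
    return years(mu_high), years(mu_low)


# Hypothetical values: a tenfold uncertainty in the mutation rate.
lower, upper = divergence_time_bounds(
    theta=1000.0, t_dadi=0.5, seq_length=1_000_000,
    mu_low=1e-9, mu_high=1e-8, gen_time_years=1.0,
)
print(f"Divergence between about {lower:.0f} and {upper:.0f} years ago")
```

Because time scales inversely with μ, a tenfold uncertainty in the mutation rate becomes a tenfold range in the divergence time estimate.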

You're right that this isn't exactly an issue with easySFS, so I'll close this issue, but I hope my comments were helpful.

-isaac

dylanHco commented 3 years ago

Thanks Isaac, yes, your input is very helpful.

-Dylan