petercombs / dicty

Are there any SNPs that seem to mediate sorting, but not cheating? #7

Open petercombs opened 5 years ago

petercombs commented 5 years ago

I'm thinking of greenbeard genes here. Basically, we would want a SNP whose allele frequency is the same in the stalk and the spores of any given fruiting body, but that varies widely across fruiting bodies. Maybe the best way is to look for SNPs whose allele frequency in each fruiting body is either very high or very low (see sketch below).

I'm not sure of the best way to look for these, since our coverage is too low at the moment, but I would think some kind of entropy-like metric might do it.

image

petercombs commented 5 years ago

After some more thinking and discussion, I think perhaps something like:

Sum_i (AF_i - 0.5)^2 

where i indexes the fruiting bodies, should almost work. That will pick up points in any of the corners, though, so some kind of term that favors points close to the diagonal will also be necessary.
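As a rough sketch of that idea (not code from this repo), the score could be computed per SNP from paired stalk and spore allele frequencies. Applying the sum to both tissues, the squared stalk-minus-spore penalty, and the names `corner_score`, `off_diagonal_penalty`, `greenbeard_score` and `weight` are all assumptions made for illustration.

```python
def corner_score(stalk_af, spore_af):
    """Sum over fruiting bodies of squared distance from 0.5, in both tissues.

    Assumption: the Sum_i (AF_i - 0.5)^2 term is applied to stalk and spore
    allele frequencies alike, so any corner of the stalk/spore plot scores high.
    """
    return sum((s - 0.5) ** 2 + (p - 0.5) ** 2 for s, p in zip(stalk_af, spore_af))


def off_diagonal_penalty(stalk_af, spore_af):
    """Hypothetical diagonal term: squared stalk-vs-spore difference."""
    return sum((s - p) ** 2 for s, p in zip(stalk_af, spore_af))


def greenbeard_score(stalk_af, spore_af, weight=1.0):
    """Corner score minus a weighted penalty for being off the diagonal."""
    return corner_score(stalk_af, spore_af) - weight * off_diagonal_penalty(stalk_af, spore_af)


# Toy data: one value per fruiting body.
greenbeard = greenbeard_score([0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0])
cheater = greenbeard_score([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
print(greenbeard, cheater)
```

On this toy data both SNPs sit in corners, and only the penalty term separates the on-diagonal pattern from the off-diagonal one.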

petercombs commented 5 years ago

No, that's not going to work, actually: a SNP that just has low allele frequency overall will score high on the greenbeard scale even though it is merely a low-frequency SNP. What I actually want is some measure of the balance... More thinking about this to come.
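A small check with made-up numbers illustrates the problem. The helper name `distance_from_half` and both frequency lists are hypothetical.

```python
import math


def distance_from_half(afs):
    """Sum_i (AF_i - 0.5)^2 over fruiting bodies."""
    return sum((af - 0.5) ** 2 for af in afs)


# Hypothetical SNP that is simply rare in every fruiting body.
low_frequency = [0.05] * 6
# Hypothetical SNP that is near 0 in half the fruiting bodies and near 1 in the rest.
balanced = [0.05, 0.95] * 3

score_low = distance_from_half(low_frequency)
score_balanced = distance_from_half(balanced)
print(score_low, score_balanced)
print(math.isclose(score_low, score_balanced))
```

The two scores agree up to floating-point rounding, so the metric alone cannot tell a uniformly rare SNP from one that is balanced between the two extremes.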

petercombs commented 5 years ago

Okay, now to experiment, but after chatting with Ethan about it over lunch, perhaps:

std(AF_stalk,i * AF_spore,i)

could work. Cheaters would probably be quite low on the scale, since they would fairly consistently have (0 * nonzero). Neutral SNPs that have a uniform distribution would be relatively high, but shouldn't be as high as greenbeards.
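A quick sketch of how this statistic behaves on toy data, assuming `std` means the population standard deviation (the comment does not say which) and using invented allele frequencies for the three cases:

```python
import statistics


def product_std(stalk_af, spore_af):
    """Standard deviation across fruiting bodies of AF_stalk,i * AF_spore,i.

    Assumption: population standard deviation (statistics.pstdev).
    """
    return statistics.pstdev(s * p for s, p in zip(stalk_af, spore_af))


# Cheater: absent from stalks, present in spores, so every product is 0.
cheater = product_std([0.0, 0.0, 0.0, 0.0], [0.6, 0.9, 0.7, 0.8])
# Neutral: matching stalk and spore frequencies spread evenly over (0, 1).
neutral = product_std([0.1, 0.3, 0.5, 0.7, 0.9], [0.1, 0.3, 0.5, 0.7, 0.9])
# Greenbeard: matching frequencies at 0 or 1, depending on the fruiting body.
greenbeard = product_std([0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0])

print(cheater, neutral, greenbeard)
```

With these made-up inputs the ordering is cheater, then neutral, then greenbeard, matching the expectation above. Real data with low coverage would be much noisier.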

petercombs commented 5 years ago

More discussion on this with Hunter. He suggested I look up tests for bimodality, which certainly exist (various papers by Hartigan), but all of the ones I've found thus far seem fairly hacky, and very little usable code actually exists.

I spent a while this morning trying out a KS test against a Beta distribution (with small, equal parameters this gives a reasonable approximation to bimodality). The problem with this approach is that the null hypothesis is that the data do follow the Beta distribution: when there is very little data we fail to reject the null, and when there is good evidence for bimodality we also fail to reject it, so a high p-value can't distinguish the two cases.
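The issue can be reproduced with a standard-library sketch. It assumes Beta(0.5, 0.5) as the "small, equal parameters" case, since its CDF has the closed form (2/pi) * asin(sqrt(x)). It also uses the asymptotic Kolmogorov series for the p-value, which is only a rough guide at small sample sizes. All function names and both samples are illustrative.

```python
import math


def arcsine_cdf(x):
    """CDF of Beta(0.5, 0.5) (the arcsine distribution) for 0 <= x <= 1."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))


def ks_statistic(values, cdf):
    """One-sample Kolmogorov-Smirnov statistic D against a reference CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic Kolmogorov p-value, clamped to [0, 1]; rough for small n."""
    lam = math.sqrt(n) * d
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Very little data: three fruiting bodies with unremarkable frequencies.
few = [0.1, 0.5, 0.9]
# Strongly bimodal data: 40 evenly spaced quantiles of Beta(0.5, 0.5) itself.
bimodal = [math.sin(math.pi * (j + 0.5) / 80) ** 2 for j in range(40)]

d_few = ks_statistic(few, arcsine_cdf)
p_few = ks_pvalue_asymptotic(d_few, len(few))
d_bimodal = ks_statistic(bimodal, arcsine_cdf)
p_bimodal = ks_pvalue_asymptotic(d_bimodal, len(bimodal))
print(d_few, p_few)
print(d_bimodal, p_bimodal)
```

Neither sample rejects the null at the 0.05 level, which is the ambiguity described above. With SciPy available, `scipy.stats.kstest(values, "beta", args=(0.5, 0.5))` should be the more usual way to run the same test.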

petercombs commented 5 years ago

Okay, more playing around with the diptest R package suggests that it's promising. However, the best hit in my data at the moment is DDB0232430:3295137_A|G, which has two modes, at 0.5 and 1. I suppose I can limit to SNPs that have a mean between, say, 0.4 and 0.6, but that starts to become a little ad hoc...
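The mean filter itself is simple to sketch. The function name `passes_mean_filter` and the example frequencies are hypothetical, and the filter would sit alongside the dip statistic rather than replace it.

```python
import statistics


def passes_mean_filter(afs, low=0.4, high=0.6):
    """Keep a SNP only if its mean allele frequency lies in [low, high]."""
    return low <= statistics.fmean(afs) <= high


# Modes at 0.5 and 1, like the current best hit: the mean is 0.75, so it is dropped.
half_and_one = [0.5, 0.5, 0.5, 1.0, 1.0, 1.0]
# Modes at 0 and 1 in equal numbers: the mean is 0.5, so it is kept.
zero_and_one = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

print(passes_mean_filter(half_and_one))
print(passes_mean_filter(zero_and_one))
```

A unimodal SNP centered on 0.5 also passes this filter, so it only makes sense after the bimodality test has already flagged the SNP.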