Open NatJWalker-Hale opened 3 years ago
I just had one question about the interpretation and transformation of the values in the *.meandiffsel file. I assume that the raw values in columns 1 to 21 (zero-indexed) are the per-site differential selection effects for each amino acid. Is this correct?
Yes, range(1,21) produces the list from 1 to 20 (21 is excluded), and those columns do correspond here to posterior probabilities related to the differential selective effects. The code that produces those numbers is here.
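As a small illustration of the indexing, here is a minimal sketch in Python. The row contents are made up, and the assumption that column 0 holds a site index is only for the example; it is not taken from the actual *.meandiffsel layout.

```python
# Hypothetical row: column 0 is assumed to be a site index (illustration only),
# followed by 20 made-up per-amino-acid values in columns 1..20.
row = [7] + [i / 20 for i in range(20)]

# range(1, 21) yields 1, 2, ..., 20; the upper bound 21 is excluded.
cols = list(range(1, 21))

# Select the 20 per-amino-acid values for this site.
values = [row[c] for c in cols]

print(cols[0], cols[-1], len(values))
```

So "columns 1 to 21 (zero-indexed)" means 20 columns in total, one per amino acid.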
Basically, each value is the posterior probability that the selective effect of a given amino acid at a particular site is greater than the mean over all amino acids at that site. We'd like to detect cases where this number is close to 1 or close to 0, so the transformation 2 * abs(0.5 - D) maps those posterior probabilities to [0, 1] such that values close to 0.5 are mapped to 0, and values close to 0 or 1 end up close to 1. Does that make sense?
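To make the mapping concrete, here is a minimal scalar sketch of 2 * abs(0.5 - D). The function name `extremeness` and the sample inputs are hypothetical; the actual script applies the same formula in vectorised form.

```python
def extremeness(d):
    """Map a posterior probability d in [0, 1] to a score in [0, 1].

    Values near 0.5 (no evidence either way) map to 0;
    values near 0 or 1 (strong evidence) map to 1.
    """
    return 2 * abs(0.5 - d)


# Made-up posterior probabilities, chosen to show the symmetry around 0.5.
for d in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(d, extremeness(d))
```

Note that the score is symmetric: 0.25 and 0.75 both map to 0.5, so it measures how far the posterior is from 0.5 and no longer says in which direction.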
Also I'd like to advertise that we currently have promising results with a reimplementation of tdg09, which seems to perform as well as diffsel on simulations, but runs in much less time. Don't hesitate to get in touch if you're interested!
Hi @pveber,
Thanks so much for your response, that makes perfect sense!
Re the reimplementation, I would love to hear more. I'm currently using many of the approaches in the Phil Trans review, including tdg09, as well as some additional methods to explore convergence in one of my datasets. It would be fantastic to have more options.
Thanks again for the help,
Best,
Nathanael
Hi @vlanore,
I just had one question about the interpretation and transformation of the values in the *.meandiffsel file. I assume that the raw values in columns 1 to 21 (zero-indexed) are the per-site differential selection effects for each amino acid. Is this correct?
In the GitLab repository for the Phil Trans review (in lib/scripts/diffsel_analyze_result.py), @pveber has the following transformation to calculate the per-amino acid convergence probability:
2 * abs(0.5 - D), where D is the element for that amino acid at that site in *.meandiffsel (in his script, it is vectorised).
Forgive me if this is very stupid, but what exactly is this calculation doing? How do we go from the elements in *.meandiffsel to the posterior probabilities?
Thanks so much in advance for your help,
Best,
Nathanael