psathyrella / selection-metric-comments


LBR question #6

Open scharch opened 4 years ago

scharch commented 4 years ago

In any of your sims, do you calculate LBR for trees with multiple AA changes per branch point? Obviously, that will result in lower resolution (which one of these three "simultaneous" mutations was the one that increased fitness), but I'm wondering if it has any other, perhaps more subtle, impacts that you've noticed.

psathyrella commented 4 years ago

I didn't explicitly test it, or think about it that much, but I'm pretty certain that a lot of the trees have multiple AA changes per branch. The most changes per branch should be in the smallest N/gen and longest obs times, which I think are in zenodo with this path:

for-zenodo/plots/single-metric-plots/carry-cap-vs-n-obs-v1/plots/lbr/per-seq.html

screenshot here:

[screenshot: per-sequence LBR plot from the path above]

These correspond to the middle row of S3 Figure, but only plotting one metric (LBR) for all N/gen values. So with 3000 generations (maybe 20% nucleotide SHM) and only sampling 30 sequences (plus, remember, adding in the "mrca nodes" for all sampled sequences, to approximate what raxml or whatever would give you for inferred ancestors on real data), I could go make the trees to check, but I'm pretty sure that's many AA changes per branch. So red (500 sampled) is way worse than blue (30 sampled), right? Except these kinds of differences are really, really hard to interpret, because this is measuring "difference to perfect", and the "perfect" also changes as you change N sampled. Basically I spent a huuuuuge amount of time staring at these kinds of plots, and ended up deciding there were usually too many things changing to conclude much from e.g. red being higher than blue. But maybe not here, and maybe you'll think of something I haven't.
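For anyone wanting to check the "many AA changes per branch" guess directly, here is a minimal sketch of counting amino acid differences along each branch. It is not code from the paper or from any existing package. The function names, the child-to-parent dict, and the toy sequences are all hypothetical, and it assumes every node (including inferred ancestors) has an in-frame nucleotide sequence of the same length.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nuc_seq):
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    n_full = len(nuc_seq) - len(nuc_seq) % 3  # ignore any trailing partial codon
    return "".join(
        CODON_TABLE.get(nuc_seq[i:i + 3], "X") for i in range(0, n_full, 3)
    )


def aa_changes_per_branch(parents, seqs):
    """Count AA differences between each node and its parent.

    parents: hypothetical dict mapping child name -> parent name
    seqs:    hypothetical dict mapping node name -> nucleotide sequence
    """
    counts = {}
    for child, parent in parents.items():
        child_aa = translate(seqs[child])
        parent_aa = translate(seqs[parent])
        counts[child] = sum(a != b for a, b in zip(child_aa, parent_aa))
    return counts


# Toy example (made-up sequences): "a" has one synonymous change,
# "b" has two nonsynonymous changes on the same branch.
toy_seqs = {"root": "ATGGCTAAA", "a": "ATGGCTAAG", "b": "CTGGATAAA"}
toy_parents = {"a": "root", "b": "root"}
toy_counts = aa_changes_per_branch(toy_parents, toy_seqs)
print(toy_counts)
```

Branches with a count above 1 are the ones where you can't tell which of the "simultaneous" mutations changed fitness.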

I honestly didn't think of the very obvious next step of incorporating AA info into LBR until I was writing the paper, and it would have been really nice to test something simple like change in aa-cdist against LBR for this reason. But I'd been working on this stuff for waaaaaay too long already and needed to just write it up.

This gets at something that I meant to emphasize in the paper, but may not have emphasized enough -- LBR's usefulness is entirely dependent on how many mutations per branch you have, i.e. on how many inferred ancestors you have, i.e. on the topology of the parts of the tree that you actually do sample. Here's the bit where I mention this in the paper:

[screenshot: excerpt from the paper]

I don't think I have the above plot using branch length instead of N steps (I got tired of making both versions for everything because they always just showed the same thing).

scharch commented 4 years ago

Thanks. I don't have any intuition for a lot of this, since these are simulations and metrics I haven't really spent any time with. But everything you say sounds reasonable to me...

psathyrella commented 4 years ago

This is a new experiment (and hopefully other people use it as well!) but I'm thinking maybe leave all the issues open for at least a while, so people can see what's been asked more easily.