Dear @Emilyaoc,
In principle, yes, MEME can be modified to deal with this type of test, but it's a bit tricky. Because MEME has multiple β rates, you can now have multiple constraints. For example, you could specify that `Test` and `Reference` branches have exactly the same distribution:
β-Test = β-Reference, β+Test = β+Reference, p-Test = p-Reference
You could also specify any subset of these constraints, for example:
β+Test = β+Reference, p-Test = p-Reference
Unless your dataset is quite large, this degree of site-level parameterization is probably excessive and may simply fail to find any differences.
I am curious as to why you think that contrast-MEME is needed over contrast-FEL?
Best, Sergei
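To make the constraint options above concrete, here is a minimal Python sketch that counts how many site-level branch-set parameters stay free under each choice. The parameter names follow the reply (β-, β+, p); the function and constants are hypothetical helpers written for illustration, and the count deliberately ignores anything shared across branch sets, such as the synonymous rate. It is not taken from the HyPhy code.

```python
# Per-site MEME parameters for ONE branch set, named as in the reply above.
PARAMS = ("beta-", "beta+", "p")

# Hypothetical labels for the two constraint sets discussed in the thread.
FULL = {"beta-", "beta+", "p"}   # Test and Reference share the whole distribution
PARTIAL = {"beta+", "p"}         # only beta+ and p are tied; beta- stays separate


def free_parameters(tied):
    """Count free branch-set parameters at a site when `tied` ones are shared."""
    unknown = set(tied) - set(PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    # Two branch sets give two copies of each parameter; every tied
    # parameter collapses its two copies into one.
    return 2 * len(PARAMS) - len(set(tied))


for label, tied in [("none", set()), ("partial", PARTIAL), ("full", FULL)]:
    print(f"{label:8s} constraints={len(tied)} free={free_parameters(tied)}")
```

Each added constraint removes one free parameter per site, which is one way to see why the fully unconstrained version can be too heavily parameterized for a small dataset.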
Dear Sergei, Thank you for getting back to me. My reason for being keen on a contrast-MEME over a contrast-FEL stems from an impression I had that contrast-FEL imposes a restriction that selection doesn't vary across the phylogeny (just sites). Then I supposed that as selection is likely to vary across both sites and branches in reality (?!) that MEME (being a branch-site test) would be more 'realistic' and potentially sensitive (though also more prone to false positives I guess?). But have I perhaps misunderstood? What do you think of the approach of splitting my dataset into two (based on the two groups I expect to be different) and then conducting MEME on each separately? Emily
Dear Emily,
Well, you could think of contrast-FEL as measuring average effects, whereas contrast-MEME could (in principle) identify what you might call subgroup effects (some branches being different). I would say that the full MEME constraint (either exactly the same or completely different) on two branch sets is probably the way to go. A general implementation might be a bit fiddly, so maybe what we can do is a "one-off" (for me to explore feasibility): if you can send me your data and the partitioning, I can do a quick-and-dirty experiment.
As far as "splitting" and running separately goes: this is really difficult to interpret. Suppose you find that site X is selected (MEME) in one group but not the other. The biggest confounder that you can't address easily is that you may simply lack power to find X in the "negative" group, especially if it's smaller. Similarly, if site X is selected in both or neither, ω can still be quite different (just pointing in the same direction).
Best, Sergei
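The power confounder described above can be illustrated with a little arithmetic. The sketch below assumes a site that is truly under selection in both groups and that the two separate MEME runs detect it independently with some per-group power; the function name and the power values are hypothetical and chosen only for illustration, not estimated from any data.

```python
def discordant_call_probability(power_a, power_b):
    """P(site is called in exactly one group) when it is selected in both.

    Assumes the two runs make their calls independently of each other.
    """
    for power in (power_a, power_b):
        if not 0.0 <= power <= 1.0:
            raise ValueError("power must lie in [0, 1]")
    return power_a * (1 - power_b) + power_b * (1 - power_a)


# Hypothetical detection powers for the two groups.
for power_a, power_b in [(0.6, 0.6), (0.6, 0.3), (0.9, 0.9)]:
    prob = discordant_call_probability(power_a, power_b)
    print(f"power A={power_a}, power B={power_b}: P(discordant)={prob:.2f}")
```

Under these assumptions, a sizeable fraction of sites selected in both groups would be reported in only one of them unless power is high in both, even when the two groups are the same size. That is why a "selected in one group but not the other" result from split runs is hard to read as a real difference.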
Dear Sergei, I see your point re the difficulty of interpreting the MEME results between the datasets when split. In my case the sample size and the taxonomic spread between the two groups would be very similar. I'd also be interested in comparing the number of positively selected sites found in each group rather than focusing too much on the identity of the specific sites. Though I'm not sure this solves all the possible problems with interpretation? I guess the best case would be a contrast-MEME. What is the best way for me to send the data to you to see if it's do-able? Thank you for your help! Emily
Dear @Emilyaoc,
Just put the files here: https://www.dropbox.com/request/IqHet1GRFmanzXX9YT9E
I'll close the upload link once they have arrived.
Best, Sergei
Ok, great thank you. Should I upload the tree (with partitioning labels) and one of my gene files? I'd be doing the analysis on hundreds of genes eventually.
Dear @Emilyaoc,
Yes, that would be perfect. Assuming the same tree (or at least that the tree is representative) for the entire gene set.
Best, Sergei
Great, I have done that now. There is a species tree with two sets of labels ("PB" & "CB"), which show the two groups and an example gene file (aligned). Let me know if you need any more info or if I missed anything. Many thanks Emily
Dear @Emilyaoc,
I added an experimental `contrast-meme` implementation to the `develop` branch. You can use it with
`hyphy contrast-meme ...`
Best, Sergei
Dear Sergei, Great, thank you. I will give it a go. Do I need to clone the HyPhy repo in order to access the develop branch? Sorry if that's a stupid question. I'm still quite new to working with git repos and I'm not 100% sure if I do the quick version of installing HyPhy ('conda install hyphy') whether I get the option of switching between branches etc. Or perhaps you can only use git on the repository version?! Thank you Emily
Dear @Emilyaoc,
You will need to use git to clone the repository, checkout the develop branch, and install from source.
We're happy to have users using bleeding edge methods, so please let us know if you encounter any issues.
Regards, Stephen
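For readers new to git, the three steps above (clone, check out `develop`, build from source) might look roughly like the sketch below, written as a small Python dry-run helper. The repository URL and the `cmake .` / `make MP` build commands are assumptions based on the usual HyPhy README instructions and may have changed, so check the README on the `develop` branch first. `build_steps` and `run_steps` are hypothetical names.

```python
import shlex
import subprocess

# Assumption: the upstream HyPhy repository lives at this URL.
REPO_URL = "https://github.com/veg/hyphy.git"


def build_steps(branch="develop", workdir="hyphy", repo_url=REPO_URL):
    """Return (working_directory, argv) pairs for clone, checkout and build."""
    return [
        (None, ["git", "clone", repo_url, workdir]),
        (workdir, ["git", "checkout", branch]),
        # Assumption: in-tree CMake configure, then the multithreaded target.
        (workdir, ["cmake", "."]),
        (workdir, ["make", "MP"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cwd, argv in steps:
        prefix = f"(in {cwd}) " if cwd else ""
        print(prefix + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


run_steps(build_steps())  # dry run: only prints the commands
```

Calling `run_steps(build_steps(), dry_run=False)` would actually execute the commands. The dry run is the default so the commands can be checked before anything is executed.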
Ok, thank you! I will do that. It may take me a little while to manage it as I am currently having trouble installing a recent enough version of cmake to install HyPhy from source. I think my troubles stem from permission issues with my WSL 2 set up. So it may take me a bit of time to figure this out. But I will get back to you if/when I hit an issue that's relevant to using HyPhy as opposed to just my computer woes.
I've never personally built HyPhy on WSL, but I may now give it a try! I believe previous HyPhy support for Windows utilized cygwin, but this might be dated and require some maintenance.
Hi @stephenshank,
Sorry for the long period of inactivity on this. I have now cloned the hyphy repo and have given contrast-meme a go. It seems to work fine.
Though I have a question about how to run the command correctly. I tried: 'hyphy contrast-meme --alignment my_alignment.fas --tree my_tree --branch-set CB --comparison PB' I hoped to contrast CB with PB, but I think this just tested CB against everything else grouped into the background (not PB).
Perhaps I have the command line options specified incorrectly as the '--comparison' option is not in the help info, I just used what I would use for BUSTED-PH in the hope it would work in the same way?
Thanks for any help you can offer
Emily
Dear @Emilyaoc,
For `contrast-FEL` analyses you would use `--branch-set XX` and `--branch-set YY` to compare two branch sets against one another, so try that for `contrast-meme`.
Let me know if that works!
Best,
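Applied to the command from the question, the suggestion above amounts to repeating `--branch-set` once per group instead of using `--comparison`. Here is a minimal Python sketch that assembles such a command line. The file names and the `CB` / `PB` labels come from the thread, `contrast_meme_command` is a hypothetical helper, and only flags that appear in this discussion are used.

```python
import shlex


def contrast_meme_command(alignment, tree, branch_sets):
    """Build a contrast-meme argv with one --branch-set flag per group."""
    if len(branch_sets) < 2:
        raise ValueError("need at least two branch sets to contrast")
    argv = ["hyphy", "contrast-meme", "--alignment", alignment, "--tree", tree]
    for label in branch_sets:
        # Repeat the flag for each labelled set rather than using --comparison.
        argv += ["--branch-set", label]
    return argv


cmd = contrast_meme_command("my_alignment.fas", "my_tree", ["CB", "PB"])
print(shlex.join(cmd))
```

Because the analysis will eventually cover hundreds of genes, the same helper could be called once per alignment file and the resulting argv passed to `subprocess.run`.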
Hi @jzehr ,
That works - thank you! Is it possible to get a little help with understanding the output?
1) Does 'prop' refer to the proportion of branches where the alternative model (b+) fits best for each group?
2) What does 'subs' refer to for each group? I'm guessing substitutions, but could I get a bit more info on this?
3) Do the P-values & Q-values refer to some comparison between my test groups (in my case 'CB' vs 'PB')? And if so, how are these obtained?
Thank you for your help
Emily
Dear @Emilyaoc,
I am not exactly sure if the definitions for this test match exactly with those from the CFEL test. I am tagging @spond here so that he can properly define/ address your questions.
Best,
Hello, I was just wondering if there is a way to perform the equivalent to FEL-contrast using MEME? I would like to compare the number (and identity) of positively selected sites between two pre-defined groups of species for a selection of genes. I can do this nicely with FEL-contrast, but would really like to use MEME instead if this is doable. I can split the groups I guess before performing MEME. But I wonder if there's a better way to do this? Thank you for your help Emily