veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

MEME equivalent to FEL-contrast? #1613

Closed Emilyaoc closed 8 months ago

Emilyaoc commented 1 year ago

Hello, I was just wondering if there is a way to perform the equivalent of FEL-contrast using MEME? I would like to compare the number (and identity) of positively selected sites between two pre-defined groups of species for a selection of genes. I can do this nicely with FEL-contrast, but would really like to use MEME instead if this is doable. I guess I could split the groups before running MEME, but I wonder if there's a better way to do this? Thank you for your help, Emily

spond commented 1 year ago

Dear @Emilyaoc,

In principle, yes, MEME can be modified to deal with this type of test, but it's a bit tricky. Because MEME has multiple β rates, you can now have multiple constraints. For example, you could specify that Test and Reference branches have exactly the same distribution:

β-Test = β-Reference
β+Test = β+Reference
p-Test = p-Reference

You could also specify any subset of these constraints, for example:

β+Test = β+Reference
p-Test = p-Reference

Unless your dataset is quite large, this degree of site-level parameterization is probably excessive and may simply fail to find any differences.
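For readers who want the constraints written out, here is a sketch in the usual MEME notation. It assumes the standard two-class MEME mixture at each site (synonymous rate α, non-synonymous rates β- ≤ α and unrestricted β+, weight p- on the β- class) and gives each branch set G its own copy of the three parameters. The symbols T and R for the Test and Reference sets are introduced here for illustration. Nothing in the thread states how an actual implementation calibrates the test statistic, so the null distribution is left unspecified.

```latex
\begin{align*}
% Per-branch mixture for a branch b in set G, with G in {T, R}
P_b(t_b) &= p^-_G \, e^{Q(\alpha,\, \beta^-_G)\, t_b}
          + \left(1 - p^-_G\right) e^{Q(\alpha,\, \beta^+_G)\, t_b},
          \qquad \beta^-_G \le \alpha \\
% Full-equality null: Test and Reference share one distribution
H_0 &:\ \beta^-_T = \beta^-_R,\quad \beta^+_T = \beta^+_R,\quad p^-_T = p^-_R \\
% Alternative: no equality constraints between the two sets
H_A &:\ \text{all six parameters estimated freely} \\
% Site-level likelihood ratio statistic
\mathrm{LRT} &= 2\left(\log L_{H_A} - \log L_{H_0}\right)
\end{align*}
```

Dropping the first equality from H_0 gives the two-constraint subset mentioned above.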

I am curious as to why you think that contrast-MEME is needed over contrast-FEL?

Best, Sergei

Emilyaoc commented 1 year ago

Dear Sergei, Thank you for getting back to me. My reason for being keen on a contrast-MEME over a contrast-FEL stems from an impression I had that contrast-FEL imposes a restriction that selection doesn't vary across the phylogeny (just sites). Then I supposed that, as selection is likely to vary across both sites and branches in reality (?!), MEME (being a branch-site test) would be more 'realistic' and potentially sensitive (though also more prone to false positives I guess?). But have I perhaps misunderstood? What do you think of the approach of splitting my dataset into two (based on the two groups I expect to be different) and then conducting MEME on each separately? Emily

spond commented 1 year ago

Dear Emily,

Well, you could think of contrast-FEL as measuring average effects, whereas contrast-MEME could (in principle) identify what you might call subgroup effects (some branches being different). I would say that the full MEME constraint (either exactly the same or completely different) on two branch sets is probably the way to go. A general implementation might be a bit fiddly, so maybe what we can do is a "one-off" (for me to explore feasibility) -- if you can send me your data and the partitioning, I can do a quick-and-dirty experiment.

As far as "splitting" and running separately goes: this is really difficult to interpret. Suppose you find that site X is selected (MEME) in one group but not the other. The biggest confounder that you can't address easily is that you may simply lack power to find X in the "negative" group, especially if it's smaller. Similarly, if site X is selected in both or neither, ω can still be quite different (just pointing in the same direction).

Best, Sergei

Emilyaoc commented 1 year ago

Dear Sergei, I see your point re the difficulty of interpreting the MEME results between the datasets when split. In my case the sample size and the taxonomic spread between the two groups would be very similar. I'd also be interested in comparing the number of positively selected sites found in each group rather than focusing too much on the identity of the specific sites. Though I'm not sure this solves all the possible problems with interpretation? I guess the best case would be a contrast-MEME. What is the best way for me to send the data to you to see if it's do-able? Thank you for your help! Emily

spond commented 1 year ago

Dear @Emilyaoc,

Just put the files here: https://www.dropbox.com/request/IqHet1GRFmanzXX9YT9E

I'll close the upload link once they have arrived.

Best, Sergei

Emilyaoc commented 1 year ago

Ok, great thank you. Should I upload the tree (with partitioning labels) and one of my gene files? I'd be doing the analysis on hundreds of genes eventually.

spond commented 1 year ago

Dear @Emilyaoc,

Yes, that would be perfect. Assuming the same tree (or at least that the tree is representative) for the entire gene set.

Best, Sergei

Emilyaoc commented 1 year ago

Great, I have done that now. There is a species tree with two sets of labels ("PB" & "CB"), which show the two groups and an example gene file (aligned). Let me know if you need any more info or if I missed anything. Many thanks Emily

spond commented 1 year ago

Dear @Emilyaoc,

I added an experimental contrast-meme implementation to the develop branch. You can use it with

hyphy contrast-meme ...

Best, Sergei

Emilyaoc commented 1 year ago

Dear Sergei, Great, thank you. I will give it a go. Do I need to clone the HyPhy repo in order to access the develop branch? Sorry if that's a stupid question. I'm still quite new to working with git repos and I'm not 100% sure if I do the quick version of installing HyPhy ('conda install hyphy') whether I get the option of switching between branches etc. Or perhaps you can only use git on the repository version?! Thank you Emily

stephenshank commented 1 year ago

Dear @Emilyaoc,

You will need to use git to clone the repository, check out the develop branch, and install from source.
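The steps listed above can be sketched as the shell function below. The clone URL follows from the repository name at the top of this thread. The 'cmake .', 'make -j MP' and 'make install' steps are an assumption based on the usual build pattern in the HyPhy README and may differ between versions, so check the README in your own checkout. The function name build_hyphy_develop is hypothetical. The body runs in a subshell, so your working directory is left unchanged, and nothing runs until you call the function.

```shell
# Sketch only: build HyPhy's develop branch from source.
# build_hyphy_develop is a hypothetical helper name, not part of HyPhy.
build_hyphy_develop() (
    # Fetch the repository and move into it
    git clone https://github.com/veg/hyphy.git &&
    cd hyphy &&
    # Switch from the default branch to develop
    git checkout develop &&
    # Configure and build (assumed README pattern; MP is the assumed
    # name of the multi-threaded build target)
    cmake . &&
    make -j MP &&
    # Install; this may need write access to the install location
    make install
)
```

Call build_hyphy_develop from the directory where you want the hyphy folder to be created. Each step only runs if the previous one succeeded, so a failing cmake step (for example, a cmake version that is too old) stops the build there.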

We're happy to have users using bleeding edge methods, so please let us know if you encounter any issues.

Regards, Stephen

Emilyaoc commented 1 year ago

Ok, thank you! I will do that. It may take me a little while to manage it as I am currently having trouble installing a recent enough version of cmake to install HyPhy from source. I think my troubles stem from permission issues with my WSL 2 set up. So it may take me a bit of time to figure this out. But I will get back to you if/when I hit an issue that's relevant to using HyPhy as opposed to just my computer woes.

stephenshank commented 1 year ago

I've never personally built HyPhy on WSL, but I may now give it a try! I believe previous HyPhy support for Windows utilized Cygwin, but this might be dated and require some maintenance.

Emilyaoc commented 11 months ago

Hi @stephenshank,

Sorry for the long period of inactivity on this. I have now cloned the hyphy repo and have given contrast-meme a go. It seems to work fine.

Though I have a question about how to run the command correctly. I tried: 'hyphy contrast-meme --alignment my_alignment.fas --tree my_tree --branch-set CB --comparison PB' I hoped to contrast CB with PB, but I think this just tested CB against everything else grouped into the background (not PB).

Perhaps I have the command line options specified incorrectly as the '--comparison' option is not in the help info, I just used what I would use for BUSTED-PH in the hope it would work in the same way?

Thanks for any help you can offer

Emily

jzehr commented 11 months ago

Dear @Emilyaoc,

For contrast-FEL analyses you would use --branch-set XX and --branch-set YY to compare two branch sets against one another, so try that for contrast-meme.
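Applied to the command from earlier in the thread, that suggestion can be sketched as follows. The file names my_alignment.fas and my_tree and the labels CB and PB come from Emily's posts. The wrapper function contrast_meme_two_sets is a hypothetical convenience, and contrast-meme is described in this thread as experimental, so its options may change.

```shell
# Hypothetical wrapper: contrast two labelled branch sets with contrast-meme.
# $1 = alignment file
# $2 = tree file whose branches carry the CB and PB labels
contrast_meme_two_sets() {
    hyphy contrast-meme \
        --alignment "$1" \
        --tree "$2" \
        --branch-set CB \
        --branch-set PB
}

# Usage, with the file names from the thread:
# contrast_meme_two_sets my_alignment.fas my_tree
```

Here --branch-set is given once per group and takes the place of the --comparison option, which is not listed in the help output for contrast-meme.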

Let me know if that works!

Best,

Emilyaoc commented 11 months ago

Hi @jzehr ,

That works - thank you! Is it possible to get a little help with understanding the output?

1) Does 'prop' refer to the proportion of branches where the alternative model (b+) fits best for each group?

2) What does 'subs' refer to for each group? I'm guessing substitutions, but could I get a bit more info on this?

3) Do the P-values & Q-values refer to some comparison between my test groups (in my case 'CB' vs 'PB')? And if so, how are these obtained?

Thank you for your help

Emily

jzehr commented 11 months ago

Dear @Emilyaoc,

I am not sure whether the definitions for this test match exactly those from the CFEL test. I am tagging @spond here so that he can properly define/address your questions.

Best,
