veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Convergence Issues In FEL and MEME #1687

Closed gykoh closed 2 months ago

gykoh commented 5 months ago

Hello!

I wanted to provide an update on the convergence issues I encountered in FEL. I discussed them last year in this issue: https://github.com/veg/hyphy/issues/1618

I used HYPHY 2.5.51(MP) for Darwin on arm64.

I ran FEL 25 times at p = 0.1 under the conditions below (some conditions changed compared to my runs last year):

I also ran FEL 25 times at p = 0.1 under the same conditions, except that parametric bootstrap resampling was set to 0.
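For reference, the repeated runs can be scripted along these lines. This is only a sketch: the file names are hypothetical, and the flags other than --resample reflect recent HyPhy command lines, so `hyphy fel --help` is the authority for your build:

```python
# Sketch: 25 FEL runs at p = 0.1 with 1000 bootstrap replicates/site.
# File names are hypothetical; check `hyphy fel --help` for exact flags.
import subprocess

for i in range(1, 26):
    subprocess.run(
        ["hyphy", "fel",
         "--alignment", "gene.fasta",
         "--tree", "gene.nwk",
         "--pvalue", "0.1",
         "--resample", "1000",   # set to 0 for the deterministic runs
         "--output", f"fel_run{i}.json"],
        check=True,
    )
```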

I found a convergence issue in my 25 FEL runs at 1000 replicates/site (each run reported slightly different sites).

For my 25 runs at 0 replicates/site, there was no convergence issue: each run reported the same codon sites.

I then took the codon sites that showed up in all 25 of my FEL runs at 1000 replicates/site and compared them to the codon sites from my FEL runs at 0 replicates/site.

I noticed the same issue with MEME, so I followed a similar procedure: I compared the codon sites that showed up in all 25 of my MEME runs at 1000 replicates/site to the codon sites from my MEME runs at 0 replicates/site. Just like FEL, all 25 MEME runs at 0 replicates/site showed no convergence issue; each run reported the same codon sites.

MEME at p = 0.1 under these conditions:

Questions

  1. It looks like there are discrepancies between runs at 1000 replicates/site but not at 0 replicates/site for both FEL and MEME.
     a. What is the recommended approach for obtaining convergent results from both FEL and MEME?
     b. Does it work to consider the sites that show up in all of the runs at both 1000 and 0 replicates/site?
     c. Or is there something else to try when choosing a reliable set of sites with strong evidence for dN/dS > 1?

Thank you!

spond commented 5 months ago

Dear @gykoh,

When you specify --resample N, both MEME and FEL will, instead of computing p-values based on the asymptotic χ² distribution of the test statistic, use a parametric bootstrap distribution with N replicates.
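Conceptually, the bootstrap p-value is just a counting estimate. A minimal sketch of the idea (plain Python, not HyPhy's actual implementation; `simulate_null_lrt` is a hypothetical stand-in for "simulate data under the null and refit"):

```python
# Parametric bootstrap p-value, conceptually: simulate N datasets under
# the null model, refit, and count how often the simulated LRT meets or
# exceeds the observed one.
def bootstrap_pvalue(observed_lrt, simulate_null_lrt, n_replicates=1000):
    """simulate_null_lrt() returns one LRT from data simulated under H0."""
    hits = sum(simulate_null_lrt() >= observed_lrt for _ in range(n_replicates))
    return (hits + 1) / (n_replicates + 1)  # add-one correction avoids p = 0
```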

Parametric bootstrap is inherently stochastic, so if you run multiple analyses on the same data, some variation is expected. This variation should be fairly minor. With

--resample 1000

something like

Site X run 1: p-value 0.09
Site X run 2: p-value 0.11

is OK, but

Site X run 1: p-value 0.002
Site X run 2: p-value 0.25

is not.
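For a sense of how much wobble to expect: a bootstrap p-value based on N replicates is a binomial proportion, so its Monte Carlo standard error is roughly √(p(1−p)/N). A quick check in plain Python:

```python
# Monte Carlo standard error of a bootstrap p-value estimate.
import math

def bootstrap_pvalue_se(p: float, n_replicates: int) -> float:
    return math.sqrt(p * (1.0 - p) / n_replicates)

print(bootstrap_pvalue_se(0.10, 1000))   # ~0.0095: 0.09 vs 0.11 is expected noise
print(bootstrap_pvalue_se(0.002, 1000))  # ~0.0014: 0.002 vs 0.25 is not
```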

Which one are you seeing?

Best, Sergei

gykoh commented 4 months ago

Dear @spond,

I see the second case for the two sites below. Here are the p-values from the first 5 runs at site 86 (positive selection) with a bootstrap of 1000 replicates/site in FEL at p = 0.1:

With a bootstrap of 0 replicates/site in FEL, site 86 (positive selection) had a p-value of 0.0717 in all of my runs.

Here are the p-values from the first 5 runs at site 252 (positive selection) with a bootstrap of 1000 replicates/site in FEL at p = 0.1:

With a bootstrap of 0 replicates/site in FEL, site 252 (positive selection) had a p-value of 0.0785 in all of my runs.

I see the first case for the site below. Here are the p-values from the first 5 runs at site 64 (positive selection) with a bootstrap of 1000 replicates/site in FEL:

With a bootstrap of 0 replicates/site in FEL, site 64 (positive selection) had a p-value of 0.0237 in all of my runs.

The p-values do not vary significantly across some runs. However, how do I know which sites to focus on after running FEL and MEME? Some sites showed up in a few runs but not in all of them. For instance, site 173 (positive selection) showed up in only two of my twenty-five FEL runs, both times with a p-value of 0.0999.

How does one decide which sites to focus on, especially if the p-values vary to a certain extent? What would you recommend for finding the sites with strong evidence of positive selection?

Thank you!

spond commented 4 months ago

Dear @gykoh,

This level of resampling variation (with bootstrap) is perfectly normal. Even in cases where some runs are >0.1, these values are probably close to 0.1.

All of these cases are "marginal", meaning that they may or may not be considered significant, depending on your desired sensitivity/specificity tradeoffs.

With no bootstrap enabled, you will get the same p-value every time because there is no resampling (no randomness involved).
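To make the contrast concrete, here is a toy sketch (plain Python with scipy, not HyPhy internals; the LRT value and the 1-df χ² are illustrative assumptions):

```python
# Asymptotic p-value: a deterministic function of the LRT.
# Bootstrap p-value: a noisy Monte Carlo estimate of the same quantity.
import random
from scipy.stats import chi2

lrt = 3.24                           # hypothetical observed LRT
p_asym = chi2.sf(lrt, df=1)
print(p_asym)                        # identical every run: ~0.072

for run in range(3):                 # three "runs" with --resample 1000
    p_hat = sum(random.random() < p_asym for _ in range(1000)) / 1000
    print(run + 1, p_hat)            # wobbles around ~0.072
```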

Can you give me some background on what you are using FEL for?

Best, Sergei

gykoh commented 4 months ago

Dear @spond,

Thank you for the explanation on marginal sites!

We are using FEL to look for high dN/dS in genes that code for some cell membrane proteins, to help us understand the selective pressures on them in terms of their functions in cells. Ideally, of course, we are looking for a trade-off between low false positives and high true positives, but being confident about the sites we "trust" is more important than capturing all the true positives.

Thank you for your help!

spond commented 4 months ago

Dear @gykoh,

Two further questions:

  1. How many sequences are you using as input?
  2. Have you run MEME on the same data? It's more sensitive for detecting positive selection.

Best, Sergei

gykoh commented 4 months ago

Dear @spond,

We are running 29 species for a gene that is about 318 codons long. We have run both MEME and FEL multiple times. Like FEL, with a bootstrap of 0 replicates/site, we had convergence across all of our runs. However, with a bootstrap of 1000 replicates/site, not all runs reported the same sites.

Question 1: Now that we know some of these p-value differences between runs are marginal, why are there sites that show up in all of my runs at ONLY one of the two settings (0 or 1000 replicates/site) for both MEME and FEL?

Below, I provide some cases I encountered in MEME and FEL with sites that were detected to be under positive selection at p = 0.1:

MEME

Settings We Used To Run MEME:

FEL

Settings We Used To Run FEL:

Question 2: Another question we have is whether the intersection of the sites found by both FEL and MEME is better (in terms of avoiding false positives) than the results of MEME or FEL alone. For instance, in all of my runs, at bootstraps of both 0 and 1000 replicates/site at p = 0.1 under the settings listed above, FEL and MEME both report sites 64 and 70 to be under positive selection.
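For what it's worth, the bookkeeping for these comparisons can be scripted. A minimal sketch (file names are hypothetical; the JSON layout and the "p-value" column name reflect my reading of FEL/MEME output files, so check the "headers" entry in yours):

```python
# Collect sites significant at alpha in each run's JSON, then intersect.
import json

def significant_sites(json_path, alpha=0.1):
    """1-based site indices whose p-value column is <= alpha."""
    with open(json_path) as fh:
        results = json.load(fh)
    headers = [h[0] for h in results["MLE"]["headers"]]
    p_col = headers.index("p-value")          # verify against your files
    rows = results["MLE"]["content"]["0"]     # assumes a single partition
    return {i + 1 for i, row in enumerate(rows) if row[p_col] <= alpha}

fel = set.intersection(*(significant_sites(f"fel_run{i}.json") for i in range(1, 26)))
meme = set.intersection(*(significant_sites(f"meme_run{i}.json") for i in range(1, 26)))
print(sorted(fel & meme))                     # sites found by both methods
```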

Thank you!

spond commented 4 months ago

Dear @gykoh,

Sorry for the delay in responding. There's actually a bug in the code (in several recent versions) which affects the performance of --resample. Would you mind sending me your dataset (privately, spond at temple dot edu), so I can double-check the fix on your dataset before I release the new version?

Best, Sergei

github-actions[bot] commented 2 months ago

Stale issue message