veg / hyphy-analyses

HyPhy standalone analyses

BUSTED-MH is running too slowly #38

Open jinglkang opened 1 year ago

jinglkang commented 1 year ago

Dear hyphy community:

I'm new to HyPhy and plan to use BUSTED-MH for positive selection analysis.

However, it runs very slowly. Are there any ways to speed it up? My data includes 14 species, and I want to identify genes under positive selection among nearly 7000 orthologous genes, but a single gene takes nearly 40 minutes. Could you give me some suggestions to make it faster?
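For a sense of scale, the figures above (nearly 7000 genes at roughly 40 minutes each) can be turned into a rough wall-clock estimate. This is only a sketch: the function name `wall_days` and the figure of 32 concurrent jobs are hypothetical, and it assumes every gene takes about the same time.

```shell
# Estimated wall-clock days for a batch of genes.
# $1 = number of genes, $2 = minutes per gene,
# $3 = number of concurrent jobs (hypothetical; defaults to 1).
wall_days() {
  awk -v g="$1" -v m="$2" -v j="${3:-1}" \
    'BEGIN { printf "%.1f\n", g * m / j / 60 / 24 }'
}

wall_days 7000 40      # one gene at a time: 194.4
wall_days 7000 40 32   # 32 genes at a time (assumed): 6.1
```

Run one gene at a time, the job would take on the order of half a year, which is why per-gene run time matters here.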

Thanks so much, Kang

spond commented 1 year ago

Dear @jinglkang,

This does seem too long. Could you share one of the files and the command you are using, so I can benchmark it locally? That will help me determine whether it's a code issue (something I can potentially fix) or whether your system is just relatively slow.

Best

jinglkang commented 1 year ago

Hi Spond,

Thanks so much for your response. As you can see from the species tree (spe_hyphy_tre.txt), I hope to detect positively selected genes in Ldin using BUSTED-MH on orthologous genes (such as "final_alignment.fa.txt"). My command is "hyphy BUSTED-MH.bf --alignment final_alignment.fa.txt --tree spe_hyphy_tre.txt --branches Foreground". Is this the correct way to detect positively selected genes with BUSTED-MH? I would be grateful if you could point out any problems with the run and, if it is set up correctly, suggest a way to make it run faster. Thanks so much!

spe_hyphy_tre.txt final_alignment.fa.txt

spond commented 1 year ago

Dear @jinglkang,

Using the current release of HyPhy on a MacBook Pro with an M1 Max processor, the analysis finishes in ~4 minutes. You could be using an outdated version of HyPhy. Also, multiple-hit support has been integrated into the standard busted command, as in the example below.

Can you check what your HyPhy version is (hyphy --version) and also what type of computer system you are running the analysis on?

Best, Sergei

$ time hyphy busted --alignment /Users/sergei/Dropbox/Swap/issue-83/final_alignment.fa.txt --tree /Users/sergei/Dropbox/Swap/issue-83/spe_hyphy_tre.txt --multiple-hits Double+Triple --starting-points 5 --branches Foreground

....

### Partition-level rates for multiple-hit substitutions
* rate at which 2 nucleotides are changed instantly within a single codon :   0.1301
* Corresponding fraction of substitutions :  0.000%
* rate at which 3 nucleotides are changed instantly within a single codon :   0.5094
* Corresponding fraction of substitutions :  0.000%

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |    8.884    |                                   |
|        Negative selection         |     0.006     |   86.886    |                                   |
|      Diversifying selection       |    249.316    |    4.230    |                                   |

* For *background* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.010     |    0.000    |       Not supported by data       |
|        Negative selection         |     0.018     |   100.000   |                                   |
|        Negative selection         |     0.232     |    0.000    |       Not supported by data       |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.145               |    27.446     |                                   |
|               0.951               |    62.905     |                                   |
|               3.750               |     9.649     |                                   |

### Performing the constrained (dN/dS > 1 not allowed) model fit
* Log(L) = -7972.43, AIC-c = 16055.59 (55 estimated parameters)
* For *test* branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred

### Partition-level rates for multiple-hit substitutions
* rate at which 2 nucleotides are changed instantly within a single codon :   0.2694
* Corresponding fraction of substitutions :  0.000%
* rate at which 3 nucleotides are changed instantly within a single codon :   0.8239
* Corresponding fraction of substitutions :  0.000%

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |   22.611    |                                   |
|        Negative selection         |     0.000     |   48.450    |       Collapsed rate class        |
|         Neutral evolution         |     1.000     |   28.939    |                                   |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.133               |    23.502     |                                   |
|               0.857               |    60.028     |                                   |
|               2.759               |    16.470     |                                   |

----
## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.0000**.

hyphy busted --alignment  --tree  --multiple-hits Double+Triple --starting-points 5 --branches Foreground  1479.93s user 109.15s system 637% cpu 4:09.21 total

....

The multiple-hits option does increase run time by a factor of ~3 compared to the standard option (BUSTED+SRV).
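The factor of ~3 can be checked directly from the two `total` times reported in this comment (4:09.21 with multiple hits, 1:21.39 without). A minimal sketch; the helper name `to_seconds` is made up for illustration, and it only handles `M:SS.ss` or `H:MM:SS` style values.

```shell
# Convert an elapsed time such as "4:09.21" (minutes:seconds) to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    n = split(t, p, ":")
    s = 0
    for (i = 1; i <= n; i++) s = s * 60 + p[i]
    printf "%.2f\n", s
  }'
}

mh=$(to_seconds 4:09.21)    # with --multiple-hits Double+Triple: 249.21
srv=$(to_seconds 1:21.39)   # standard BUSTED+SRV run: 81.39

# Ratio of the two wall-clock times: 3.06
awk -v a="$mh" -v b="$srv" 'BEGIN { printf "%.2f\n", a / b }'
```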

$ time hyphy busted --alignment /Users/sergei/Dropbox/Swap/issue-83/final_alignment.fa.txt --tree /Users/sergei/Dropbox/Swap/issue-83/spe_hyphy_tre.txt --starting-points 5 --branches Foreground

....

## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.0000**.

hyphy busted --alignment  --tree  --starting-points 5 --branches Foreground  497.08s user 26.93s system 643% cpu 1:21.39 total

Best, Sergei

jinglkang commented 1 year ago

Dear Sergei,

Thanks so much for your reply.

The HyPhy version on my own workstation is "HYPHY 2.5.48(MP) for Linux on x86_64", but I ran BUSTED-MH on the university compute cluster, whose HyPhy version is "HYPHY 2.5.42(MP) for Linux on x86_64". It might be slower because that HyPhy is not the latest version.

By the way, is there any difference between my command and yours? Can I use your command for the positive selection analysis? Thanks so much!

Best regards, Jingliang

spond commented 1 year ago

Dear @jinglkang,

There is a big difference between 2.5.42 and 2.5.48 (you would notice that). I would recommend updating to the latest version, and using the commands that I provided as examples.
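When several machines are involved, a small check can flag an old install before a long batch starts. The sketch below assumes `hyphy --version` prints a one-line banner in the form quoted earlier in this thread ("HYPHY 2.5.48(MP) for Linux on x86_64") and that `sort -V` (version sort, as in GNU coreutils) is available. Both function names are hypothetical, and 2.5.48 is used only as an illustrative threshold.

```shell
# Pull the numeric version out of a banner such as
# "HYPHY 2.5.48(MP) for Linux on x86_64" (format assumed from this thread).
hyphy_version_of() {
  printf '%s\n' "$1" | sed -E 's/^HYPHY ([0-9]+(\.[0-9]+)*).*/\1/'
}

# Succeed when version $1 is at least version $2.
# Relies on `sort -V`, which is an assumption about the local sort.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example with the cluster banner quoted in this thread.
v=$(hyphy_version_of "HYPHY 2.5.42(MP) for Linux on x86_64")
version_at_least "$v" 2.5.48 || echo "HyPhy $v is older than 2.5.48"
```

On a live system the banner would come from the installed binary instead, for example `hyphy_version_of "$(hyphy --version)"`.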

Best, Sergei

jinglkang commented 1 year ago

Hi Sergei,

Thanks so much for your suggestions. I tried running your command (hyphy busted --alignment paml_input/OG0000065_OG8/final_alignment.fa --tree spe_hyphy.tre --multiple-hits Double+Triple --starting-points 5 --branches Foreground) on the same gene I shared, and it takes around 40 minutes. However, that is much faster than the run with 2.5.42 on the compute cluster (almost 2 h 30 min). I will ask the administrator to update HyPhy to the latest version.
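For the full set of orthologous genes, the same command can be wrapped in a loop. This is a minimal sketch, assuming every orthogroup sits in its own subdirectory with a file named `final_alignment.fa`, as in the path above. The function names and the `HYPHY_RUNNER` dry-run variable are hypothetical; only the `hyphy busted` flags come from this thread.

```shell
# Run BUSTED with multiple-hit support on one alignment ($1) and tree ($2).
# Set HYPHY_RUNNER=echo to print the command instead of running it.
run_busted() {
  ${HYPHY_RUNNER:-} hyphy busted --alignment "$1" --tree "$2" \
    --multiple-hits Double+Triple --starting-points 5 --branches Foreground
}

# Apply run_busted to every <dir>/<orthogroup>/final_alignment.fa.
# $1 = directory of orthogroups, $2 = tree file.
run_all() {
  for aln in "$1"/*/final_alignment.fa; do
    [ -e "$aln" ] || continue   # skip when the glob matches nothing
    run_busted "$aln" "$2"
  done
}
```

A dry run such as `HYPHY_RUNNER=echo run_all paml_input spe_hyphy.tre` lists the commands without starting any analysis. The loop itself is serial, so distributing genes across cluster nodes is left to whatever scheduler is available.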

Thanks so much!

Jingliang