veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Questions about BUSTED and an error code 102 #1702

Closed irislin886 closed 2 months ago

irislin886 commented 2 months ago

Hi, when I was trying to run BUSTED on my data using Datamonkey, I ran into an error 102. Does anyone know what is going on? There are 3 species with an outgroup in my data. It worked fine when I selected all. However, when I selected only one species this error popped up. Can someone explain a little bit to me how to select the species/branches correctly? Thank you very much.

Here is the error message:

aocc/1.3.0(13):ERROR:102: Tcl command execution failed: conflict aocc

openmpi/gnu/3.1.6(17):ERROR:150: Module 'openmpi/gnu/3.1.6' conflicts with the currently loaded module(s) 'openmpi/gnu/4.1.0'

openmpi/gnu/3.1.6(17):ERROR:102: Tcl command execution failed: conflict openmpi

Check errors.log for execution error details.

spond commented 2 months ago

Dear @irislin886,

This is a configuration issue on our end. Do you have a URL for the failed job?

Best, Sergei

@stevenweaver

irislin886 commented 2 months ago

Yes, I do! Here is the URL: https://www.datamonkey.org/busted/6627f2dacc6c1a3e32b6732c

stevenweaver commented 2 months ago

That message is actually a bit misleading (but should be resolved). The issue is the following, which can be found by clicking the Download Log button:

### Performing the full (dN/dS > 1 allowed) branch-site model fit
Error:
busted.test.omega1 evaluated to a NaN; this can cause all kinds of odd behavior downstream, therefore it is safer to quit now

Function call stack
1 :  [namespace = SloudcTB] Optimize(mles, likelihoodFunction, run_options[utility.getGlobalValue("terms.run_options.optimization_settings")]);

        Keyword arguments:
                {
                 "save-fit":"/dev/null"
                }
-------
2 :  busted.grid_search.results=estimators.FitLF(busted.filter_names,busted.trees,busted.model_map,busted.final_partitioned_mg_results,busted.model_object_map,{terms.run_options.retain_lf_object:TRUE,terms.run_options.proportional_branch_length_scaler:busted.global_scaler_list,terms.run_options.optimization_settings:{"OPTIMIZATION_METHOD":"nedler-mead","MAXIMUM_OPTIMIZATION_ITERATIONS":500,"OPTIMIZATION_PRECISION":busted.nm.precision},terms.search_grid:busted.initial_grid,terms.search_restarts:busted.N.initial_guesses});

        Keyword arguments:
                {
                 "save-fit":"/dev/null"
                }
-------

Best, Steven

stevenweaver commented 2 months ago

The conflict message should no longer be appearing (again, this has no bearing on the actual outcome of the job).

Best, Steven

irislin886 commented 2 months ago

Thank you very much, Steven and Sergei.

However, after reading the error message from the log, I don't quite understand what is happening. Would you mind explaining the actual issue to me and how I can fix it?

Best, Iris

spond commented 2 months ago

Dear @irislin886,

I can't replicate the issue locally, i.e. when I run the analysis using hyphy (develop branch), everything finishes. The error occurs (for the version on Datamonkey) because the background branches have dN/dS = 0. This will be fixed in the next version update.

### Performing the full (dN/dS > 1 allowed) branch-site model fit
* Log(L) =  -736.37, AIC-c =  1545.65 (34 estimated parameters)
* For *test* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |   19.178    |                                   |
|        Negative selection         |     0.000     |   80.822    |       Collapsed rate class        |
|      Diversifying selection       |     3.578     |    0.000    |       Not supported by data       |

* For *background* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.029     |   49.170    |                                   |
|        Negative selection         |     0.029     |   41.964    |       Collapsed rate class        |
|      Diversifying selection       |     1.294     |    8.866    |                                   |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.663               |    84.061     |                                   |
|               0.667               |     8.531     |       Collapsed rate class        |
|               5.204               |     7.408     |                                   |

### No evidence for episodic diversifying positive selection under the unconstrained model, skipping constrained model fitting
----
## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.5000**.

As a general rule of thumb, very small alignments (yours has 4 sequences and 130 codons, 0.2 total tree length) often show unstable behavior with more complex models like BUSTED, simply because there's not enough information to fit the model. There's also an extreme imbalance in branch lengths. In the following tree, the labels show the estimated NUMBER of substitutions (syn + non-syn) along each branch. You can see that for most branches it's ~1.

[Image: tree with branch labels showing the estimated number of substitutions]
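To get a quick sense of how much data an alignment contains before picking a model, the sequence and codon counts can be checked with a few lines of Python. This is only a minimal sketch: it assumes the alignment is in FASTA format and in frame, the `alignment_size` helper is hypothetical (not part of HyPhy), and the toy alignment is made up rather than taken from this issue.

```python
def alignment_size(fasta_text):
    """Return (number of sequences, number of codons) for an aligned FASTA string."""
    sequences = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new sequence record
            current = []
            sequences.append(current)
        elif current is not None:
            current.append(line)

    # All sequences in an alignment must have the same length
    lengths = {len("".join(parts)) for parts in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; is this an alignment?")
    n_sites = lengths.pop()
    if n_sites % 3 != 0:
        raise ValueError("alignment length is not a multiple of 3")
    return len(sequences), n_sites // 3


# Hypothetical toy alignment, NOT the data from this issue
toy = """>seqA
ATGGCCAAA
>seqB
ATGGCTAAA
>seqC
ATGGCCAAG
"""

n_seqs, n_codons = alignment_size(toy)
print(f"{n_seqs} sequences, {n_codons} codons")
```

The function only reports counts; it does not decide whether an alignment is "too small", since that judgment depends on the model and on the tree length as well.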

Personally, for something this small, I'd use a plain old FitMG94 analysis (https://github.com/veg/hyphy-analyses/tree/master/FitMG94)

hyphy ~/Development/hyphy-analyses/FitMG94/FitMG94.bf --alignment ~/Desktop/test.txt --lrt Yes --type local

|            Branch            |     Length     |     dN/dS      |Approximate dN/dS CI|LRT p-value dN != dS|
|:----------------------------:|:--------------:|:--------------:|:------------------:|:------------------:|
|ANA_STRENUUS_G39463_T2_GALB...|     0.002      |     0.000      |   0.000 - 0.535    |       0.0883       |
|ANAX_WALSINGHAMI_G29729_T2_...|     0.002      |     0.000      |   0.000 - 0.533    |       0.0881       |
|            Node3             |     0.008      |     0.000      |   0.000 - 0.174    |       0.1264       |
|ANAX_JUNIUS_G42871_T2_GALBA...|     0.000      |     1.000      |0.000 - 10000.000...|       0.9974       |
|TANYPTERYX_HAGENI_G42552_T2...|     0.274      |     0.112      |   0.076 - 0.161    |       0.0000       |

which shows that only the longest branch TANYPTERYX_HAGENI_G42552_T2... has any evidence for dN/dS ≠ 1 (and even then it's because dN/dS < 1).
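For anyone who wants to pull the same conclusion out of the table programmatically, here is a rough Python sketch that parses the markdown table printed above and lists branches whose LRT p-value falls below a cutoff. The `parse_rows` helper and the 0.05 cutoff are assumptions made for illustration; they are not part of FitMG94 or HyPhy. The truncated branch names are kept exactly as printed.

```python
TABLE = """\
|            Branch            |     Length     |     dN/dS      |Approximate dN/dS CI|LRT p-value dN != dS|
|:----------------------------:|:--------------:|:--------------:|:------------------:|:------------------:|
|ANA_STRENUUS_G39463_T2_GALB...|     0.002      |     0.000      |   0.000 - 0.535    |       0.0883       |
|ANAX_WALSINGHAMI_G29729_T2_...|     0.002      |     0.000      |   0.000 - 0.533    |       0.0881       |
|            Node3             |     0.008      |     0.000      |   0.000 - 0.174    |       0.1264       |
|ANAX_JUNIUS_G42871_T2_GALBA...|     0.000      |     1.000      |0.000 - 10000.000...|       0.9974       |
|TANYPTERYX_HAGENI_G42552_T2...|     0.274      |     0.112      |   0.076 - 0.161    |       0.0000       |
"""

ALPHA = 0.05  # assumed significance cutoff, chosen for illustration only


def parse_rows(table):
    """Turn the markdown table into a list of dicts, skipping header and separator rows."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row and the |:---:| separator row
        if cells[0] == "Branch" or set(cells[0]) <= set(":-"):
            continue
        rows.append({
            "branch": cells[0],
            "length": float(cells[1]),
            "dnds": float(cells[2]),
            "p": float(cells[4]),
        })
    return rows


rows = parse_rows(TABLE)
significant = [r for r in rows if r["p"] < ALPHA]
for r in significant:
    direction = "dN/dS < 1" if r["dnds"] < 1 else "dN/dS >= 1"
    print(r["branch"], r["p"], direction)
```

The confidence-interval column is skipped on purpose, because one of its entries is truncated in the printed output and cannot be parsed reliably.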

Best, Sergei

irislin886 commented 2 months ago

Dear Sergei,

Thank you very much for your help and the explanation. I ran it locally using FitMG94 and it works great. Thank you very much for the suggestions.

Best, Iris