veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Question regarding distance calculation for Fst #1765

Open jencs011 opened 1 day ago

jencs011 commented 1 day ago

Hi,

I'm trying to replicate an analysis completed back in 2013 using an older version of HyPhy. The analysis I'd like to run involves estimating distances for Fst using an ML approach under a GTR nucleotide substitution model, estimating all parameters independently for each branch. I'm currently using HyPhy from the command line (interactive mode), and I've chosen the options shown below:

                    +--------------------+
                    |Distance Computation|
                    +--------------------+

    (2):[Full likelihood] Estimate distances using pairwise MLE. More choices but slow.

                    +---------+
                    |Data type|
                    +---------+

    (1):[Nucleotide/Protein] Nucleotide or amino-acid (protein).

                    +--------------------------+
                    | Select a standard model. |
                    +--------------------------+

    (GRM):General Reversible Model.Local or global parameters. Possible Rate heterogeneity (and HM spatial correlation).

                    +-------------+
                    |Model Options|
                    +-------------+

    (1):[Local] All model parameters are estimated independently for each branch.

When I run this, I get the error "The dimension of the equilibrium frequencies vector 'codonFrequencies' (4) doesn't match the number of states in the dataset filter (64) 'twoSpecFilter'".

The input data are nucleotide sequences of ORFs extracted from HIV, and the codons may not line up with the regular start codons. Is this why I'm getting an error? Can you please tell me which options I should choose to replicate the 2013 analysis mentioned above?

Please let me know if I need to provide any further details. Thanks for your help!

spond commented 1 day ago

Dear @jencs011,

I can confirm that the error occurs for me as well, on the first dataset that I tried. Digging deeper, I noticed that at some point in the recent past, a bug was introduced into the F_st analysis that incorrectly routes you down the Codon analysis path when you choose the Full likelihood option.
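
For readers wondering where the 4 and the 64 in the error message come from, here is a minimal Python sketch of that kind of dimension mismatch. It is only an illustration and not HyPhy code: the helper names `state_count` and `check_dimensions` are hypothetical, and the sketch assumes that a nucleotide model carries 4 equilibrium frequencies while a codon data filter has 4^3 = 64 states.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def state_count(data_type):
    """Return the number of character states for a data type (hypothetical helper)."""
    if data_type == "nucleotide":
        return len(NUCLEOTIDES)
    if data_type == "codon":
        # Every ordered triplet of nucleotides: 4 * 4 * 4 = 64 states.
        return len(list(product(NUCLEOTIDES, repeat=3)))
    raise ValueError(f"unknown data type: {data_type}")


def check_dimensions(frequencies, data_type):
    """Raise if the frequency vector length differs from the filter's state count."""
    expected = state_count(data_type)
    if len(frequencies) != expected:
        raise ValueError(
            f"frequency vector has {len(frequencies)} entries, "
            f"but the {data_type} filter has {expected} states"
        )


# A nucleotide frequency vector is consistent with a nucleotide filter...
nucleotide_freqs = [0.25] * 4
check_dimensions(nucleotide_freqs, "nucleotide")

# ...but not with a codon filter, which is the situation the error describes.
try:
    check_dimensions(nucleotide_freqs, "codon")
except ValueError as err:
    print(err)
```

In other words, the nucleotide model chosen in the menus was being paired with a codon-level view of the alignment, so the two sizes could never agree regardless of the input data.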

I fixed the issue and included the fix in the 2.5.64 release today.

Best, Sergei

jencs011 commented 1 day ago

Great, thank you so much Sergei!