stephaneguindon / phyml

PhyML -- Phylogenetic estimation using (Maximum) Likelihood
GNU General Public License v3.0

Bootstrap error with custom model - Can't open file '' #178

Closed: adamgicgier closed this issue 1 year ago

adamgicgier commented 2 years ago

Dear Stephane,

I am trying to run a custom amino-acid rate model, Q.pfam in this case (taken from http://www.atgc-montpellier.fr/sms/getmatrix.php?matrix=Q.pfam). I put that file in the src folder. The initial tree building and the SPR search work fine, but when the bootstrap analysis starts, there is an error: `. Non parametric bootstrap analysis [ . Can't open file '', enter a new name :`
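For reference, the Q.pfam file linked above appears to use the PAML layout that, as we understand the PhyML manual, `--aa_rate_file` expects: a lower-triangular matrix of 190 exchangeabilities followed by 20 equilibrium frequencies. The Python sketch below is a hypothetical pre-flight check (`read_paml_aa_model` and `check_model` are illustrative names, not part of PhyML). It assumes exactly that layout and reports the frequency sum, mirroring PhyML's own "Sum of amino-acid frequencies" line. It only confirms that the file is well-formed; it does not explain the bootstrap error.

```python
import sys
from typing import List, Tuple

N_STATES = 20
N_RATES = N_STATES * (N_STATES - 1) // 2  # 190 lower-triangular exchangeabilities


def read_paml_aa_model(text: str) -> Tuple[List[float], List[float]]:
    """Parse exchangeabilities and frequencies from PAML-style text.

    Assumption: the file starts with the 190 exchangeabilities, followed by
    the 20 frequencies. Parsing stops at the first non-numeric token, so
    trailing notes (e.g. an amino-acid order line) are ignored.
    """
    numbers: List[float] = []
    for token in text.split():
        try:
            numbers.append(float(token))
        except ValueError:
            break
    needed = N_RATES + N_STATES
    if len(numbers) < needed:
        raise ValueError(f"expected at least {needed} numbers, found {len(numbers)}")
    return numbers[:N_RATES], numbers[N_RATES:needed]


def check_model(text: str, tol: float = 1e-3) -> float:
    """Return the frequency sum, raising ValueError if the model looks malformed."""
    rates, freqs = read_paml_aa_model(text)
    if any(r < 0 for r in rates):
        raise ValueError("negative exchangeability found")
    total = sum(freqs)
    if abs(total - 1.0) > tol:
        raise ValueError(f"frequencies sum to {total}, not 1")
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print("Sum of amino-acid frequencies:", check_model(handle.read()))
```

Saved as, say, `check_model.py` (an arbitrary name), it would be run as `python check_model.py modelQpfam`.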


Do you know what could be the reason?

stephaneguindon commented 2 years ago

Hi. I suspect you are using an outdated version of PhyML (I remember fixing this issue earlier this year). Would you mind trying a more recent release? Let me know if you are still stuck.

adamgicgier commented 2 years ago

Dear Stephane,

I am using version 3.3.20220408, which, as far as I know, is the latest release.

Best wishes, Adam

stephaneguindon commented 2 years ago

Yes, this is the latest version, and it works on my side. Would you mind posting the command line and the standard output?

adamgicgier commented 2 years ago

Dear Stephane,

Here it is:

Command line: ./phyml -i /cluster/home/agicgier/phylo/test/MAFFT_test_file.phylip -d aa -m custom --aa_rate_file modelQpfam -b 10 -f m -v 0 -c 4 -s SPR -o tl --n_rand_starts 2 --rand_start --no_memory_check --run_id Qpfam_test

////////////////////////////////////.\\\\\\\\\\\ \\\\\\\\\.///////////////////////////////////////// /

. Sequence filename:                             MAFFT_test_file.phylip
. Data type:                                     aa
. Alphabet size:                                 20
. Sequence format:                               interleaved
. Number of data sets:                           1
. Nb of bootstrapped data sets:                  10
. Compute approximate likelihood ratio test:     no
. Model name:                                    Custom (modelQpfam)
. Proportion of invariable sites:                0.000000
. RAS model:                                     discrete Gamma
. Number of subst. rate catgs:                   4
. Gamma distribution parameter:                  1.000000
. 'Middle' of each rate class:                   mean
. Amino-acid equilibrium frequencies:            model-defined
. Optimise tree topology:                        yes
. Starting tree:                                 BioNJ
. Add random input tree:                         yes
. Number of random starting trees:               2
. Optimise branch lengths:                       yes
. Minimum length of an edge:                     1e-08
. Optimise substitution model parameters:        no
. Run ID:                                        Qpfam_test
. Random seed:                                   1664109118
. Subtree patterns aliasing:                     no
. Version:                                       3.3.20220408
. Byte alignment:                                32
. AVX enabled:                                   yes
. SSE enabled:                                   yes

////////////////////////////////////.\\\\\\\\\\\ \\\\\\\\\.///////////////////////////////////////// /

. 818 patterns found (out of a total of 1092 sites).

. 377 sites without polymorphism (34.52%).

. [Random start 1/ 2]

. Sum of amino-acid frequencies: 1.000001

. Scaling amino-acid frequencies...

. This analysis requires at least 11 MB of memory space.

. Score of initial tree: -17538.33

. Starting first round of SPRs...

       0s |   1 | lnL=     -8976.7 | depth=    2/   40 | improvements=   5 | delta_lnL=   42.0/ 1000.0   +

. Second round of optimization...

       0s |   2 | lnL=     -8948.8 | depth=    3/   36 | improvements=   6 | delta_lnL=  121.1/ 1000.0   +
       1s |   3 | lnL=     -8954.6 | depth=    1/   20 | improvements=   3 | delta_lnL=    0.0/ 1000.0

. Third round of optimization...

       1s |   4 | lnL=     -8945.4 | depth=    2/   16 | improvements=   3 | delta_lnL=    2.6/ 1000.0 | triple moves=   5   +
       2s |   5 | lnL=     -8944.6 | depth=    1/   12 | improvements=   1 | delta_lnL=    0.0/  100.0 | triple moves=   5   +
       3s |   6 | lnL=     -8944.4 | depth=    0/    9 | improvements=   0 | delta_lnL=    0.0/  100.0 | triple moves=   5   +

. Final optimisation steps...

. Log likelihood of the current tree: -8944.316428628018911695108.

. [Random start 2/ 2]

. Sum of amino-acid frequencies: 1.000001

. Scaling amino-acid frequencies...

. Score of initial tree: -17347.28

. Starting first round of SPRs...

       0s |   1 | lnL=     -8956.5 | depth=    3/   40 | improvements=   8 | delta_lnL=   40.4/ 1000.0   +

. Second round of optimization...

       0s |   2 | lnL=     -8955.7 | depth=    1/   36 | improvements=   4 | delta_lnL=    0.0/ 1000.0   +

. Third round of optimization...

       1s |   3 | lnL=     -8946.1 | depth=    2/   20 | improvements=   3 | delta_lnL=    2.6/ 1000.0 | triple moves=   5   +
       2s |   4 | lnL=     -8944.4 | depth=    0/   16 | improvements=   1 | delta_lnL=    0.0/  100.0 | triple moves=   5   +
       3s |   5 | lnL=     -8944.1 | depth=    0/   12 | improvements=   0 | delta_lnL=    0.0/  100.0 | triple moves=   5   +

. Final optimisation steps...

. Log likelihood of the current tree: -8944.047610053614334901795.

. Sum of amino-acid frequencies: 1.000001

. Scaling amino-acid frequencies...

. Computing pairwise distances...

. Building BioNJ tree...

. Score of initial tree: -9045.18

. Starting first round of SPRs...

       0s |   1 | lnL=     -8961.4 | depth=    1/   40 | improvements=   5 | delta_lnL=    0.0/ 1000.0

. Second round of optimization...

       0s |   2 | lnL=     -8950.9 | depth=    1/   36 | improvements=   4 | delta_lnL=    0.0/ 1000.0

. Third round of optimization...

       1s |   3 | lnL=     -8944.6 | depth=    1/   20 | improvements=   3 | delta_lnL=    0.0/ 1000.0 | triple moves=   5   +
       2s |   4 | lnL=     -8944.4 | depth=    0/   16 | improvements=   0 | delta_lnL=    0.0/  100.0 | triple moves=   5   +

. Final optimisation steps...

. Log likelihood of the current tree: -8944.316479049981353455223.

. Launch bootstrap analysis on the most likely tree...

. Non parametric bootstrap analysis

[ . Can't open file '', enter a new name :
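With random starts enabled, the log prints one final "Log likelihood of the current tree" line per search before the bootstrap is launched on the most likely tree. A small Python sketch can pull those values out and report the best one, which is handy when comparing runs like the ones in this thread. The helper names are hypothetical, and the pattern assumes the exact line wording shown in the log above.

```python
import re
from typing import List

# Assumes the exact wording PhyML prints at the end of each search, as in the log above.
LNL_LINE = re.compile(r"Log likelihood of the current tree:\s*(-?\d+(?:\.\d+)?)")


def final_log_likelihoods(log_text: str) -> List[float]:
    """Return every final log-likelihood reported in the log, in order."""
    return [float(m.group(1)) for m in LNL_LINE.finditer(log_text)]


def best_log_likelihood(log_text: str) -> float:
    """Return the highest (least negative) final log-likelihood."""
    values = final_log_likelihoods(log_text)
    if not values:
        raise ValueError("no 'Log likelihood of the current tree' line found")
    return max(values)
```

The regex deliberately stops before the trailing period that PhyML prints after each value.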

stephaneguindon commented 2 years ago

You simply need to provide the full path to the rate matrix file, just as you are doing for the sequence file.
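A minimal Python sketch of this suggestion uses a hypothetical wrapper, `build_phyml_args`, which is not part of PhyML. It resolves both the alignment and the rate file to absolute paths and fails early if either is missing, so a bad path is caught before PhyML is launched. Only flags that appear in this thread are used. As the next reply shows, an absolute path alone did not make the bootstrap error go away in this case.

```python
from pathlib import Path
from typing import List


def build_phyml_args(alignment: str, rate_file: str, bootstraps: int,
                     phyml: str = "./phyml", check_exists: bool = True) -> List[str]:
    """Build a PhyML argument list using absolute paths for both input files.

    Hypothetical wrapper: set check_exists=False to build the list without
    touching the file system.
    """
    aln = Path(alignment).expanduser().resolve()
    model = Path(rate_file).expanduser().resolve()
    if check_exists:
        for path in (aln, model):
            if not path.is_file():
                raise FileNotFoundError(f"missing input file: {path}")
    return [phyml, "-i", str(aln), "-d", "aa", "-m", "custom",
            "--aa_rate_file", str(model), "-b", str(bootstraps)]
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with paths.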

adamgicgier commented 2 years ago

Dear Stephane,

I have tried that as well, but it didn't work, unfortunately.

Command line: ./phyml -i /cluster/home/agicgier/phylo/test/MAFFT_test_file.phylip -d aa -m custom --aa_rate_file /cluster/home/agicgier/phylo/models/modelQpfam -b 10 -f m -v 0 -c 4 -s SPR -o tl --n_rand_starts 2 --rand_start --no_memory_check --run_id Qpfam_test

////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\.//////////////////////////////////////////

    . Sequence filename:                             MAFFT_test_file.phylip
    . Data type:                                     aa
    . Alphabet size:                                 20
    . Sequence format:                               interleaved
    . Number of data sets:                           1
    . Nb of bootstrapped data sets:                  10
    . Compute approximate likelihood ratio test:     no
    . Model name:                                    Custom (/cluster/home/agicgier/phylo/models/modelQpfam)
    . Proportion of invariable sites:                0.000000
    . RAS model:                                     discrete Gamma
    . Number of subst. rate catgs:                   4
    . Gamma distribution parameter:                  1.000000
    . 'Middle' of each rate class:                   mean
    . Amino-acid equilibrium frequencies:            model-defined
    . Optimise tree topology:                        yes
    . Starting tree:                                 BioNJ
    . Add random input tree:                         yes
    . Number of random starting trees:               2
    . Optimise branch lengths:                       yes
    . Minimum length of an edge:                     1e-08
    . Optimise substitution model parameters:        no
    . Run ID:                                        Qpfam_test
    . Random seed:                                   1664443044
    . Subtree patterns aliasing:                     no
    . Version:                                       3.3.20220408
    . Byte alignment:                                32
    . AVX enabled:                                   yes
    . SSE enabled:                                   yes

////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\.//////////////////////////////////////////

. 818 patterns found (out of a total of 1092 sites).

. 377 sites without polymorphism (34.52%).

. [Random start 1/ 2]

. Sum of amino-acid frequencies: 1.000001

. Scaling amino-acid frequencies...

. This analysis requires at least 11 MB of memory space.

. Score of initial tree: -17401.16

. Starting first round of SPRs...

           1s |   1 | lnL=     -8955.7 | depth=    3/   40 | improvements=   8 | delta_lnL=  193.6/ 1000.0   +

. Second round of optimization...

           1s |   2 | lnL=     -8965.3 | depth=    1/   36 | improvements=   6 | delta_lnL=    0.0/ 1000.0
           1s |   3 | lnL=     -8959.5 | depth=    2/   20 | improvements=   5 | delta_lnL=   46.1/ 1000.0

. Third round of optimization...

           2s |   4 | lnL=     -8944.6 | depth=    1/   16 | improvements=   4 | delta_lnL=    0.0/ 1000.0 | triple moves=   5   +
           3s |   5 | lnL=     -8944.3 | depth=    0/   12 | improvements=   0 | delta_lnL=    0.0/  100.0 | triple moves=   5   +

. Final optimisation steps...

. Log likelihood of the current tree: -8944.316469406905525829643.

. [Random start 2/ 2]

. Sum of amino-acid frequencies: 1.000001

. Scaling amino-acid frequencies...

. Score of initial tree: -17377.34

. Starting first round of SPRs...

           0s |   1 | lnL=     -8961.1 | depth=    3/   40 | improvements=   5 | delta_lnL=  830.6/ 1000.0   +

. Second round of optimization...

           0s |   2 | lnL=     -8950.3 | depth=    1/   36 | improvements=   4 | delta_lnL=    0.0/ 1000.0   +

. Third round of optimization...

           1s |   3 | lnL=     -8944.3 | depth=    1/   20 | improvements=   1 | delta_lnL=    0.0/ 1000.0 | triple moves=   5   +
           2s |   4 | lnL=     -8944.1 | depth=    0/   16 | improvements=   0 | delta_lnL=    0.0/  100.0 | triple moves=   5   +

. Final optimisation steps...

. Log likelihood of the current tree: -8944.047405687748323543929.

. Sum of amino-acid frequencies: 1.000001

. Scaling amino-acid frequencies...

. Computing pairwise distances...

. Building BioNJ tree...

. Score of initial tree: -9045.18

. Starting first round of SPRs...

           0s |   1 | lnL=     -8949.7 | depth=    2/   40 | improvements=   6 | delta_lnL=   61.8/ 1000.0

. Second round of optimization...

           0s |   2 | lnL=     -8946.4 | depth=    2/   36 | improvements=   6 | delta_lnL=   30.0/ 1000.0
           1s |   3 | lnL=     -8951.9 | depth=    1/   20 | improvements=   1 | delta_lnL=    0.0/ 1000.0

. Third round of optimization...

           1s |   4 | lnL=     -8945.0 | depth=    1/   16 | improvements=   3 | delta_lnL=    0.0/ 1000.0 | triple moves=   5   +
           2s |   5 | lnL=     -8944.2 | depth=    0/   12 | improvements=   0 | delta_lnL=    0.0/  100.0 | triple moves=   5   +

. Final optimisation steps...

. Log likelihood of the current tree: -8944.047540751609631115571.

. Launch bootstrap analysis on the most likely tree...

. Non parametric bootstrap analysis

[ . Can't open file '', enter a new name :

I also tried again with a deliberately wrong model file name, and in that case PhyML does not run at all, not even the SPR search:

./phyml -i /cluster/home/agicgier/phylo/test/MAFFT_test_file.phylip -d aa -m custom --aa_rate_file /cluster/home/agicgier/phylo/models/w -b 10 -f m -v 0 -c 4 -s SPR -o tl --n_rand_starts 2 --rand_start --no_memory_check --run_id Qpfam_test

. Can't open file '/cluster/home/agicgier/phylo/models/w', enter a new name :
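The two failures differ in what the prompt quotes: with a wrong path, PhyML stops immediately and names that path, whereas in the bootstrap failure the quoted name is empty (`''`). On a cluster, a wrapper could scan the captured output for this prompt and report which name it contains. A minimal Python sketch, assuming the prompt wording seen in this thread (`unopenable_file` is a hypothetical helper):

```python
import re
from typing import Optional

# Assumes the prompt wording seen in this thread.
OPEN_PROMPT = re.compile(r"Can't open file '([^']*)', enter a new name")


def unopenable_file(output: str) -> Optional[str]:
    """Return the file name quoted in the prompt, '' if it is empty, or None if absent."""
    match = OPEN_PROMPT.search(output)
    return None if match is None else match.group(1)
```

An empty string, as opposed to `None`, is exactly the symptom reported here: the prompt appears, but with no file name in it.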

stephaneguindon commented 2 years ago

It could be that the random starting tree options are conflicting with the custom rate model. Could you try without them?

adamgicgier commented 2 years ago

Dear Stephane,

I have removed the random starting tree options, and it still doesn't work.

. Command line: ./phyml -i /cluster/home/agicgier/phylo/test/MAFFT_test_file.phylip -d aa -m custom --aa_rate_file modelQpfam -b 10 -f m -v 0 -c 4 -s SPR -o tl --no_memory_check --run_id Qpfam_test

////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\.//////////////////////////////////////////

    . Sequence filename:                             MAFFT_test_file.phylip
    . Data type:                                     aa
    . Alphabet size:                                 20
    . Sequence format:                               interleaved
    . Number of data sets:                           1
    . Nb of bootstrapped data sets:                  10
    . Compute approximate likelihood ratio test:     no
    . Model name:                                    Custom (modelQpfam)
    . Proportion of invariable sites:                0.000000
    . RAS model:                                     discrete Gamma
    . Number of subst. rate catgs:                   4
    . Gamma distribution parameter:                  1.000000
    . 'Middle' of each rate class:                   mean
    . Amino-acid equilibrium frequencies:            model-defined
    . Optimise tree topology:                        yes
    . Starting tree:                                 BioNJ
    . Add random input tree:                         no
    . Optimise branch lengths:                       yes
    . Minimum length of an edge:                     1e-08
    . Optimise substitution model parameters:        no
    . Run ID:                                        Qpfam_test
    . Random seed:                                   1664475148
    . Subtree patterns aliasing:                     no
    . Version:                                       3.3.20220408
    . Byte alignment:                                32
    . AVX enabled:                                   yes
    . SSE enabled:                                   yes

////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\.//////////////////////////////////////////

. 818 patterns found (out of a total of 1092 sites).

. 377 sites without polymorphism (34.52%).

. Sum of amino-acid frequencies: 1.000001

. Scaling amino-acid frequencies...

. Computing pairwise distances...

. Building BioNJ tree...

. This analysis requires at least 11 MB of memory space.

. Score of initial tree: -9045.18

. Starting first round of SPRs...

           0s |   1 | lnL=     -8949.7 | depth=    2/   40 | improvements=   5 | delta_lnL=   90.5/ 1000.0

. Second round of optimization...

           0s |   2 | lnL=     -8962.9 | depth=    1/   36 | improvements=   5 | delta_lnL=    0.0/ 1000.0

. Third round of optimization...

           1s |   3 | lnL=     -8946.5 | depth=    2/   20 | improvements=   5 | delta_lnL=    2.9/ 1000.0 | triple moves=   5
           2s |   4 | lnL=     -8945.1 | depth=    2/   16 | improvements=   1 | delta_lnL=    2.4/  100.0 | triple moves=   5   +
           3s |   5 | lnL=     -8944.3 | depth=    0/   12 | improvements=   0 | delta_lnL=    0.0/  100.0 | triple moves=   5   +

. Final optimisation steps...

. Log likelihood of the current tree: -8944.047948697090760106221.

. Launch bootstrap analysis on the most likely tree...

. Non parametric bootstrap analysis

[ . Can't open file '', enter a new name :

I have now run many tests, removing different options, and the issue seems to lie in the bootstrap option: all the commands that use a different branch support method, with the other settings unchanged, have worked. E.g.:

./phyml -i /cluster/home/agicgier/phylo/test/MAFFT_test_file.phylip -d aa -m custom --aa_rate_file modelQpfam -b -4 -f m -v 0 -c 4 -s SPR -o tl --n_rand_starts 2 --rand_start --no_memory_check --run_id Qpfam_test

./phyml -i /cluster/home/agicgier/phylo/test/MAFFT_test_file.phylip -d aa -m custom --aa_rate_file modelQpfam -b -4 -s SPR -o n
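A comparison like this can be scripted by keeping every option fixed and swapping only the `-b` value. The Python sketch below uses a hypothetical helper, `with_support_option`, which removes any existing `-b` pair from a base command and appends the requested value. As we read the PhyML manual, 10 requests ten bootstrap replicates and -4 requests SH-like branch supports.

```python
import shlex
from typing import List


def with_support_option(base_command: str, b_value: int) -> List[str]:
    """Return base_command as an argv list with its -b option set to b_value.

    Any existing '-b <value>' pair is dropped; all other options are kept as-is.
    """
    args = shlex.split(base_command)
    kept: List[str] = []
    i = 0
    while i < len(args):
        if args[i] == "-b":
            i += 2  # skip the flag and its value
            continue
        kept.append(args[i])
        i += 1
    return kept + ["-b", str(b_value)]
```

Giving each variant its own `--run_id` would keep the output files of the different runs apart.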

stephaneguindon commented 2 years ago

This option works fine on my side. For instance, the following command runs to completion:

./phyml -i /home/guindon/Downloads/small_proteic -d aa -b 10 -m custom --aa_rate_file=getmatrix.txt

You could perhaps try the very latest version of PhyML by cloning the repository. Also, which OS are you running this on?