nickjcroucher / gubbins

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins
http://nickjcroucher.github.io/gubbins/
GNU General Public License v2.0

Buffer overflow during model fitting #409

Open Koen-vdl opened 1 month ago

Koen-vdl commented 1 month ago

Hi Nick,

I'm using Gubbins v3.3.5 as indicated below:

run_gubbins.py $PREFIX-CLEAN_NoRef.aln  --first-tree-builder iqtree \
                                        --tree-builder iqtree \
                                        --tree-args " -T AUTO" \
                                        --first-tree-args " -T AUTO" \
                                        --best-model \
                                        --iterations 10 \
                                        --threads 62 \
                                        --model-fitter raxml \
                                        --prefix $PREFIX

During model fitting in the first iteration, Gubbins aborts with the following error:

*** buffer overflow detected ***: terminated
Aborted (core dumped)
Unable to fit model to data

I have not been able to work out why. I'm using raxml rather than iqtree for model fitting because I previously encountered the issue described here.

Any help would be much appreciated.

--- Gubbins 3.3.5 ---

Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R. "Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins". Nucleic Acids Res. 2015 Feb 18;43(3):e15. doi: 10.1093/nar/gku1196.

Checking dependencies and input files...

Checking input alignment file...

Filtering input alignment...
...done. Run time: 152.42 s

Running Gubbins to detect SNPs...
gubbins "/srv/koen_vdl/gubbins/tmp7bo0gglf/20240515_all_4.3.1.2.1_PW_and_TyphiNEt_mindepth45_HighHeterozyg_Genocheck_YES_REF.full.aln-CLEAN_NoRef.aln"
...done. Run time: 277.57 s

Entering the main loop.

*** Iteration 1 ***

Constructing the phylogenetic tree with iqtree...
iqtree -nt 62 -safe -redo -m GTR+G4 -seed 1784  -T AUTO -s /srv/koen_vdl/gubbins/20240515_all_4.3.1.2.1_PW_and_TyphiNEt_mindepth45_HighHeterozyg_Genocheck_YES_REF.full.aln-CLEAN_NoRef.aln.phylip -pre 20240515_all_4.3.1.2.1_PW_and_TyphiNEt_mindepth45_HighHeterozyg_Genocheck_YES_REF.full.aln-CLEAN_NoRef.iteration_1 -quiet
...done. Run time: 688.01 s

Reconstructing ancestral sequences with pyjar...

Fitting substitution model to tree...
raxmlHPC-PTHREADS-AVX2 -T 62 -safe -m GTRGAMMA -p 523 -s 20240515_all_4.3.1.2.1_PW_and_TyphiNEt_mindepth45_HighHeterozyg_Genocheck_YES_REF.full.aln-CLEAN_NoRef.aln.snp_sites.aln -n 20240515_all_4.3.1.2.1_PW_and_TyphiNEt_mindepth45_HighHeterozyg_Genocheck_YES_REF.full.aln-CLEAN_NoRef.iteration_1_reconstruction -t /srv/koen_vdl/gubbins/tmp7bo0gglf/20240515_all_4.3.1.2.1_PW_and_TyphiNEt_mindepth45_HighHeterozyg_Genocheck_YES_REF.full.aln-CLEAN_NoRef.iteration_1.tre.rooted -f e -w /srv/koen_vdl/gubbins/tmp7bo0gglf
*** buffer overflow detected ***: terminated
Aborted (core dumped)
Unable to fit model to data

nickjcroucher commented 1 month ago

How irritating. (1) Does the analysis work if you specify a GTR model? (2) Is there anything unusual about the *.tre.rooted phylogeny (e.g. many near-zero-length branches)?

nickjcroucher commented 1 month ago

One other thought - -T AUTO might cause problems if there are concurrent processes on the same machine.
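One rough way to check for competing load before launching a 62-thread run is to compare the machine's load average with its core count. The sketch below is an illustration only and is not part of Gubbins: `suggest_threads` is a hypothetical helper, and the heuristic (treat the rounded-up 1-minute load average as the number of busy cores) is an assumption.

```python
import math
import os


def suggest_threads(requested, load_1min, cpu_count):
    """Cap a requested thread count by the number of cores that look idle.

    Heuristic (an assumption, not Gubbins behaviour): the 1-minute load
    average, rounded up, is taken as the number of busy cores.
    """
    busy = math.ceil(load_1min)
    idle = max(1, cpu_count - busy)  # always leave at least one thread
    return min(requested, idle)


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    load_1min = os.getloadavg()[0]
    cpus = os.cpu_count() or 1
    threads = suggest_threads(62, load_1min, cpus)
    print(f"1-min load {load_1min:.1f} on {cpus} cores; suggested --threads {threads}")
```

If the suggested value is well below 62, other jobs are probably competing for the same cores, which is the situation in which -T AUTO might misbehave.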

Koen-vdl commented 1 month ago

Thanks for getting back to me so quickly @nickjcroucher. I added --model GTR and obtained the same result.

As you suggested, *.tre.rooted does indeed contain a lot of near-zero branch lengths.

20240515_all_4.3.1.2.1_PW_and_TyphiNEt_mindepth45_HighHeterozyg_Genocheck_YES_REF.full.aln-CLEAN_NoRef.iteration_1.tre.rooted.txt
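To put a number on "a lot", the branch lengths can be pulled straight out of the Newick text. This is a minimal sketch using only the standard library; the function names and the 1e-8 threshold are assumptions chosen for illustration, and the regular expression assumes that taxon labels contain no ':' characters.

```python
import re

# Matches the number following ':' in a Newick string, e.g. ":0.0012" or ":1e-09".
# Assumes taxon labels themselves contain no ':' characters.
BRANCH_LENGTH = re.compile(r":\s*(-?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def branch_lengths(newick):
    """Return every branch length found in a Newick string as a float."""
    return [float(value) for value in BRANCH_LENGTH.findall(newick)]


def summarise_short_branches(newick, threshold=1e-8):
    """Count branches whose length is at or below `threshold`."""
    lengths = branch_lengths(newick)
    short = [length for length in lengths if length <= threshold]
    return {
        "total": len(lengths),
        "short": len(short),
        "min": min(lengths) if lengths else None,
    }


if __name__ == "__main__":
    # Toy tree for illustration; replace with the contents of the *.tre.rooted file.
    example = "((A:0.1,B:0.0):1e-09,C:0.2);"
    print(summarise_short_branches(example))
```

For a real run, read the tree with `open(path).read()` and pass the resulting string to `summarise_short_branches`.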

nickjcroucher commented 1 month ago

Did you remove --best-model when specifying GTR? (Sorry, I should have clarified that.) It might be that there isn't sufficient information to fit the more complex model, although that seems unlikely. It might be worth starting with JC to see if that helps identify the issue.

Koen-vdl commented 1 month ago

Hi @Nick, I tried what you suggested (with and without the tree-args) and got the same error:

*** buffer overflow detected ***: terminated
Aborted (core dumped)

This is the full command I used.

run_gubbins.py $PREFIX-CLEAN_NoRef.aln  --first-tree-builder iqtree \
                                        --tree-builder iqtree \
                                        --tree-args " -T AUTO" \
                                        --first-tree-args " -T AUTO" \
                                        --model-fitter raxml \
                                        --iterations 10 \
                                        --threads 62 \
                                        --prefix $PREFIX \
                                        --model GTR \
                                        --first-model JC

I could upload the input alignment to my personal cloud space in case you wish to take a closer look:

nickjcroucher commented 1 month ago

Sorry, the alignment file appears to be corrupted (I have tried downloading it twice). Have you tried removing " -T AUTO"? (IQTREE2 will still be parallelised using the old -nt flag; I am now updating this.) If that doesn't work, perhaps try using --mar, in case it is the joint reconstruction that is causing the problem.
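The two suggested variants (dropping " -T AUTO", then adding --mar) can be generated from one place so that nothing else changes between runs. The sketch below only assembles argument lists from the flags quoted in this thread and does not execute anything; `build_gubbins_command` is a hypothetical helper, and the alignment and prefix values in the example are placeholders.

```python
import shlex


def build_gubbins_command(alignment, prefix, threads=62,
                          iqtree_auto_threads=False, use_mar=False):
    """Assemble a run_gubbins.py argument list from flags used in this thread."""
    cmd = [
        "run_gubbins.py", alignment,
        "--first-tree-builder", "iqtree",
        "--tree-builder", "iqtree",
        "--model-fitter", "raxml",
        "--iterations", "10",
        "--threads", str(threads),
        "--prefix", prefix,
        "--model", "GTR",
        "--first-model", "JC",
    ]
    if iqtree_auto_threads:
        # The original runs passed " -T AUTO" to both tree-building steps.
        cmd += ["--tree-args", " -T AUTO", "--first-tree-args", " -T AUTO"]
    if use_mar:
        cmd.append("--mar")
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    for use_mar in (False, True):
        cmd = build_gubbins_command("input.aln", "output", use_mar=use_mar)
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.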