nickjcroucher / gubbins

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins
http://nickjcroucher.github.io/gubbins/
GNU General Public License v2.0

Pyjar causing core dump #402

Closed by DOH-JDJ0303 6 months ago

DOH-JDJ0303 commented 6 months ago

Hello. Pyjar appears to be causing a core dump. This may be related to this issue. The input was the cleaned Snippy output from 104 A. baumannii isolates. I also tried 16 cores with 32 GB of memory and got the same error. I have not tried the most recent version because of ongoing errors with Numba.

version: 3.3.1

container: https://depot.galaxyproject.org/singularity/gubbins%3A3.3.1--py39pl5321h3d4b85c_0

command:

run_gubbins.py \
    --threads 8 \
    --prefix 1712002919-Acinetobacter_baumannii-00001 \
    --tree-builder iqtree \
    --custom-model GTR+I+G \
    1712002919-Acinetobacter_baumannii-00001.clean.aln

error:

Warning, you specified a working directory via "-w"
Keep in mind that RAxML only accepts absolute path names, not relative ones!

RAxML can't, parse the alignment file as phylip file 
it will now try to parse it as FASTA file

Using BFGS method to optimize GTR rate parameters, to disable this specify "--no-bfgs" 

This is the RAxML Master Pthread

This is RAxML Worker Pthread Number: 1

This is RAxML Worker Pthread Number: 2

This is RAxML Worker Pthread Number: 3

This is RAxML Worker Pthread Number: 4

This is RAxML Worker Pthread Number: 6

This is RAxML Worker Pthread Number: 5

This is RAxML Worker Pthread Number: 7

This is RAxML version 8.2.12 released by Alexandros Stamatakis on May 2018.

With greatly appreciated code contributions by:
Andre Aberer      (HITS)
Simon Berger      (HITS)
Alexey Kozlov     (HITS)
Kassian Kobert    (HITS)
David Dao         (KIT and HITS)
Sarah Lutteropp   (KIT and HITS)
Nick Pattengale   (Sandia)
Wayne Pfeiffer    (SDSC)
Akifumi S. Tanabe (NRIFS)
Charlie Taylor    (UF)

Alignment has 10918 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 7.04%

RAxML Model Optimization up to an accuracy of 0.100000 log likelihood units

Using 1 distinct models/data partitions with joint branch length optimization

All free model parameters will be estimated by RAxML
GAMMA model of rate heterogeneity, ML estimate of alpha-parameter

GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units

Partition: 0
Alignment Patterns: 10918
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR

RAxML was called as follows:

raxmlHPC-PTHREADS-AVX2 -T 8 -p 5357 -safe -m GTRGAMMA -s 1712002919-Acinetobacter_baumannii-00001.clean.aln.snp_sites.aln -n 1712002919-Acinetobacter_baumannii-00001.clean.iteration_1_reconstruction -t /tmp/nxf.XXXXhz9vRE/tmpb9l257vg/1712002919-Acinetobacter_baumannii-00001.clean.iteration_1.tre.rooted -f e -w /tmp/nxf.XXXXhz9vRE/tmpb9l257vg 

WARNING the alpha parameter with a value of 17.190925 estimated by RAxML for partition number 0 with the name "No Name Provided"
is larger than 10.000000. You should do a model test and confirm that you actually need to incorporate a model of rate heterogeneity!
You can run inferences with a plain substitution model (without rate heterogeneity) by specifyng the CAT model and the "-V" option!

WARNING the alpha parameter with a value of 35.333819 estimated by RAxML for partition number 0 with the name "No Name Provided"
is larger than 10.000000. You should do a model test and confirm that you actually need to incorporate a model of rate heterogeneity!
You can run inferences with a plain substitution model (without rate heterogeneity) by specifyng the CAT model and the "-V" option!

WARNING the alpha parameter with a value of 74.553227 estimated by RAxML for partition number 0 with the name "No Name Provided"
is larger than 10.000000. You should do a model test and confirm that you actually need to incorporate a model of rate heterogeneity!
You can run inferences with a plain substitution model (without rate heterogeneity) by specifyng the CAT model and the "-V" option!

WARNING the alpha parameter with a value of 152.413026 estimated by RAxML for partition number 0 with the name "No Name Provided"
is larger than 10.000000. You should do a model test and confirm that you actually need to incorporate a model of rate heterogeneity!
You can run inferences with a plain substitution model (without rate heterogeneity) by specifyng the CAT model and the "-V" option!

WARNING the alpha parameter with a value of 329.946370 estimated by RAxML for partition number 0 with the name "No Name Provided"
is larger than 10.000000. You should do a model test and confirm that you actually need to incorporate a model of rate heterogeneity!
You can run inferences with a plain substitution model (without rate heterogeneity) by specifyng the CAT model and the "-V" option!

WARNING the alpha parameter with a value of 738.209886 estimated by RAxML for partition number 0 with the name "No Name Provided"
is larger than 10.000000. You should do a model test and confirm that you actually need to incorporate a model of rate heterogeneity!
You can run inferences with a plain substitution model (without rate heterogeneity) by specifyng the CAT model and the "-V" option!

WARNING the alpha parameter with a value of 1000.000000 estimated by RAxML for partition number 0 with the name "No Name Provided"
is larger than 10.000000. You should do a model test and confirm that you actually need to incorporate a model of rate heterogeneity!
You can run inferences with a plain substitution model (without rate heterogeneity) by specifyng the CAT model and the "-V" option!

WARNING the alpha parameter with a value of 1000.000000 estimated by RAxML for partition number 0 with the name "No Name Provided"
is larger than 10.000000. You should do a model test and confirm that you actually need to incorporate a model of rate heterogeneity!
You can run inferences with a plain substitution model (without rate heterogeneity) by specifyng the CAT model and the "-V" option!

Model parameters (binary file format) written to: /tmp/nxf.XXXXhz9vRE/tmpb9l257vg/RAxML_binaryModelParameters.1712002919-Acinetobacter_baumannii-00001.clean.iteration_1_reconstruction
13.141372 -286014.781208

Overall Time for Tree Evaluation 13.141741
Final GAMMA  likelihood: -286014.781208

Number of free parameters for AIC-TEST(BR-LEN): 216
Number of free parameters for AIC-TEST(NO-BR-LEN): 9

Model Parameters of Partition 0, Name: No Name Provided, Type of Data: DNA
alpha: 1000.000000
Tree-Length: 2.013965
rate A <-> C: 1.076320
rate A <-> G: 5.662898
rate A <-> T: 1.729112
rate C <-> G: 0.545269
rate C <-> T: 5.712031
rate G <-> T: 1.000000

freq pi(A): 0.261248
freq pi(C): 0.238504
freq pi(G): 0.240712
freq pi(T): 0.259536

Final tree written to:                 /tmp/nxf.XXXXhz9vRE/tmpb9l257vg/RAxML_result.1712002919-Acinetobacter_baumannii-00001.clean.iteration_1_reconstruction
Execution Log File written to:         /tmp/nxf.XXXXhz9vRE/tmpb9l257vg/RAxML_log.1712002919-Acinetobacter_baumannii-00001.clean.iteration_1_reconstruction

--- Gubbins 3.3.1 ---

Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R. "Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins". Nucleic Acids Res. 2015 Feb 18;43(3):e15. doi: 10.1093/nar/gku1196.

Checking dependencies and input files...

Checking input alignment file...

Filtering input alignment...
...done. Run time: 28.72 s

Running Gubbins to detect SNPs...
gubbins /tmp/nxf.XXXXhz9vRE/tmpb9l257vg/1712002919-Acinetobacter_baumannii-00001.clean.aln
...done. Run time: 44.33 s

Entering the main loop.

*** Iteration 1 ***

Constructing the phylogenetic tree with raxmlHPC-PTHREADS-AVX2...
raxmlHPC-PTHREADS-AVX2 -T 8 -p 505 -safe -m GTRGAMMA -f d -p 1 -s /tmp/nxf.XXXXhz9vRE/1712002919-Acinetobacter_baumannii-00001.clean.aln.phylip -n 1712002919-Acinetobacter_baumannii-00001.clean.iteration_1 > /dev/null 2>&1
...done. Run time: 243.41 s

Reconstructing ancestral sequences with pyjar...

Fitting substitution model to tree...
raxmlHPC-PTHREADS-AVX2 -T 8 -p 5357 -safe -m GTRGAMMA -s 1712002919-Acinetobacter_baumannii-00001.clean.aln.snp_sites.aln -n 1712002919-Acinetobacter_baumannii-00001.clean.iteration_1_reconstruction -t /tmp/nxf.XXXXhz9vRE/tmpb9l257vg/1712002919-Acinetobacter_baumannii-00001.clean.iteration_1.tre.rooted -f e -w /tmp/nxf.XXXXhz9vRE/tmpb9l257vg

Running joint ancestral reconstruction with pyjar
.command.sh: line 20:   159 Bus error               (core dumped) run_gubbins.py --threads 8 --prefix 1712002919-Acinetobacter_baumannii-00001 ${method_model} 1712002919-Acinetobacter_baumannii-00001.clean.aln

nickjcroucher commented 6 months ago

It looks like it is trying to fit the model with RAxML. Maybe try --model-fitter iqtree?

DOH-JDJ0303 commented 6 months ago

Hi @nickjcroucher. Oops, yes, there was an error in my script that still allowed RAxML to be called, but that does not appear to be the source of the issue. I re-ran everything using IQ-TREE and I am still getting the same error, which again appears to be an issue with Pyjar.

Note: I removed some of the output because it contained sensitive sample names


IQ-TREE multicore version 2.2.5 COVID-edition for Linux 64-bit built Sep 15 2023
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams, Ly Trong Nhan.

Host: ip-172-31-7-51.us-west-2.compute.internal (AVX2, FMA3, 31 GB RAM)
Command: iqtree -nt 8 -safe -redo -m GTR+G4 -s 1712094029-Acinetobacter_baumannii-00001.clean.aln.snp_sites.aln -t /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1.tre.rooted --prefix /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1 -n 0 --mlrate -redo
Seed: 793205 (Using SPRNG - Scalable Parallel Random Number Generator)
Time: Tue Apr 2 22:02:10 2024
Kernel: Safe AVX+FMA - 8 threads (8 CPU cores detected)

Reading alignment file 1712094029-Acinetobacter_baumannii-00001.clean.aln.snp_sites.aln ... Fasta format detected
Reading fasta file: done in 0.093331 secs using 86.53% CPU
Alignment most likely contains DNA/RNA sequences
Constructing alignment: done in 0.111174 secs using 86.4% CPU
Alignment has 105 sequences with 35458 columns, 10918 distinct patterns
22089 parsimony-informative, 13369 singleton sites, 0 constant sites
Gap/Ambiguity Composition p-value
Analyzing sequences: done in 0.00187732 secs using 172.2% CPU

[ REMOVED SENSITIVE INFO]

**** TOTAL 7.04% 0 sequences failed composition chi2 test (p-value<5%; df=3)
Checking for duplicate sequences: done in 0.00864522 secs using 263% CPU
Reading input tree file /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1.tre.rooted ... rooted tree

NOTE: 148 MB RAM (0 GB) is required!
Estimate model parameters (epsilon = 0.010)

  1. Initial log-likelihood: -310223.173
  2. Current log-likelihood: -286629.260
  3. Current log-likelihood: -285995.214
  4. Current log-likelihood: -285990.169
  5. Current log-likelihood: -285990.139

Optimal log-likelihood: -285990.133
Rate parameters: A-C: 1.07271 A-G: 5.63418 A-T: 1.70422 C-G: 0.53721 C-T: 5.68407 G-T: 1.00000
Base frequencies: A: 0.261 C: 0.239 G: 0.241 T: 0.260
Gamma shape alpha: 998.989
Parameters optimization took 5 rounds (4.871 sec)
Wrote distance file to...
BEST SCORE FOUND : -285990.133
Site rates printed to /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1.mlrate
Total tree length: 2.011

Total number of iterations: 0
CPU time used for tree search: 0.000 sec (0h:0m:0s)
Wall-clock time used for tree search: 0.000 sec (0h:0m:0s)
Total CPU time used: 107.623 sec (0h:1m:47s)
Total wall-clock time used: 13.924 sec (0h:0m:13s)

Analysis results written to:
IQ-TREE report: /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1.iqtree
Maximum-likelihood tree: /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1.treefile
Site-specific rates: /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1.rate
Screen log file: /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1.log

Date and Time: Tue Apr 2 22:02:25 2024

--- Gubbins 3.3.1 ---

Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R. "Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins". Nucleic Acids Res. 2015 Feb 18;43(3):e15. doi: 10.1093/nar/gku1196.

Checking dependencies and input files...

Checking input alignment file...

Filtering input alignment...
...done. Run time: 22.07 s

Running Gubbins to detect SNPs...
gubbins /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.aln
...done. Run time: 37.67 s

Entering the main loop.

Iteration 1

Constructing the phylogenetic tree with iqtree...
iqtree -nt 8 -safe -redo -m GTR+I+G -s /tmp/nxf.XXXXUvAFIB/1712094029-Acinetobacter_baumannii-00001.clean.aln.phylip -pre 1712094029-Acinetobacter_baumannii-00001.clean.iteration_1 -quiet
...done. Run time: 322.68 s

Reconstructing ancestral sequences with pyjar...

Fitting substitution model to tree...
iqtree -nt 8 -safe -redo -m GTR+G4 -s 1712094029-Acinetobacter_baumannii-00001.clean.aln.snp_sites.aln -t /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1.tre.rooted --prefix /tmp/nxf.XXXXUvAFIB/tmpgp_7rpzw/1712094029-Acinetobacter_baumannii-00001.clean.iteration_1 -n 0 --mlrate -redo

Running joint ancestral reconstruction with pyjar
.command.sh: line 7: 159 Bus error (core dumped) run_gubbins.py --threads 8 --prefix 1712094029-Acinetobacter_baumannii-00001 --model-fitter iqtree --tree-builder iqtree --custom-model GTR+I+G 1712094029-Acinetobacter_baumannii-00001.clean.aln

nickjcroucher commented 6 months ago

The gamma distribution alpha parameter is still rising to almost 1,000, which makes me suspect there are some hypervariable sites in your alignment that are difficult to include in a joint reconstruction. I would recommend checking the data for anything odd about your alignment, but if you're confident it is correct, I would switch to a marginal reconstruction with the --mar flag.
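
A minimal sketch of what that switch might look like, reusing the flags from the command earlier in this thread and adding `--mar`. The function name `gubbins_mar_cmd`, the prefix, and the alignment filename are hypothetical placeholders, and the function only prints the command so it can be inspected before it is run.

```shell
#!/bin/sh
# Dry run: print a run_gubbins.py command that requests marginal
# ancestral reconstruction (--mar) instead of the default joint one.
# gubbins_mar_cmd, the prefix, and the alignment name are illustrative only.
gubbins_mar_cmd() {
    prefix="$1"
    alignment="$2"
    echo "run_gubbins.py --threads 8 --prefix $prefix --tree-builder iqtree --model-fitter iqtree --custom-model GTR+I+G --mar $alignment"
}

# Example call with placeholder names.
gubbins_mar_cmd my_run my_run.clean.aln
```

Printing first makes it easy to confirm the flag order in a pipeline script before committing to a long run.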

DOH-JDJ0303 commented 6 months ago

Thanks for the suggestion. What I failed to mention before is that this is running within Nextflow, which appears to matter because I was able to run Gubbins on the same alignment outside of Nextflow. I will circle back with a solution once I find one. Thanks for your help!

nickjcroucher commented 6 months ago

Thanks for letting me know - please reopen if there is an issue on the Gubbins side.

DOH-JDJ0303 commented 4 months ago

Hi @nickjcroucher. I wanted to follow up on this issue. It turns out it was not a Nextflow problem but a containerization problem. It seems pyjar requires more shared memory than Docker/Podman/Singularity allocate by default (I believe all are 64 MB?). Below are examples of how to increase the shared memory size:

The specific resource value needed may change but these worked for my applications.
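
As a rough sketch of this kind of change, assuming Docker or Podman: both accept a `--shm-size` option on `run`. The helper names `shm_size_kb` and `container_cmd`, the image name `gubbins-image`, and the `1g` value are illustrative assumptions, not the settings used above. The second function only prints the command, and volume mounts are omitted for brevity.

```shell
#!/bin/sh
# Print the size of /dev/shm in kilobytes as seen by the current process.
# Running this inside the container shows how much shared memory it has.
shm_size_kb() {
    df -Pk /dev/shm | awk 'NR == 2 { print $2 }'
}

# Dry run: print a container command that enlarges /dev/shm.
# --shm-size is accepted by both `docker run` and `podman run`.
# The engine, size, image, and trailing command are all placeholders.
container_cmd() {
    engine="$1"
    shm="$2"
    image="$3"
    shift 3
    echo "$engine run --rm --shm-size=$shm $image $*"
}

# Example call with hypothetical values.
container_cmd docker 1g gubbins-image run_gubbins.py --threads 8 input.clean.aln
```

Within Nextflow, the `containerOptions` process directive is one place such an option could be passed to the container engine. Singularity handles `/dev/shm` differently, so this flag is not assumed to apply there.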

nickjcroucher commented 4 months ago

Thanks @DOH-JDJ0303, that's very helpful - hope everything's working now!