iqtree / iqtree2

NEW location of IQ-TREE software for efficient phylogenomic inference by maximum likelihood http://www.iqtree.org
GNU General Public License v2.0

Segmentation fault with large input matrix #303

Open AryehMiller opened 3 weeks ago

AryehMiller commented 3 weeks ago

Hi! I'm writing with a question about input matrix size limits. I have a large alignment file (158 individuals, 49M sites), and I'm getting a segmentation fault. Do you have any recommendations on how to proceed with an error like this? I've omitted the specific gap/ambiguity statistics for the sake of brevity. Any insight is much appreciated, thanks!

I've pasted the relevant parts of the output log file below.

IQ-TREE multicore version 2.3.5 for Linux x86 64-bit built Jul  4 2024
Developed by Bui Quang Minh, Nguyen Lam Tung, Olga Chernomor, Heiko Schmidt,
Dominik Schrempf, Michael Woodhams, Ly Trong Nhan, Thomas Wong

Host:    halk0008.amarel.rutgers.edu (AVX512, FMA3, 251 GB RAM)
Command: iqtree -s CladeA_SNPs.phy -m GTR+ASC -alrt 1000 -bb 1000 -nt 16 -mem 100G
Seed:    119505 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Mon Aug 12 19:37:52 2024
Kernel:  AVX+FMA - 16 threads (16 CPU cores detected)

Reading alignment file CladeA_SNPs.phy ... Phylip format detected
Alignment most likely contains DNA/RNA sequences
Alignment has 158 sequences with 49417181 columns, 49301010 distinct patterns
25549556 parsimony-informative, 15656488 singleton sites, 8211137 constant sites

...
... [seq composition chi2 test details for each specimen in alignment]
...

WARNING: 7 sequences contain more than 50% gaps/ambiguity
****  TOTAL                                  8.42%  143 sequences failed composition chi2 test (p-value<5%; df=3)
NOTE: minimal branch length is reduced to 0.000000002024 for long alignment
ERROR: STACK TRACE FOR DEBUGGING:
ERROR: 1   funcAbort()
ERROR: 2   ()
ERROR: 3   pllLoadAlignment()
ERROR: 4   IQTree::initializePLL(Params&)
ERROR: 5   startTreeReconstruction(Params&, IQTree*&, ModelCheckpoint&)
ERROR: 6   runPhyloAnalysis(Params&, Checkpoint*, IQTree*&, Alignment*&)
ERROR: 7   runPhyloAnalysis(Params&, Checkpoint*)
ERROR: 8   main()
ERROR: 9   __libc_start_main()
ERROR: 10   ()
ERROR: 
ERROR: *** IQ-TREE CRASHES WITH SIGNAL SEGMENTATION FAULT
ERROR: *** For bug report please send to developers:
ERROR: ***    Log file: CladeA_SNPs.phy.log
ERROR: ***    Alignment files (if possible)

AryehMiller commented 1 week ago

Hi all, just adding here from a conversation with @bqminh: with large alignments (e.g., 10 GB+ compressed), it might be worthwhile for IQ-TREE to automatically append the -t PARS flag to account for the computational demand. Perhaps this could be added in the next update.
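
For anyone hitting the same crash, here is a minimal sketch of what the suggestion above might look like in practice: the command from the log with -t PARS appended by hand. It assumes that -t PARS is accepted by your IQ-TREE build (check `iqtree -h`), and it has not been confirmed in this thread that the flag resolves the segmentation fault for this alignment.

```shell
#!/bin/sh
# Command copied from the log above, with -t PARS appended manually.
# Assumption: -t PARS is supported by your build; verify with `iqtree -h`.
CMD="iqtree -s CladeA_SNPs.phy -m GTR+ASC -alrt 1000 -bb 1000 -nt 16 -mem 100G -t PARS"

# Show the exact command before running anything.
echo "$CMD"

# Run only if the binary and the alignment file are both available.
if command -v iqtree >/dev/null 2>&1 && [ -f CladeA_SNPs.phy ]; then
    $CMD
fi
```

All other options are left exactly as in the original run, so any change in behavior can be attributed to the added flag.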