iqtree / iqtree2

NEW location of IQ-TREE software for efficient phylogenomic software by maximum likelihood http://www.iqtree.org
GNU General Public License v2.0

Concordance Analysis Bug - Large Number of Partitions? #316

Open jasongallant opened 1 week ago

jasongallant commented 1 week ago

Hi There,

I'm trying to follow along with this tutorial using my own data (http://iqtree.org/doc/recipes/concordance-vector).

I'm currently using the latest release (IQ-TREE multicore version 2.3.6 for Linux x86 64-bit built Aug 1 2024). I can successfully run this command:

iqtree2 -te astral_species_annotated.tree -p loci.best_model.nex --scfl 100 --prefix scfl -T 128

This example dataset contains 400 genes from a variety of bird species.

I'm trying to do something similar with about 25k genes. When I run this with the full dataset:

iqtree2 -te my_astral_species_annotated.tree -p my_loci.best_model.nex --scfl 100 --prefix scfl -T 128

I get this error:

Reading partition model file my_loci.best_model.nex ...
Reading "SETS" block...
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
ERROR: STACK TRACE FOR DEBUGGING:
ERROR:
ERROR: IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: For bug report please send to developers:
ERROR: Log file: loci.best_model.repaired.nex.log
ERROR: Alignment files (if possible)
Aborted

However, if I manually edit my_loci.best_model.nex to include only the first 10 genes, iqtree2 runs without issue. This makes me suspect that the crash is related to the large number of partitions, although the program crashes nearly instantly. I'm attempting this run on a machine with 128 processors and 2 TB of RAM.

Any suggestions on how to fix this or how to proceed? Many thanks in advance!

jasongallant commented 1 week ago

I wrote a little Python script that randomly subsets my_loci.best_model.nex. It looks like the limit before it crashes is somewhere between 200 and 400 sequences.

jasongallant commented 1 week ago

For what it's worth, this is the same type of analysis attempted in #155.

roblanf commented 2 days ago

@thomaskf and @bqminh any ideas here?

@jasongallant, one option you could try is to use --scf instead. I appreciate this is not the same analysis, but it might get you some useful information and/or help us track down the bug.

jasongallant commented 2 days ago

Hi @roblanf, thanks for the reply. I'm working with scf right now. I also noted another issue, #223, that affects tree calculations in scfl (originally noticed by @simone-says). It has made the going tough, but it looks like scf is the way forward until this gets ironed out. Let me know if I can provide more info on this end.

roblanf commented 1 day ago

Thanks for the cross-linking! As on the other thread, the most useful thing is a reproducible example if you have one, then as soon as one of us has time we can get straight to debugging.
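One way to shrink the full dataset into a small reproducible example is to bisect on the number of partitions. The sketch below is only an illustration. It assumes the crash is monotonic (once n partitions crash, every larger n crashes too), which the thread has not confirmed. The names smallest_failing_count and iqtree_crashes are hypothetical, and the iqtree2 flags are the ones from the command in the original report.

```python
import subprocess


def smallest_failing_count(crashes, lo, hi):
    """Bisect for the smallest n in (lo, hi] where crashes(n) is True.

    Requires crashes(lo) to be False and crashes(hi) to be True, and
    assumes that crashing is monotonic in n (an assumption).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid  # the threshold is at mid or below
        else:
            lo = mid  # the threshold is above mid
    return hi


def iqtree_crashes(partition_file, tree_file, prefix, threads=4):
    """Run iqtree2 on one partition file; True if it exits abnormally.

    The caller is expected to have written partition_file already and to
    pass a fresh prefix per probe so earlier output is not reused.
    """
    cmd = [
        "iqtree2",
        "-te", tree_file,
        "-p", partition_file,
        "--scfl", "100",
        "--prefix", prefix,
        "-T", str(threads),
    ]
    return subprocess.run(cmd, capture_output=True).returncode != 0
```

With the bounds reported above, the search would start from something like smallest_failing_count(probe, 10, 25000), where probe(n) writes an n-partition file and calls iqtree_crashes on it. Probes that do not crash run the full analysis, so they can be slow.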