pierrebarbera / epa-ng

Massively parallel phylogenetic placement of genetic sequences
GNU Affero General Public License v3.0
Unable to parse model file from IQtree #52

Open stilianoslouca opened 8 months ago

stilianoslouca commented 8 months ago

Hello. EPA-NG fails to parse a model file produced by the latest IQtree version (passed via the --model option) when the selected DNA substitution model is not GTR. The attached IQtree model file, in which the selected model is TIM3e+R10, reproduces the following error:

libc++abi: terminating due to uncaught exception of type std::invalid_argument: Couldn't parse model file! (can't find 'A-R: '!)

EPA-NG looks for "GTR" as the DNA substitution model (lines 174-178 of the EPA-NG source file _parsemodel.hpp) and, if it is not found, assumes that it is dealing with an amino-acid model. It then looks for the substitution rate between amino acids A and R (line 182 of _parsemodel.hpp), which does not exist because the model is in fact a DNA substitution model (TIM3e+R10).

My impression was that EPA-NG can handle DNA substitution models other than GTR, but this currently does not seem to be the case (at least not when the model file comes from IQtree). Would it be possible to fix this? It seems this could be achieved fairly easily with either of the following approaches:

  1. Give the user the option to explicitly specify whether the input model is a DNA or AA substitution model.
  2. Don't automatically switch to AA when the model in the IQtree file is not literally "GTR"; instead, also accept other common DNA specifiers such as "TIM3e" (full list here).
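
To illustrate approach 2, here is a minimal sketch of how the DNA/AA decision could be made from the model name instead of testing only for "GTR". It is not EPA-NG's actual code: `DataType`, `base_model`, `guess_data_type` and `self_check` are hypothetical names, and the list of DNA specifiers is a non-exhaustive assumption based on common IQtree model names.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical type, not part of EPA-NG: the two alphabets to distinguish.
enum class DataType { DNA, AA };

// Strip rate-heterogeneity and frequency suffixes,
// e.g. "TIM3e+R10" -> "TIM3e". A name without '+' is returned unchanged.
std::string base_model(const std::string& model)
{
  return model.substr(0, model.find('+'));
}

// Treat the model as DNA if its base name is a known DNA specifier,
// otherwise fall back to AA (mirroring the fallback described above).
DataType guess_data_type(const std::string& model)
{
  // Assumption: non-exhaustive list of common IQtree DNA model names.
  static const std::set<std::string> dna_models = {
    "JC",   "F81",   "K80",  "HKY",   "TN",   "TNe",  "K81", "K81u",
    "TPM2", "TPM2u", "TPM3", "TPM3u", "TIM",  "TIMe", "TIM2",
    "TIM2e", "TIM3", "TIM3e", "TVM",  "TVMe", "SYM",  "GTR"
  };
  return dna_models.count(base_model(model)) ? DataType::DNA : DataType::AA;
}

// Small self-check covering the model from this report.
void self_check()
{
  assert(guess_data_type("TIM3e+R10") == DataType::DNA);
  assert(guess_data_type("GTR+G4") == DataType::DNA);
  assert(guess_data_type("LG+G4") == DataType::AA);
}
```

With a check like this, only names absent from the DNA list would reach the amino-acid branch that searches for 'A-R: '. Approach 1 could be layered on top by letting an explicit user choice override the guess.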

Thank you!

stilianoslouca commented 8 months ago

Here's an example tree model file generated by IQtree, on which EPA-NG fails because the substitution model is not "GTR". tree_model.txt