nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0
268 stars 128 forks source link

tree: Users cannot override conflicting default arguments with their own arguments in `--tree-builder-args` #778

Closed huddlej closed 2 years ago

huddlej commented 3 years ago

Current Behavior

@llk578496 notes in issue 619 that they cannot provide all valid IQ-TREE arguments to augur tree through the --tree-builder-args argument.

Also I could not specify the bootstrap value due to the uncompabtible argument -n

The specific error was:

Building a tree via: iqtree2 -ninit 2 -n 2 -me 0.05 -nt 10 -s 20210929_two_five_Tree.renamed.align-delim.fasta -m GTR -czb -B 1000 -n OFF > 20210929_two_five_Tree.renamed.align-delim.iqtree.log Nguyen et al: IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies. Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300

ERROR: Shell exited 2 when running: iqtree2 -ninit 2 -n 2 -me 0.05 -nt 10 -s 20210929_two_five_Tree.renamed.align-delim.fasta -m GTR -czb -B 1000 -n OFF > 20210929_two_five_Tree.renamed.align-delim.iqtree.log Command output was: Ultrafast bootstrap does not work with -fast, -te or -n option

ERROR: TREE BUILDING FAILED Please see the log file for more details: 20210929_two_five_Tree.renamed.align-delim.iqtree.log

The issue is that users would like to provide valid IQ-TREE arguments to augur tree, but the default arguments we use can conflict with those user-provided values.

Expected behavior
Users should be able to provide all valid arguments to IQ-TREE through augur tree.

Possible solution
One possible solution would be for the value of --tree-builder-args to completely replace any IQ-TREE default arguments we use in augur tree. The same logic would need to apply for FastTree and RAxML.

A major issue with this solution is that users don't necessarily know what the original defaults were, so it could be tricky to create the proper --tree-builder-args string that includes the relevant defaults and new arguments.

Another solution is to say we do not support arbitrary IQ-TREE arguments and that users should call IQ-TREE directly instead of using augur tree.

huddlej commented 2 years ago

After talking more with users who have tried to use standalone IQ-TREE instead of augur tree, there is still enough value provided by augur's wrapping of IQ-TREE that we should address this problem in augur.

A specific example of what we would like to avoid is in this IQ-TREE command generated by augur tree when the user provides custom arguments to --tree-builder-args that duplicate arguments in our default command (the -ninit and -n arguments):

iqtree2 -ninit 2 -n 2 -me 0.05 -nt 10 -s results/filtered-delim.fasta -m GTR -ninit 10 -n 4

Augur should handle this by running the following command instead:

iqtree2 -me 0.05 -nt 10 -s results/filtered-delim.fasta -m GTR -ninit 10 -n 4

One potential technical solution that could work for all supported tree builders would be to 1) compose an ordered dictionary of all default arguments, 2) parse key/value pairs from user-provided arguments to --tree-builder-args, and 3) update the dictionary with the user's arguments. This solution could be complicated by the need to provide flags in addition to arguments where there is no explicit key/value pair. We would need to parse these as "flag/True" pairs, perhaps. Another bigger issue with this approach is that we would need to support all of the various ways tree builders define command line arguments. Even if we make this solution work, the user won't have a way to know what the default arguments are that they are overriding. This lack of transparency could lead to unexpected behavior from any of the tree builders when users unknowingly provide arguments that conflict with our defaults.

Another solution that might be more generic and surface the defaults to users would be to move the per-builder default arguments into the default value of --tree-builder-args. When users type augur tree -h, they could see the default arguments for each tree builder. When users provide their own --tree-builder-args, the given values would need to completely replace the original defaults. In the example above, running augur tree -h would print:

iqtree2 -ninit 2 -n 2 -me 0.05 -nt 10 -s <input> -m GTR

When the user runs augur tree with --tree-builder-args "-ninit 10 -n 4", the resulting command would be:

iqtree2 -ninit 10 -n 4 -s <input>

Note that we have to allow augur tree to manage the input to iqtree because we create a copy of the input alignment prior to building the tree. As I write out this example, I also see a conflict between --tree-builder-args and --substitution-model, for IQ-TREE (and --nthreads). We could let Augur handle inputs and substitution models as special cases to the general rule where the --tree-builder-args replace the defaults. We just need this to be clear to users. For example, maybe the augur tree -h command should print something like this instead:

iqtree2 -s <alignment> -m <substitution-model> -nt <nthreads> -ninit 2 -n 2 -me 0.05

Then users could at least see which values will be supplied by Augur and which defaults can be overridden. Maybe we could template the commands more generically like so:

iqtree2 -s <alignment> -m <substitution-model> -nt <nthreads> <tree-builder-args>

such that the default value of --tree-builder-args for IQ-TREE is -ninit 2 -n 2 -me 0.05.

jameshadfield commented 2 years ago

Another solution that might be more generic and surface the defaults to users would be to move the per-builder default arguments into the default value of --tree-builder-args.

This is what I'd like to do.

We could let Augur handle inputs and substitution models as special cases to the general rule where the --tree-builder-args replace the defaults.

Yes - special cases will be unavoidable I think (e.g. the sequence input is technically a "tree builder arg"). I think explaining the behavior of augur tree to users by

iqtree2 -s <alignment> -m <substitution-model> -nt <nthreads> <tree-builder-args>

is really clean. It clashes a little bit with other options, such as -fast (a flag which we convert to tree-builder-args) and in these cases I'd propose erroring along the lines of "you can't run both of these together, add ... to your tree builder args to mimic behavior here".

ammaraziz commented 2 years ago

Hi John and James,

This issue is currently hindering some of my work so I'd like to take it on board if that's okay with everyone?

It should be relatively easy to update augur tree to replace the 'fast' defaults with the user inputed tree args. One issue the documentation for the users but that can be simply changed.

Something I noticed in augur tree is that the three build_* functions are not harmonised. For example, build_iqtree has a list of strings for the -fast options but the other build functions do not. I think alongside the above changes, it's a good opportunity to create consistent tree builder functions where possible.

General overview of the things I plan to do:

  1. Create empty list tree_op to contain the default tree building options (replaces fast_op in build_iqtree)
  2. If tree_builder_args is not supplied, tree_op contains the default tree builder options, else tree_builder_args is chosen.
  3. Update call variable to unpack tree_op
  4. Update augur tree -h as indicated above and add a comment so user understands inputs
  5. Add shorthand -m for --substitution-model because it's really useful short hand!

Thanks

Ammar

jameshadfield commented 2 years ago

All yours @ammaraziz - thanks in advance!

Your approach looks generally good. (2) will be complicated by -fast, but see above for a solution here. (5) should be a separate PR, or at least a separate commit in this PR.

huddlej commented 2 years ago

Hi @ammaraziz, I'm just checking in to see how things are going with this issue. I've run into a similar need to override the defaults with custom arguments, so if you need someone to test your code, let me know. :)

ammaraziz commented 2 years ago

Hi @huddlej,

A few things came up at work so I've had to put this aside. But I think by the end of this week or beginning next week I will have a working prototype.

I'd love for someone else to have a gander on my code and to test it. I will ping you here when it's ready :)

Thanks for your patience!

ammaraziz commented 2 years ago

Hey @jameshadfield,

I have to be honest I don't understand the comments about -fast. I understand that -fast is converted to -ninit 2 -n 2 -me 0.05 for compatibility with older versions of iqtree. But when it comes to in tree builder args, is the worry that -fast will cause issues due to incompatibility with other iqtree arguments? If so I assumed iqtree would handle the error and report the issue. Or would we like this to be caught by augur and present a friendlier error message.

Let me know!

Thanks,

Ammar

huddlej commented 2 years ago

@ammaraziz No worries! I ended up hacking together one possible solution to this issue for a related project where it unavoidably necessary (calculating bootstrap values from IQ-TREE, something that doesn't work with our IQ-TREE defaults).

Would you want to try out my implementation and see if it solves your use case, too?

ammaraziz commented 2 years ago

Hi @huddlej Your implementation was nearly identical to mine! Glad to see I was on the right path. I tested your 3e664c32077feca7d280a13ad7f5a411b2e69a40 commit and it works nicely. This was the command I ran:

...  -substitution-model GTR+I --nthreads 10 --tree-builder-args "-B 1000 -alrt 1000"

worked nicely!

I tried breaking it by adding -fast, this was the error:

Command output was:
    Ultrafast bootstrap (-bb) does not work with -fast option

Which is nice.

Interestingly, these commands work:

...  --substitution-model GTR+I --nthreads 10 --tree-builder-args "-m GTR+F"
...  --substitution-model GTR+I --nthreads 10 --tree-builder-args "-nt 11"

Oddly iqtree does not complain. It seems iqtree takes the last argument given.

Finally broke it with the example for --tree-builder-args:

...  --substitution-model GTR+I --nthreads 10 --tree-builder-args "-czb"

error:

augur tree: error: argument --tree-builder-args: expected one argument

I think this is due to argparse issue with negative values: https://docs.python.org/3.2/library/argparse.html#arguments-containing

Not sure what the optimal solution is but I did notice that adding a space at the end of the supplied value fixes this issue, eg: --tree-builder-args "-czb ". It's a bit hacky but we could append a space to the start/end of the tree-builder-args value as a work around.

I hope this information was useful.

huddlej commented 2 years ago

Excellent! Thank you for the detailed feedback and testing, @ammaraziz!

You're right that it's a little strange how IQ-TREE accepts duplicate input arguments. This behavior appears to be the root of #780, leading to segmentation faults under some conditions. A nicer interface might be for augur to check for these hardcoded flags in the user-provided tree-builder arguments and raise a helpful error message when they exist.

For the broken example in the help, the issue you describe is why the example uses an assignment operator style of argument instead of the space-delimited version. This code works for me, for example:

...  --substitution-model GTR+I --nthreads 10 --tree-builder-args="-czb"

It looks like argparse will never get a fix for this bug, so the assignment style used above is the safest usage for augur tree.

There is now a PR #839 where we can continue this conversation, if you like. I'll try adding the check for hardcoded defaults and post a comment there when it's done.

ammaraziz commented 2 years ago

Hi @huddlej, Glad I could be of help. I will put some comments in the PR.