neherlab / pangraph

A bioinformatic toolkit to align genome assemblies into pangenome graphs
https://neherlab.github.io/pangraph
MIT License

PanX export fails #52

Closed. mmolari closed this issue 1 year ago.

mmolari commented 1 year ago

I got an error while exporting a graph in PanX-compatible format with the following command:

pangraph export \
    --export-panX \
    --no-duplications \
    --output-directory coli_export \
    graph.json

The error seems to occur during midpoint rooting with TreeTools (root_midpoint! in the stack trace below):

LoadError: AssertionError: Issue with time on the branch above midpoint
Stacktrace:
  [1] root_midpoint!(t::TreeTools.Tree{TreeTools.EmptyData}; topological::Bool)
    @ TreeTools ~/.julia/packages/TreeTools/B7XJF/src/methods.jl:625
  [2] #root!#31
    @ ~/.julia/packages/TreeTools/B7XJF/src/methods.jl:591 [inlined]
  [3] (::Main.PanGraph.PanX.var"#13#14"{TreeTools.Tree{TreeTools.EmptyData}})()
    @ Main.PanGraph.PanX ~/Downloads/pangraph-test/pangraph/src/panX.jl:124
  [4] with_logstate(f::Function, logstate::Any)
    @ Base.CoreLogging ./logging.jl:511
  [5] with_logger
    @ ./logging.jl:623 [inlined]
  [6] produce_tree(alignment::String, scale::Int64)
    @ Main.PanGraph.PanX ~/Downloads/pangraph-test/pangraph/src/panX.jl:123
  [7] emitblock(block::Main.PanGraph.Graphs.Blocks.Block, root::String, prefix::String, identifier::Main.PanGraph.PanX.var"#11#12"{Dict{Main.PanGraph.Graphs.Nodes.Node, String}}; reduced::Bool)
    @ Main.PanGraph.PanX ~/Downloads/pangraph-test/pangraph/src/panX.jl:159
  [8] emit(G::Main.PanGraph.Graphs.Graph, root::String)
    @ Main.PanGraph.PanX ~/Downloads/pangraph-test/pangraph/src/panX.jl:278
  [9] (::Main.PanGraph.var"#39#44")(args::Vector{String})
    @ Main.PanGraph ~/Downloads/pangraph-test/pangraph/src/export.jl:173
 [10] run(cmd::Main.PanGraph.Commands.Command, args::Vector{String})
    @ Main.PanGraph.Commands ~/Downloads/pangraph-test/pangraph/src/args.jl:182
 [11] main(args::Vector{String})
    @ Main.PanGraph ~/Downloads/pangraph-test/pangraph/src/PanGraph.jl:162
 [12] top-level scope
    @ ~/Downloads/pangraph-test/pangraph/src/PanGraph.jl:177
in expression starting at /home/marco/Downloads/pangraph-test/pangraph/src/PanGraph.jl:1

The tree in question has a large clade of 17 sequences with zero branch length, and every other branch is extremely short (0.000000005):

                               , NZ_CP013034.1#1
  _____________________________|
 |                             | NZ_CP013036.1#1
 |
 |                                                          , NZ_CP017865.1#1
 |                                                          |
 |                                                          | NZ_CP007183.1#1
 |                                                          |
 |                                                          | NZ_CP027638.1#1
 |                                                          |
 |                                                          | NZ_CP023545.1#1
 |                                                          |
 |                                                          | NZ_CP017868.1#1
 |                                                          |
 |                                                          | NZ_CP017878.1#1
 |                                                          |
 |                                                          | NZ_CP017873.1#1
 |                                                          |
 |                                                          | NZ_CP011015.1#1
 |                                                          |
 |                              ____________________________| CP011777.1#1
_|                             |                            |
 |                             |                            | NZ_CP015528.1#1
 |                             |                            |
 |                             |                            | NC_022347.1#1
 |                             |                            |
 |                             |                            | NZ_CP017871.1#1
 |                             |                            |
 |                             |                            | NZ_CP013733.1#1
 |_____________________________|                            |
 |                             |                            | NZ_CP017025.1#1
 |                             |                            |
 |                             |                            | NC_022660.2#1
 |                             |                            |
 |                             |                            | NZ_CP027634.1#1
 |                             |                            |
 |                             |                            | NZ_CP018900.1#1
 |                             |
 |                             |_____________________________ NZ_CP013032.1#1
 |
 |                              _____________________________ NZ_CP027639.1#1
 |_____________________________|
                               |_____________________________ NZ_CP007179.1#1

The Newick file in question reads:

((NZ_CP013034.1#1:0.0,NZ_CP013036.1#1:0.0):0.000000005,((NZ_CP017865.1#1:0.0,NZ_CP007183.1#1:0.0,NZ_CP027638.1#1:0.0,NZ_CP023545.1#1:0.0,NZ_CP017868.1#1:0.0,NZ_CP017878.1#1:0.0,NZ_CP017873.1#1:0.0,NZ_CP011015.1#1:0.0,CP011777.1#1:0.0,NZ_CP015528.1#1:0.0,NC_022347.1#1:0.0,NZ_CP017871.1#1:0.0,NZ_CP013733.1#1:0.0,NZ_CP017025.1#1:0.0,NC_022660.2#1:0.0,NZ_CP027634.1#1:0.0,NZ_CP018900.1#1:0.0):0.000000005,NZ_CP013032.1#1:0.000000005)0.745:0.000000005,(NZ_CP027639.1#1:0.000000005,NZ_CP007179.1#1:0.000000005)0.000:0.000000005);
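To check how short the branches in this tree are, one can pull every branch length out of the Newick string. This is a minimal sketch, assuming a plain regular expression is good enough for this particular file. It is not a full Newick parser, and branch_lengths is a hypothetical helper, not a TreeTools function:

```julia
# Hypothetical helper (not part of TreeTools): return every branch length,
# i.e. the number that follows each ':' in a Newick string. Internal-node
# labels such as the support value 0.745 come before the ':' and are skipped.
branch_lengths(nwk::AbstractString) =
    [parse(Float64, m.captures[1]) for m in eachmatch(r":([0-9.eE+-]+)", nwk)]

# The tree from this issue, copied verbatim.
const NWK = "((NZ_CP013034.1#1:0.0,NZ_CP013036.1#1:0.0):0.000000005,((NZ_CP017865.1#1:0.0,NZ_CP007183.1#1:0.0,NZ_CP027638.1#1:0.0,NZ_CP023545.1#1:0.0,NZ_CP017868.1#1:0.0,NZ_CP017878.1#1:0.0,NZ_CP017873.1#1:0.0,NZ_CP011015.1#1:0.0,CP011777.1#1:0.0,NZ_CP015528.1#1:0.0,NC_022347.1#1:0.0,NZ_CP017871.1#1:0.0,NZ_CP013733.1#1:0.0,NZ_CP017025.1#1:0.0,NC_022660.2#1:0.0,NZ_CP027634.1#1:0.0,NZ_CP018900.1#1:0.0):0.000000005,NZ_CP013032.1#1:0.000000005)0.745:0.000000005,(NZ_CP027639.1#1:0.000000005,NZ_CP007179.1#1:0.000000005)0.000:0.000000005);"

lens = branch_lengths(NWK)
# prints [0.0, 5.0e-9]
println(sort(unique(lens)))
# prints "19 of 26 branches have length zero"
println(count(iszero, lens), " of ", length(lens), " branches have length zero")
```

Every branch is either exactly zero or 5e-9 long, so any distance computed along this tree is a sum of very small floating-point numbers.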

The error can be reproduced by loading this tree and running:

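using TreeTools

# tree.nwk contains the single-line Newick string above
line = readline("tree.nwk")
tree = parse_newick_string(line)
TreeTools.binarize!(tree)                # resolve polytomies into binary splits
TreeTools.root!(tree; method=:midpoint)  # AssertionError with TreeTools before v0.6.2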

mmolari commented 1 year ago

This issue was caused by numerical inaccuracies in TreeTools. @PierreBarrat fixed it in TreeTools v0.6.2 (see this commit). To solve it in PanGraph, I simply updated the TreeTools version.
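The thread does not say exactly which computation lost precision or how the TreeTools commit addressed it. As a hedged illustration only, the sketch below shows the kind of midpoint-location step that an assertion like "Issue with time on the branch above midpoint" guards: walk along the branch lengths of the longest leaf-to-leaf path until half of its total length is reached. locate_midpoint and its atol tolerance are hypothetical names and values, not TreeTools code and not the actual v0.6.2 fix. With branch lengths around 5e-9, the total and the running sum may round differently, so the sketch checks the bounds with a tolerance and clamps instead of asserting exact inequalities:

```julia
"""
    locate_midpoint(path; atol=1e-12) -> (i, t)

Hypothetical sketch (not TreeTools code). `path` holds the branch lengths along
the longest leaf-to-leaf path, ordered from one end leaf to the other. Returns
the index `i` of the branch containing the midpoint and the distance `t` from
the midpoint to the far end of that branch.
"""
function locate_midpoint(path::Vector{Float64}; atol::Float64 = 1e-12)
    isempty(path) && throw(ArgumentError("path must contain at least one branch"))
    half = sum(path) / 2          # `sum` may round differently from the loop below
    cum = 0.0
    for (i, len) in enumerate(path)
        cum += len                # running distance from the starting leaf
        if cum >= half - atol
            t = cum - half        # distance from the midpoint to the far end of branch i
            # Exact bounds 0 <= t <= len can be violated by a rounding error,
            # so accept values within `atol` and clamp them into range.
            @assert -atol <= t <= len + atol "time on branch above midpoint out of range"
            return i, clamp(t, 0.0, len)
        end
    end
    error("midpoint not found; are all branch lengths non-negative?")
end

# Usage on a path made of the tiny branch lengths seen in this tree.
i, t = locate_midpoint([5e-9, 5e-9, 5e-9])
println("midpoint lies on branch ", i, ", at distance ", t, " from its far end")
```

An absolute tolerance is the simplest choice here; a relative tolerance (scaled by the total path length) would be an equally reasonable design when branch lengths vary over many orders of magnitude.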