Closed esayyari closed 4 years ago
If this is possible, it would significantly reduce the phylogenetic tree generation time and benefit users who need to use SEPP to generate insertion trees often.
It seems that the placements are stored in the DB. SEPP currently uses guppy for this, and the author recommended switching to gappa, as guppy is no longer in development. The issue is that gappa and guppy did not return the same results, most likely because of a floating-point rounding issue. The next step will be to use gappa to create these big trees. In the future, we might write a native Python tool to replace gappa/guppy. This tool should be able to handle huge datasets and merge trees from different analyses without the need to first merge placement files and then create the trees from scratch.
I was wondering if we could get access to the placement file that SEPP produces when running deblur. The placement file contains information about placement uncertainty (for example, see https://matsen.github.io/pplacer/generated_rst/guppy_sing.html#guppy-sing). As another application, one could take multiple placement files from different insertion runs, merge them, and create a tree with the "guppy tog" command, without running SEPP from scratch. This feature could also be integrated with redbiom, so that when we download the biom and sequences, we also get a JSON file (or even an insertion tree).
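To make the merge idea concrete, here is a rough Python sketch of what combining placement files could look like. It assumes the files follow the jplace JSON layout (`tree`, `fields`, `placements`, `version`, `metadata`) and that every run was placed on the same reference tree with identical edge numbering. The function names `merge_jplace` and `best_lwr` and the toy data are made up for illustration; this is not how guppy or gappa do it internally.

```python
import json


def merge_jplace(docs):
    """Concatenate placements from parsed jplace documents.

    Assumption: all documents share the same reference tree string and
    the same field order; otherwise edge numbers are not comparable.
    """
    if not docs:
        raise ValueError("need at least one jplace document")
    first = docs[0]
    merged = {
        "version": first["version"],
        "tree": first["tree"],
        "fields": list(first["fields"]),
        # Hypothetical metadata value, only to mark the origin of the file.
        "metadata": {"invocation": "merged by illustrative sketch"},
        "placements": [],
    }
    for doc in docs:
        if doc["tree"] != first["tree"] or doc["fields"] != first["fields"]:
            raise ValueError("jplace documents use different trees or fields")
        merged["placements"].extend(doc["placements"])
    return merged


def best_lwr(placement, fields):
    """Return the highest like_weight_ratio among a query's candidate edges."""
    idx = fields.index("like_weight_ratio")
    return max(row[idx] for row in placement["p"])


# Toy data (invented values) standing in for two insertion runs.
TREE = "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3}){4};"
FIELDS = ["edge_num", "likelihood", "like_weight_ratio",
          "distal_length", "pendant_length"]
run1 = {"version": 3, "tree": TREE, "fields": FIELDS, "metadata": {},
        "placements": [{"p": [[0, -100.0, 0.7, 0.01, 0.02],
                              [1, -101.0, 0.3, 0.02, 0.02]],
                        "n": ["seq1"]}]}
run2 = {"version": 3, "tree": TREE, "fields": FIELDS, "metadata": {},
        "placements": [{"p": [[3, -90.0, 1.0, 0.05, 0.01]],
                        "n": ["seq2"]}]}

merged = merge_jplace([run1, run2])
merged_text = json.dumps(merged)  # this string could be written to a .jplace file
```

The merged file could then be passed to "guppy tog" (or to gappa) to build the tree, and `best_lwr` gives a quick per-sequence view of placement uncertainty: values near 1.0 mean the placement is concentrated on a single edge.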
Thanks in advance