nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

Feature request: Command to convert .json to .nwk #438

Open mvolz opened 4 years ago

mvolz commented 4 years ago

I see the export command allows you to export .nwk as .json. Can this operation be done in reverse (i.e., convert an augur v2 .json file to a .nwk file)?

tsibley commented 4 years ago

@mvolz Thanks for your question. I don't believe such a command exists in Augur. All the information you need to produce a Newick tree is there though, and you could process the JSON file yourself in nearly any language to produce a .nwk.
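
As a rough illustration of that kind of processing, here is a minimal sketch in plain Python. It is not part of Augur. It assumes a v2 JSON with a top-level "tree" key, nodes that carry "name" and optional "children", and a cumulative divergence stored in node_attrs.div (so a branch length is the difference from the parent's value). The function names are made up for this example, labels are written unquoted, and annotations such as location are dropped.

```python
def to_newick(node, parent_div=0.0):
    """Return the Newick text for the subtree rooted at `node`, without the final ';'."""
    # Assumption: node_attrs.div is cumulative divergence from the root.
    # If it is missing, fall back to the parent's value (zero-length branch).
    div = node.get("node_attrs", {}).get("div", parent_div)
    branch_length = div - parent_div
    label = node.get("name", "")
    children = node.get("children", [])
    if children:
        subtrees = ",".join(to_newick(child, div) for child in children)
        return f"({subtrees}){label}:{branch_length:g}"
    return f"{label}:{branch_length:g}"


def auspice_json_to_newick(auspice_json):
    """Convert a parsed v2 JSON (a dict with a "tree" key) to a Newick string."""
    return to_newick(auspice_json["tree"]) + ";"


# Tiny hand-written tree standing in for a real dataset.
example = {
    "meta": {},
    "tree": {
        "name": "NODE_0",
        "node_attrs": {"div": 0.0},
        "children": [
            {"name": "A", "node_attrs": {"div": 0.25}},
            {"name": "B", "node_attrs": {"div": 0.5}},
        ],
    },
}
newick = auspice_json_to_newick(example)
print(newick)  # (A:0.25,B:0.5)NODE_0:0;
```

For a real file you would json.load() it first and write the returned string to a .nwk file. Very deep trees could hit Python's recursion limit with this recursive version, and the :g format keeps only six significant digits of each branch length.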

mvolz commented 4 years ago

Would that be something that could go into this package? Or out of scope?

emmahodcroft commented 4 years ago

@mvolz If you are making the JSON yourself, then an intermediate file will have been created along the way, often in the results folder and called tree.nwk. However, it won't have all the annotations of the final JSON tree.

If you can view the JSON in a browser (online, or from a local run), then you can scroll to the bottom of the page, and there is a link to 'Download Data' - you can then download a Tree or TimeTree in Newick format. However, again, they won't be annotated with data like location, etc, but perhaps one of these options helps?

tsibley commented 4 years ago

> Would that be something that could go into this package? Or out of scope?

My opinion is that it'd be in scope to include Augur commands for converting Augur/Auspice JSONs to other common formats. I think the dev team has discussed this in the past but never settled on concrete plans.

mvolz commented 4 years ago

> If you can view the JSON in a browser (online, or from a local run), then you can scroll to the bottom of the page, and there is a link to 'Download Data' - you can then download a Tree or TimeTree in Newick format. However, again, they won't be annotated with data like location, etc, but perhaps one of these options helps?

Ah yes, that works, thanks!

Apparently there is also this script, but it has broken dependencies: https://github.com/nextstrain/augur/blob/521930410c7b23de8dba647ce7c3edb33e9d1c68/scripts/json_tree_to_nexus.py

huddlej commented 4 years ago

@mvolz There isn't a command line interface planned for this functionality, but you can accomplish this conversion using existing Augur Python functions:

from augur.utils import json_to_tree
import Bio.Phylo
import requests

# Download a tree JSON.
tree_url = "http://data.nextstrain.org/flu_seasonal_h3n2_ha_2y.json"
response = requests.get(tree_url)
response.raise_for_status()  # Stop here if the download failed.
tree_json = response.json()

# Convert JSON to BioPython tree instance.
tree = json_to_tree(tree_json)

# Write tree to disk as a Newick file.
Bio.Phylo.write(tree, "tree.nwk", "newick")

Zsailer commented 4 years ago

Apologies for the shameless plug here... PhyloPandas is something Augur (or neighboring projects) might find beneficial.

It's essentially a Pandas DataFrame for phylogenetics. You can quickly convert between sequence/tree formats. You can even merge sequence + tree data in a single dataframe. It provides all the benefits of the Pandas API while offering extra methods useful for phylogenetics. I think Augur's JSON format is a necessary format we can add to phylopandas.

huddlej commented 2 years ago

Just a note that I ended up writing a script to convert an Auspice JSON to a Newick tree and a metadata TSV. We haven't decided where this would live in Augur yet though.
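
For anyone who only needs the metadata half, here is a hedged sketch of how the per-node annotations could be flattened into a TSV with the standard library. This is not the script mentioned above. It assumes each node's node_attrs entries are either objects with a "value" key or bare scalars such as div, and the function names are invented for the example.

```python
import csv
import io


def iter_nodes(node):
    """Yield every node, parents before children, without recursion."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push children reversed so they are visited in their original order.
        stack.extend(reversed(current.get("children", [])))


def node_record(node):
    """Flatten one node into a dict mapping column name to value."""
    record = {"name": node.get("name", "")}
    for key, attr in node.get("node_attrs", {}).items():
        # Assumption: attributes are {"value": ...} objects or bare scalars.
        record[key] = attr.get("value") if isinstance(attr, dict) else attr
    return record


def write_metadata_tsv(auspice_json, handle):
    """Write one row per node to `handle`; missing attributes become empty cells."""
    records = [node_record(node) for node in iter_nodes(auspice_json["tree"])]
    extra = sorted({key for record in records for key in record} - {"name"})
    writer = csv.DictWriter(
        handle, fieldnames=["name"] + extra, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)


# Tiny hand-written tree standing in for a real dataset.
example = {
    "tree": {
        "name": "NODE_0",
        "node_attrs": {"div": 0},
        "children": [
            {"name": "A", "node_attrs": {"div": 0.25, "country": {"value": "Kenya"}}},
            {"name": "B", "node_attrs": {"div": 0.5, "country": {"value": "Peru"}}},
        ],
    }
}
buffer = io.StringIO()
write_metadata_tsv(example, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Internal nodes are included as rows here; filtering to tips only would mean skipping nodes that have children.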