theiagen / public_health_bacterial_genomics

GNU Affero General Public License v3.0
26 stars 14 forks source link

reorder snp matrix #198

Closed sage-wright closed 1 year ago

sage-wright commented 1 year ago

This PR will close #133 by using the list of terminal ends generated by kSNP3 to reorder the SNP matrix produced in snp-dists.

This is accomplished by:

emmadoughty commented 1 year ago

The matrix order looks better for viewing in excel. With a conditional colour scale it looks really pretty too! image

I have noticed that the tip order doesn't match the vertical order of tips in the KSNP3 phylogeny I generated in the same wf, so it's still not possible to view this matrix nicely against the tree. Throwing this into Phandango gave me this below... image

I think the interpretation of this data would be easier if the matrix order matched the tree tip order. Even when viewing in excel only, it's not super obvious which isolates form a possible transmission cluster. Would it be possible to look into issue occurring with the tip order?

emmadoughty commented 1 year ago

This was my mistake, having used the wrong dev branch. The result of my singular test on the correct dev branch looks perfect! I shall test a bit more and make the full review in a later comment

image

sage-wright commented 1 year ago

The latest commits implement the following changes:

Phandango coloring is automatically applied to all column headers in matrices (:c1); these matrix files are now .csv files for easy transfer to phandango,

See https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Wright_PHBG_Sandbox/workflows/theiagen-validations/kSNP3 for help regarding usage of this task.

Snippy_Tree is found in a different repository so these tasks have not yet been added there.

sage-wright commented 1 year ago

Changes to Snippy_Tree have been made and can be found in this PR. Testing still required.

sage-wright commented 1 year ago

Successful tests:

michellescribner commented 1 year ago

@emmadoughty The ksnp3_ml_tree and ksnp3_nj_tree outputs were added in v1.1.0. They are optional and are generated with a corresponding argument added to the "ksnp3_args" input parameter.

Also, with regard to rooting trees at the midpoint above: if a user included an ancestral genome in their kSNP3 run, could they still root the tree at this sample in later analysis if the output is a midpoint-rooted tree? I just want to make sure that outputting a midpoint-rooted tree won't cause any problems if the user has a different sample they want to use as the root.

emmadoughty commented 1 year ago

Thanks for the clarification about the ml and NJ trees. :)

The midpoint rooting on the tree won't prevent the user rerooting the tree again. Rerooting essentially just reorders the file, regardless of whether this is done programmatically, as in this workflow, or via software like FigTree or iToL. You can reorder that file as many times as you like

sage-wright commented 1 year ago

Great comments, Emma!

Response:

sage-wright commented 1 year ago

Most all of the applied changes requested: