sage-wright closed this 1 year ago
The matrix order looks better for viewing in Excel. With a conditional colour scale it looks really pretty too!
I have noticed that the tip order doesn't match the vertical order of tips in the kSNP3 phylogeny I generated in the same workflow, so it's still not possible to view this matrix nicely against the tree. Throwing this into Phandango gave me the view below:
[Phandango screenshot]
I think the data would be easier to interpret if the matrix order matched the tree tip order. Even when viewing in Excel only, it's not super obvious which isolates form a possible transmission cluster. Would it be possible to look into the issue with the tip order?
This was my mistake: I had used the wrong dev branch. The result of my single test on the correct dev branch looks perfect! I shall test a bit more and write up the full review in a later comment.
The latest commits implement the following changes:
- reorder_matrix task will now output matrices with Phandango coloring automatically applied to all column headers (:c1); these matrix files are now .csv files for easy transfer to Phandango
- summarize_data task will now take a space-separated list of column names, parse the contents of those columns, then output a CSV with presence/absence for each item in those columns. There is a Boolean option to turn off Phandango coloring (which colors each column differently). See example_summarized_data.csv for example output, and see https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Wright_PHBG_Sandbox/workflows/theiagen-validations/kSNP3 for help with using this task.
- reorder_matrix and summarize_data tasks have been added to the kSNP3 and Core_Gene_SNP workflows
- summarize_data task has been added to MashTree
- Snippy_Tree is found in a different repository, so these tasks have not yet been added there.
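The two output formats described above can be sketched with the Python standard library. This is a hypothetical illustration, not the tasks' actual code: the function names, the sample_column and phandango_coloring parameters, the column_value header naming, and the choice of one ":cN" colour group per source column are all assumptions. The ":c1" suffix itself comes from the description above; what Phandango does with each suffix is not verified here. The input follows the space-separated column names described at this stage.

```python
import csv
import io


def add_phandango_suffix(matrix_csv, suffix=":c1"):
    """Append a Phandango colour suffix to every sample column header of a matrix CSV.

    Assumption: the top-left cell (the row-label column) is left unchanged.
    """
    rows = list(csv.reader(io.StringIO(matrix_csv)))
    rows[0] = [rows[0][0]] + [name + suffix for name in rows[0][1:]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def summarize_data(rows, sample_column, column_names, phandango_coloring=True):
    """Build a presence/absence CSV with one output column per distinct value.

    rows: list of dicts, e.g. from csv.DictReader over the input table.
    column_names: space-separated column names.
    """
    headers, keys = [], []
    for group, column in enumerate(column_names.split(), start=1):
        # Hypothetical colour scheme: one Phandango ":cN" group per source column.
        suffix = f":c{group}" if phandango_coloring else ""
        seen = []  # distinct non-empty values, in first-seen order
        for row in rows:
            value = row[column]
            if value and value not in seen:
                seen.append(value)
        for value in seen:
            headers.append(f"{column}_{value}{suffix}")
            keys.append((column, value))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([sample_column] + headers)
    for row in rows:
        # str(True) / str(False) produces the "True" / "False" cells.
        writer.writerow([row[sample_column]] + [str(row[c] == v) for c, v in keys])
    return out.getvalue()


rows = [
    {"sample": "S1", "st": "ST8", "host": "human"},
    {"sample": "S2", "st": "ST22", "host": "human"},
    {"sample": "S3", "st": "ST8", "host": "cattle"},
]
print(summarize_data(rows, "sample", "st host"))
```

With the made-up rows above, the header row is `sample,st_ST8:c1,st_ST22:c1,host_human:c2,host_cattle:c2`, and each sample gets True in exactly one column per source column.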
Changes to Snippy_Tree have been made and can be found in this PR. Testing is still required.
Successful tests:
@emmadoughty The ksnp3_ml_tree and ksnp3_nj_tree outputs were added in v1.1.0. They are optional and are generated when the corresponding argument is added to the "ksnp3_args" input parameter.
Also, regarding rooting trees at the midpoint above: if a user included an ancestral genome in their kSNP3 run, could they still root the tree at that sample in later analysis if the output is a midpoint-rooted tree? I just want to make sure that outputting a midpoint-rooted tree won't cause problems if the user has a different sample they want to use as the root.
Thanks for the clarification about the ml and NJ trees. :)
Midpoint rooting the tree won't prevent the user from rerooting it again. Rerooting essentially just reorders the file, whether it is done programmatically, as in this workflow, or via software like FigTree or iTOL. You can reorder that file as many times as you like.
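To illustrate why rerooting is harmless, here is a minimal sketch (not the workflow's code, and not how FigTree or iTOL work internally) that stores a tree as an undirected edge list and writes it out as Newick from two different internal nodes. The topology and branch lengths are identical; only the order and nesting of the text change. The internal node names n1/n2 and the example tree are made up.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency map: node -> list of (neighbour, branch_length)."""
    graph = defaultdict(list)
    for a, b, length in edges:
        graph[a].append((b, length))
        graph[b].append((a, length))
    return graph


def to_newick(graph, root):
    """Write the same undirected tree as Newick, hanging it from an internal node `root`.

    Internal node names (n1, n2, ...) are hypothetical and are not written out.
    """
    def walk(node, parent):
        children = [(n, length) for n, length in graph[node] if n != parent]
        if not children:
            return node  # a tip: write its label only
        return "(" + ",".join(f"{walk(n, node)}:{length}" for n, length in children) + ")"

    return walk(root, None) + ";"


edges = [("n1", "A", 0.1), ("n1", "B", 0.2), ("n1", "n2", 0.3),
         ("n2", "C", 0.4), ("n2", "D", 0.5)]
tree = build_graph(edges)
print(to_newick(tree, "n1"))  # (A:0.1,B:0.2,(C:0.4,D:0.5):0.3);
print(to_newick(tree, "n2"))  # ((A:0.1,B:0.2):0.3,C:0.4,D:0.5);
```

Rooting at a tip, or at the midpoint of a branch as midpoint rooting does, would additionally insert a new root node on that branch; the sketch leaves that step out.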
Great comments, Emma!
Response:
- column_names is now comma-delimited
- perform_data_summary boolean has been removed; the task now just checks for the presence of text in column_names
- input_table is required for command-line testing. It has no use on Terra, but is necessary for development
- unordered matrices and unrooted trees are no longer returned as outputs to Terra for kSNP3 or Core_Gene_SNP
- cells that would contain "False" are now left empty
- values from the columns in column_names should now be grouped together by column
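The input-handling changes listed above could look roughly like this. This is a hypothetical sketch: the helper names parse_column_names, should_summarize and presence_cell are illustrative and are not taken from the task.

```python
def parse_column_names(column_names):
    """Split the comma-delimited column_names input, dropping blanks and stray spaces."""
    return [name.strip() for name in column_names.split(",") if name.strip()]


def should_summarize(column_names):
    """Stand-in for the removed perform_data_summary boolean: summarize only if text is present."""
    return bool(parse_column_names(column_names))


def presence_cell(present):
    """Absent values are left empty rather than written as "False"."""
    return "True" if present else ""


print(parse_column_names("st, host"))  # ['st', 'host']
```

Deriving the on/off decision from column_names itself means the user sets one input instead of keeping a boolean and a column list in sync.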
Most of the requested changes have been applied:
- data_summary_sample_names has been changed to sample_names
- reorder_matrix task has been condensed to output only the midpoint-rooted matrix and tree, and the output files have been renamed accordingly
- reorder_matrix task in Mashtree: this has been fixed with sed 's/_contigs//g' ~{input_tree} to remove all "_contigs" suffixes
This PR will close #133 by using the list of terminal ends (tips) in the tree generated by kSNP3 to reorder the SNP matrix produced by snp-dists.
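The core idea of reading the tip order from the tree and reordering both the rows and the columns of the matrix to match can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the PR's implementation (which may well use a library such as Biopython): it assumes a tab-separated square matrix whose first row and first column hold sample names, as snp-dists writes by default; tip labels without quotes or spaces; and sample names that match the tree tips exactly. The example matrix and tree are made up.

```python
import csv
import io
import re

# Tip labels directly follow "(" or ","; internal-node labels and branch
# lengths follow ")" or ":" and are therefore not matched.
TIP_PATTERN = re.compile(r"[(,]\s*([^(),:;\s]+)")


def tip_order(newick):
    """Return leaf labels in the order they appear in the Newick string."""
    return TIP_PATTERN.findall(newick)


def reorder_matrix(matrix_tsv, newick):
    """Reorder rows and columns of a square distance matrix to match the tree's tip order."""
    rows = list(csv.reader(io.StringIO(matrix_tsv), delimiter="\t"))
    header, body = rows[0], rows[1:]
    corner, names = header[0], header[1:]
    col_index = {name: i for i, name in enumerate(names)}
    row_by_name = {row[0]: row[1:] for row in body}

    order = tip_order(newick)
    if set(order) != set(names):
        raise ValueError("tree tips and matrix sample names differ")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # written as CSV
    writer.writerow([corner] + order)
    for name in order:
        values = row_by_name[name]
        writer.writerow([name] + [values[col_index[other]] for other in order])
    return out.getvalue()


matrix = ("sample\tC\tA\tD\tB\n"
          "C\t0\t5\t2\t6\nA\t5\t0\t7\t1\nD\t2\t7\t0\t8\nB\t6\t1\t8\t0\n")
tree = "(A:0.1,B:0.2,(C:0.4,D:0.5):0.3);"
print(reorder_matrix(matrix, tree))
```

Tree viewers generally draw tips in the order they appear in the Newick string, so writing the matrix in that order lines it up with the tree. The exact-match check is also why suffixes such as "_contigs" on tree tip labels have to be stripped first.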
This is accomplished by:
- snp_dists task
- reorder_matrix task (in task_snp_dists.wdl) that will, given a Newick tree and SNP matrix: