phac-nml / irida-sistr-results

Compile results generated from SISTR within IRIDA
Apache License 2.0

SISTR cgMLST data output? #8

Open peterk87 opened 6 years ago

peterk87 commented 6 years ago

Would it be possible to output the cgMLST results to a separate CSV/TSV file and/or XLSX worksheet?

You could get really fancy and produce a dendrogram from the cgMLST results, output it to Newick, output a distance matrix of the different profiles, etc. (I have performant code that does all this if you'd like to add these features.)

It'd be nice for users to use the cgMLST info along with their SISTR serovar/subspecies predictions to view everything in something like Phandango or Microreact (basically tree + metadata table) or to do their own analyses with the complete suite of information available from SISTR.

apetkau commented 6 years ago

This is a good idea. By cgMLST I assume all the allele types? We have this info so could definitely export it to a separate worksheet/file.
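
For illustration, a minimal sketch of what exporting the allele types to a separate TSV file could look like. The function name `write_cgmlst_tsv`, the locus names, the sample names, and the dict-of-dicts layout are all hypothetical; they are not taken from irida-sistr-results or SISTR, and the real data structures may differ.

```python
import csv
import io


def write_cgmlst_tsv(profiles, out):
    """Write one row per sample and one column per cgMLST locus.

    `profiles` maps sample name -> {locus name -> allele call or None}.
    This layout is an assumption made for the sketch.
    """
    # Collect every locus seen in any sample so all rows share one header.
    loci = sorted({locus for alleles in profiles.values() for locus in alleles})
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample"] + loci)
    for sample in sorted(profiles):
        alleles = profiles[sample]
        # Missing calls are written as empty cells (an assumed convention).
        row = ["" if alleles.get(locus) is None else alleles[locus] for locus in loci]
        writer.writerow([sample] + row)


# Hypothetical allele profiles for two samples.
profiles = {
    "sampleA": {"locus1": 1, "locus2": 7, "locus3": 3},
    "sampleB": {"locus1": 1, "locus2": 8, "locus3": None},
}

buf = io.StringIO()
write_cgmlst_tsv(profiles, buf)
tsv_text = buf.getvalue()
```

Writing to any file-like object keeps the sketch independent of where the output ends up; passing `open("cgmlst.tsv", "w", newline="")` instead of the `StringIO` buffer would write to disk.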

I think it's something that can be worked on for the next release. Maybe along with building a dendrogram. I'd be happy to see your code for this :smile:

peterk87 commented 6 years ago

Here's a potentially useful notebook with some code to generate distance matrices, dendrograms, MSTs, and flat clusters from distance matrices: https://gist.github.com/peterk87/b203f62a71d7f4fb273139b219af5e81
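
As a rough, standard-library-only illustration of the distance matrix and dendrogram idea discussed in this thread (not the code from the notebook), the sketch below computes pairwise Hamming distances between allele profiles and builds an average-linkage (UPGMA) tree as a Newick string. The sample names, the profiles, and the choice to ignore loci with a missing call are assumptions; a real implementation would more likely use NumPy/SciPy for speed.

```python
from itertools import combinations


def hamming(a, b):
    """Count loci where both profiles have a call and the calls differ.

    Loci with a missing call (None) in either profile are skipped,
    which is an assumption made for this sketch.
    """
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def distance_matrix(profiles):
    """Return sorted sample names and a dict of pairwise distances."""
    names = sorted(profiles)
    dist = {(n, n): 0 for n in names}
    for p, q in combinations(names, 2):
        dist[(p, q)] = dist[(q, p)] = hamming(profiles[p], profiles[q])
    return names, dist


def upgma_newick(names, dist):
    """Average-linkage (UPGMA) clustering, returned as a Newick string."""
    # cluster id -> (Newick text, number of leaves, height of the node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {}
    for i, j in combinations(range(len(names)), 2):
        d[(i, j)] = d[(j, i)] = float(dist[(names[i], names[j])])
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters (ties go to the first pair found).
        i, j = min(combinations(sorted(clusters), 2), key=lambda pair: d[pair])
        text_i, size_i, h_i = clusters.pop(i)
        text_j, size_j, h_j = clusters.pop(j)
        height = d[(i, j)] / 2
        text = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        # Distance from the merged cluster is the size-weighted average.
        for k in clusters:
            avg = (size_i * d[(i, k)] + size_j * d[(j, k)]) / (size_i + size_j)
            d[(next_id, k)] = d[(k, next_id)] = avg
        clusters[next_id] = (text, size_i + size_j, height)
        next_id += 1
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


# Hypothetical profiles: allele calls listed in the same locus order.
profiles = {
    "sampleA": [1, 1, 1, 1],
    "sampleB": [1, 1, 1, 2],
    "sampleC": [2, 2, 3, 2],
}

names, dist = distance_matrix(profiles)
newick = upgma_newick(names, dist)
```

The resulting Newick string, together with a metadata table of serovar/subspecies predictions, is the kind of tree plus metadata input that viewers such as Phandango or Microreact accept.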