marbl / Mash

Fast genome and metagenome distance estimation using MinHash
mash.readthedocs.org

Mash Graph Node Representation of Metagenomic data #83

Open JChristopherEllis opened 6 years ago

JChristopherEllis commented 6 years ago

Hi,

Is there a workflow for using Mash to generate a graph (node-and-edge) representation of metagenomic data, as seen in the Mash manuscript (Fig 3)?

Thanks in advance!

ondovb commented 6 years ago

I don't have access to the exact code anymore, but the basic idea is to create an "edge file" (\<genome1> \<genome2> \<weight>) with one line for each pair of genomes whose Mash distance is below the threshold (0.05 in this case), and to feed it into a graph layout generator (Cytoscape organic layout in this case, but you could use anything). The only tricky part is coloring based on taxID. You can look up the taxID for each accession, but it sometimes needs to be moved up to the species level (I used KronaTools for this, naturally!). I believe I then hashed the taxID integer and took the first 6 hex characters to use as a pseudo-random color, or something like that. You can apply the coloring to the graph in Cytoscape by importing a separate lookup table.
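For illustration only, here is a minimal Python sketch of the two steps described above. It is not the original code. It assumes the input is the tab-separated output of `mash dist` (reference, query, distance, p-value, shared hashes), and it uses the distance itself as the edge weight. The function names `mash_dist_to_edges` and `taxid_to_color` are hypothetical, and the choice of SHA-256 is an assumption, since the hash function originally used is not known.

```python
import csv
import hashlib
import io


def mash_dist_to_edges(dist_lines, threshold=0.05):
    """Yield (genome1, genome2, distance) for pairs below the threshold.

    Assumes each line is tab-separated `mash dist` output:
    reference, query, distance, p-value, shared hashes.
    Self-comparisons and repeated (A,B)/(B,A) pairs are skipped.
    """
    seen = set()
    for row in csv.reader(dist_lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        ref, query, dist = row[0], row[1], float(row[2])
        if ref == query:
            continue
        pair = tuple(sorted((ref, query)))
        if dist < threshold and pair not in seen:
            seen.add(pair)
            yield pair[0], pair[1], dist


def taxid_to_color(taxid):
    """Map a taxID to a pseudo-random '#rrggbb' color.

    Hashes the taxID and keeps the first 6 hex characters.
    SHA-256 is an assumption; any stable hash would do.
    """
    digest = hashlib.sha256(str(taxid).encode("ascii")).hexdigest()
    return "#" + digest[:6]


# Made-up example input in `mash dist` column order.
example = (
    "A.fna\tB.fna\t0.01\t0\t900/1000\n"
    "A.fna\tC.fna\t0.2\t0\t10/1000\n"
    "B.fna\tA.fna\t0.01\t0\t900/1000\n"
    "A.fna\tA.fna\t0\t0\t1000/1000\n"
)
edges = list(mash_dist_to_edges(io.StringIO(example)))
for g1, g2, weight in edges:
    print(f"{g1}\t{g2}\t{weight}")
```

The printed lines can be saved as the edge file, and a second table of genome and `taxid_to_color(...)` values could serve as the lookup table for coloring nodes in Cytoscape.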

JChristopherEllis commented 6 years ago

Thanks, ondovb. Is there a good link for creating an edge file with Mash?

Thanks again, Chris