Add "map" of assembly graph's connected components

fedarko commented 7 years ago

From @fedarko on February 25, 2017 2:25

We want to be able to distinguish assemblies with lots of tiny components (e.g. biofilm 2) from assemblies with a few large components (e.g. shakya old, biofilm 1). Basically, we need ways to represent how "noisy" certain components are.

Copied from original issue: fedarko/MetagenomeScope#157

fedarko commented 7 years ago

I was thinking we could use a treemap or something similar for this, where each component is a section scaled by its total number of elements, or something. If we do this using d3, then we could let the user order components by, say, number of bubbles, number of edges, etc.

fedarko commented 7 years ago

Now that we're officially using d3.js (due to #232), displaying a treemap shouldn't be that difficult. (An example of this functionality is given here.)

Once we have treemap display functionality ready, selecting a connected component in the treemap would just draw that particular connected component (I guess we could do standard mode by default, although modifying it to draw SPQR mode connected components instead would also be doable).

Another cool thing here is that we could let the user choose what attributes to use for the "size" of each connected component's corresponding rectangle in the treemap: I'd imagine the default would just be number of contigs (analogous to how the current ordering of connected components by "size" works), but we could also provide number of edges, number of bubbles, etc. as various attributes that would set connected components' size in the treemap.

For the sizing stuff, we could even go nuts with this and let the user design their own metric composed of these attributes (e.g. ccSize = sum(node count, 0.5 * edge count, 2 * bubble count, cyclic chain count) or something like that), in which certain attributes can be weighted more or less. However, just drawing a simple treemap based on node count would definitely be a good starting point here.
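A minimal sketch of that user-defined metric, in plain JavaScript. The attribute names (`nodeCount`, `edgeCount`, `bubbleCount`, `cyclicChainCount`) and the toy component are assumptions made up for illustration, not fields that exist in the viewer today. The default weights reduce to plain node count, which matches the proposed starting point.

```javascript
// Hypothetical attribute names; the real per-component fields may differ.
const DEFAULT_WEIGHTS = { nodeCount: 1, edgeCount: 0, bubbleCount: 0, cyclicChainCount: 0 };

// Weighted sum of a component's attributes. Attributes missing from the
// component count as 0, so partially annotated components still get a size.
function ccSize(component, weights = DEFAULT_WEIGHTS) {
  let total = 0;
  for (const [attr, weight] of Object.entries(weights)) {
    total += weight * (component[attr] || 0);
  }
  return total;
}

// The example weighting mentioned in this comment.
const exampleWeights = { nodeCount: 1, edgeCount: 0.5, bubbleCount: 2, cyclicChainCount: 1 };

// Toy component with made-up counts.
const toy = { nodeCount: 10, edgeCount: 12, bubbleCount: 2, cyclicChainCount: 1 };
console.log(ccSize(toy));                 // node count only
console.log(ccSize(toy, exampleWeights)); // 10 + 0.5 * 12 + 2 * 2 + 1
```

The value returned by `ccSize` could then be used as each rectangle's size in the treemap, so switching metrics would only mean swapping the weights object.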

fedarko commented 7 years ago

An idea: selecting a cc in the treemap opens up a "component info" dialog or control panel section, and the user can then choose whether or not to draw the cc in question.

fedarko commented 7 years ago

We could also use something like this treemap functionality to show comparisons between multiple genome/metagenome graphs, maybe? Would create a separate issue for that, but if we had a visual representation of two graphs -- for example, the Velvet E. coli graph and the MetaCarvel E. coli graph (both hosted on the demo at present) -- we could have something like a normal treemap view, but with the option to toggle between graph files. That'd show:

- That the first component of the Velvet E. coli graph has 436 nodes / 570 edges (and I guess their respective shares of the graph's totals, at ~78% of nodes and ~86% of edges), the next two smaller components have about 4-5% each of the nodes/edges in the graph, and so on -- indicating a general overview of the complexity of the components of the graph. (The 50 components in the graph composed of just 1 node/0 edges could be collapsed in the treemap, I guess.)
- That the MetaCarvel E. coli graph is composed of only one component, with 168 nodes and 470 edges. The large number of edges relative to nodes might indicate that this single component is going to be fairly complex, since there'll be many paths through the nodes in the graph.
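A rough sketch of how that per-graph overview could be computed: each component's share of the graph's nodes and edges, an edges-per-node ratio, and one collapsed entry for all components made of 1 node / 0 edges. The function name, field names, and toy counts are all hypothetical (the toy graph is not the Velvet or MetaCarvel data).

```javascript
// Summarize components as shares of the whole graph, collapsing trivial
// components (exactly 1 node and 0 edges) into a single row.
function summarizeComponents(components) {
  const totalNodes = components.reduce((sum, c) => sum + c.nodeCount, 0);
  const totalEdges = components.reduce((sum, c) => sum + c.edgeCount, 0);
  const pct = (part, whole) => (whole === 0 ? 0 : (100 * part) / whole);

  const rows = [];
  const collapsed = { label: "single-node components", componentCount: 0, nodeCount: 0, edgeCount: 0 };
  components.forEach((c, i) => {
    if (c.nodeCount === 1 && c.edgeCount === 0) {
      collapsed.componentCount += 1;
      collapsed.nodeCount += 1;
    } else {
      rows.push({ label: `component ${i + 1}`, componentCount: 1, nodeCount: c.nodeCount, edgeCount: c.edgeCount });
    }
  });
  if (collapsed.componentCount > 0) rows.push(collapsed);

  // Attach percentage shares and a crude complexity hint to every row.
  return rows.map((r) => ({
    ...r,
    nodePct: pct(r.nodeCount, totalNodes),
    edgePct: pct(r.edgeCount, totalEdges),
    edgesPerNode: r.edgeCount / r.nodeCount,
  }));
}

// Toy graph with made-up counts.
const toyGraph = [
  { nodeCount: 6, edgeCount: 8 },
  { nodeCount: 2, edgeCount: 2 },
  { nodeCount: 1, edgeCount: 0 },
  { nodeCount: 1, edgeCount: 0 },
];
const summary = summarizeComponents(toyGraph);
console.log(summary);
```

With the numbers quoted above, `edgesPerNode` would come out to about 2.8 for the MetaCarvel component (470 / 168) versus about 1.3 for the first Velvet component (570 / 436), which is one simple way to surface the "many paths" intuition.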

Just some scattered thoughts -- should write this out in another issue later.

fedarko commented 6 years ago

Use a hierarchical structure of component -> structural patterns -> contig sizes? We could do lots of preprocessing for this in advance.
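One way the preprocessed output might look, sketched in plain JavaScript: a nested `{ name, children }` tree going graph -> component -> structural pattern -> contig, with contig length as the leaf value. The input field names (`patterns`, `type`, `contigs`, `id`, `length`) and the "unpatterned" group for contigs outside any pattern are assumptions for illustration only. Objects nested via a `children` array are the shape `d3.hierarchy` reads by default, so a tree like this could probably be handed to d3 directly.

```javascript
// Build a nested tree: graph -> components -> patterns -> contigs.
// All input field names here are hypothetical.
function buildHierarchy(components) {
  return {
    name: "graph",
    children: components.map((component, i) => ({
      name: `component ${i + 1}`,
      children: component.patterns.map((pattern) => ({
        name: pattern.type,
        children: pattern.contigs.map((contig) => ({
          name: contig.id,
          value: contig.length, // leaf size, e.g. contig length in bp
        })),
      })),
    })),
  };
}

// Sum the leaf values under a node; this is the aggregation a treemap
// needs in order to size each internal rectangle.
function totalValue(node) {
  if (!node.children) return node.value;
  return node.children.reduce((sum, child) => sum + totalValue(child), 0);
}

// Toy input with made-up contigs.
const toyComponents = [
  {
    patterns: [
      { type: "bubble", contigs: [{ id: "c1", length: 500 }, { id: "c2", length: 300 }] },
      { type: "chain", contigs: [{ id: "c3", length: 1200 }] },
    ],
  },
  { patterns: [{ type: "unpatterned", contigs: [{ id: "c4", length: 100 }] }] },
];
const tree = buildHierarchy(toyComponents);
console.log(totalValue(tree));
```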

fedarko commented 6 years ago

Alternate approach (this is all Dr. Pop's idea): spiral out components in a nautilus shape, sort of akin to a golden spiral. These components could be zoomed into, sort of like a "Prezi" presentation. Each component would be pushed out from the center of the spiral in proportion to the sizes of the components that come before it (starting from the "center" of the spiral).
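A very rough sketch of one possible reading of this idea, not a worked-out design. Each component is drawn as a circle whose area is proportional to its size, components are placed at a fixed angular step, and each one's distance from the center is the sum of the radii of the components placed before it plus its own radius. Note that this is a simplification rather than a true golden (logarithmic) spiral, it does not guarantee that circles never overlap, and `spiralLayout` and `angleStep` are hypothetical names.

```javascript
// Place components along an outward spiral. `sizes` should already be in
// draw order (e.g. largest component first, at the center).
function spiralLayout(sizes, angleStep = Math.PI / 3) {
  let priorRadii = 0; // running sum of radii of components already placed
  return sizes.map((size, i) => {
    const radius = Math.sqrt(size); // circle area proportional to size
    // The first component sits at the center; later ones are pushed out
    // by everything that came before them.
    const distance = i === 0 ? 0 : priorRadii + radius;
    const angle = i * angleStep;
    priorRadii += radius;
    return {
      x: distance * Math.cos(angle),
      y: distance * Math.sin(angle),
      radius,
      distance,
    };
  });
}

// Toy sizes (made up), largest first.
const layout = spiralLayout([16, 9, 4]);
console.log(layout);
```

Zooming "Prezi"-style would then be a matter of animating the viewport to a chosen circle's `x`, `y`, and `radius`.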

marbl / MetagenomeScope

Add "map" of assembly graph's connected components #35