graph-genome / graph_summarization

Browser for Graph Genomes built with VG based on Graph Summarization to provide semantic zoom. As a user zooms in on a graph genome, the topology becomes more complex. Provides visualization for variation within a species of plant or animal. Designed to scale up to thousands of specimens and provide useful visualizations.
Other
7 stars 1 forks source link

DAGify of graphs to convert paths into slices #13

Open 6br opened 5 years ago

6br commented 5 years ago

DAGify is the first step to convert paths into slices for graphs that we cannot assume that the graphs are already directed acyclic graph.

6br commented 5 years ago

There are procedures in visualizing graphs:

  1. Retrieve a subgraph from the whole graph by tracing the nodes from the initial node. (I have not implemented yet because I think it is enough to use vg chunk command in MVP, but we need to implement the algorithm in future, whose algorithm is as written by Josiah.)
  2. DAGify nodes to determine the order of nodes from the input. First, select the first path as a reference path, and then other paths are compared to the reference path iteratively to capture the node that shared in multiple paths using a longest common substrings DP. Each node between those common nodes should belong to the same slice. -> This is implemented in #9.
  3. Align the order of paths (For example, path(x,y,z) might be displayed as [y,x,z] by comparing the relevancy of paths), but it has not implemented yet. To compare the relevancy of paths, create a big bloom filter to store the hash(node_id). The hamming distance of bloom filter may be an estimated distance between paths. Considering the similarity distance, we can build "guide-tree" from the similarity ratio to determine how to cluster these paths.