marbl / MetagenomeScope

Visualization tool for (meta)genome assembly graphs
https://marbl.github.io/MetagenomeScope/
GNU General Public License v3.0

Add command-line option (on by default) which discards redundant components #67

Open fedarko opened 7 years ago

fedarko commented 7 years ago

From @fedarko on July 27, 2017 1:45

From the comments of #10: "adding a command-line option to collate.py that, for a GFA/LastGraph/etc file, assumes that contigs are already oriented (and thus doesn't create implied negative nodes/edges). Since MetaCarvel can produce GFA output, implementing something like this would be worthwhile.

(We'd also probably have to adjust the JS code for the viewer interface to -- instead of just treating ASM_FILETYPE === "GML" as the basis for graphs with oriented contigs -- getting some corresponding bool from the db re: the contigs being oriented or unoriented, and operating accordingly.)"

This is actually causing problems with certain GFA files, so we need to do this.

Copied from original issue: fedarko/MetagenomeScope#244

fedarko commented 3 years ago

In addition to adding this as an option, it would be best to add some simple code that tries to detect this automatically. For example, if every node in a GFA graph is in a separate component from its reverse-complement node, then we can just go ahead and assume the graph is already oriented (and thereby only draw half of the data). However, if this assumption is violated (cough cough that one Velvet E. coli graph), then we can bite the bullet and draw everything.
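A minimal sketch of that automatic check, in plain Python. This is only an illustration and not MetagenomeScope's actual code: the function names are hypothetical, and it assumes nodes are named with a trailing `+` or `-` so that `X+` and `X-` are reverse complements of each other.

```python
from collections import defaultdict


def rc(name):
    """Return the reverse-complement node name.

    ASSUMPTION: nodes are named 'X+' / 'X-'. Real assembly graph
    formats (and collate.py) may use a different convention.
    """
    return name[:-1] + ("-" if name.endswith("+") else "+")


def weakly_connected_components(nodes, edges):
    """Return a list of node sets, ignoring edge direction."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp = set()
        stack = [start]
        while stack:
            n = stack.pop()
            comp.add(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        components.append(comp)
    return components


def looks_already_oriented(nodes, edges):
    """True if no component contains both a node and its reverse complement."""
    for comp in weakly_connected_components(nodes, edges):
        if any(rc(n) in comp for n in comp):
            return False
    return True


nodes = ["1+", "1-", "2+", "2-"]
edges = [("1+", "2+"), ("2-", "1-")]
print(looks_already_oriented(nodes, edges))  # True: the two strands never touch

mixed = edges + [("2+", "1-")]
print(looks_already_oriented(nodes, mixed))  # False: 1+ and 1- share a component
```

If the check returns False, the fallback described above applies: draw everything.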

fedarko commented 2 years ago

We can formalize this and make it less brittle as follows. Given a graph where nodes have reverse-complements (i.e. not MetaCarvel GML), consider all (weakly connected) components. Two components C1 and C2 are complements if they contain the same number of nodes, the same number of edges, and for each node N in C1:

We can add an option named --omit-redundant-components or something. By default (the option is on), if two components C1 and C2 are complements, then we can arbitrarily just draw one of the two components in the visualization interface.
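Roughly, the option and the component filtering could look like the sketch below. Several things here are assumptions rather than anything settled in this issue: the flag name is the tentative one from above, the `--no-...` form is just what `argparse.BooleanOptionalAction` (Python 3.9+) generates, the helper names are hypothetical, nodes are assumed to be named `X+` / `X-`, and the per-node condition is assumed to be "the reverse complement of N is in C2".

```python
import argparse


def rc(name):
    """ASSUMPTION: 'X+' and 'X-' are reverse complements of each other."""
    return name[:-1] + ("-" if name.endswith("+") else "+")


def are_complements(c1, c2):
    """Each component is a (node_set, edge_list) pair."""
    nodes1, edges1 = c1
    nodes2, edges2 = c2
    if len(nodes1) != len(nodes2) or len(edges1) != len(edges2):
        return False
    # ASSUMPTION: the per-node condition is "rc(N) is in C2".
    return all(rc(n) in nodes2 for n in nodes1)


def omit_redundant(components):
    """Keep the first component of each complementary pair, drop the other."""
    kept = []
    for comp in components:
        if not any(are_complements(comp, k) for k in kept):
            kept.append(comp)
    return kept


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--omit-redundant-components",
        action=argparse.BooleanOptionalAction,
        default=True,  # on by default, as proposed
        help="draw only one of each pair of complementary components",
    )
    return parser


components = [
    ({"1+", "2+"}, [("1+", "2+")]),
    ({"1-", "2-"}, [("2-", "1-")]),  # complement of the first component
    ({"3+", "3-"}, [("3+", "3-")]),  # contains both strands, so it is kept
]
args = build_parser().parse_args([])
if args.omit_redundant_components:
    components = omit_redundant(components)
print(len(components))  # 2
```

Which of the two complementary components gets kept is arbitrary; this sketch simply keeps whichever appears first in the list.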

Notes: