Open fedarko opened 7 years ago
also need to consider how these edges affect statistics; right now they're all included in the total edge count, but that behavior isn't necessarily set in stone
Another downside of implicit edge IDs (described in the message for commit d7eaf4a above) is that it slightly messes up the ID displayed for edges adjacent to collapsed clusters (since the source/target change). All that's different is the ID displayed in the selected info table is different -- not a big deal, but still the sort of thing that shows that this workaround is an inconvenience and should eventually be reverted.
It's looking like certain GFA files, in particular, can have these defined (I think fastg2gfa can produce them). So supporting this behavior (likely through just the "warning" thing, or maybe just ignoring them unless the user requests the warnings to be output) would be a good option.
We could also output, like, a warnings.txt
file or something that contains all duplicated edges. That'd avoid the costs of printing all those duplicates to stdout.
Anyway, this is kind of a lower priority issue since tools like gfaview can apparently be used to remove redundant edges in files like this. I'd still like to get this addressed, though.
A simple solution for this is to just check (upon parsing any given edge declaration) if the edge in question currently exists in the graph. (Either by checking nodeid2outgoingnodeids
for GFA and FASTG files, or by checking nodeid2obj[src_id].outgoing_nodes
for LastGraph and GML files.) If so, add the edge; if not, ignore it.
Pseudocode for the GFA/FASTG approach:
if src_id in nodeid2outgoingnodeids:
if snk_id in nodeid2outgoingnodeids[src_id]:
continue
Pseudocode for the LastGraph/GML approach:
if nodeid2obj[snk_id] in nodeid2obj[src_id].outgoing_nodes:
continue
For filetypes which have implied edges (LastGraph/sometimes GFA), this also takes care of redundant edges brought about by implication like following --
A -> -B
B -> -A
-- in which the first edge implies the second edge. In this case, we'd just parse the first edge declaration, and then when we'd get to the second one it would already be in the graph, so we could just ignore it.
Yeah, easiest solution is just throwing an error if these are found. no need to support these as far as i can tell
If we want to support showing de Bruijn graphs in the more "conventional" way (e.g. SPAdes graphs with edges actually shown as edges, unlike Bandage where the edges are shown as nodes) then we'll need to cross this bridge eventually. Likely not urgent for now, though, since this shift would likely involve a lot of other changes besides this (e.g. completely different pattern detection algorithms, I think...?).
FWIW AGB does support visualizing these sorts of graphs
From @fedarko on August 4, 2017 1:41
I think they're used in layout but not rendered at present? ==> Yep, just looked over my code and that seems to be the case. Options:
Copied from original issue: fedarko/MetagenomeScope#258