marbl / MetagenomeScope

Visualization tool for (meta)genome assembly graphs
https://marbl.github.io/MetagenomeScope/
GNU General Public License v3.0

Centralize data storage in the Python codebase #204

Open fedarko opened 3 years ago

fedarko commented 3 years ago

This is one of those things I skipped over while trying to get the new version out, but it'd be really good to add in when I get some time. With #202 rearing its head, now might be a good time to sort this junk out.

Briefly, the current way we store data in Python (both biological metadata, e.g. length/coverage/GC content, and internal stuff like layout positions) for nodes/edges is ... "ad hoc", to put it politely. I think node data is currently stored in the AssemblyGraph.digraph graph (in the data dictionary), and edge data is usually stored in the AssemblyGraph.decomposed_digraph graph (and then distributed throughout the subgraphs of each Pattern, as needed).

We should really try to set things up so the AssemblyGraph holds this data directly (or maybe have another class hold it, idk): this would simplify both setting and getting data, and would make pattern detection, layout, rotation/scaling, and data exporting much easier.
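To make the idea concrete, here's a rough sketch of what a central store could look like. Everything in it is an assumption for illustration: the `GraphDataStore` class, its method names, and the example IDs and field names are hypothetical and don't exist in the codebase. It only uses plain dictionaries keyed by node/edge ID, so it sidesteps the question of how it would hook into `AssemblyGraph.digraph` or `AssemblyGraph.decomposed_digraph`.

```python
class GraphDataStore:
    """Hypothetical central store for node and edge data, keyed by unique ID."""

    def __init__(self):
        # Each maps an ID to a {field name: value} dictionary. Biological
        # metadata (length, coverage, ...) and internal data (layout
        # positions, ...) live side by side in the same place.
        self._node_data = {}
        self._edge_data = {}

    def set_node(self, node_id, **fields):
        """Adds or overwrites fields for a node."""
        self._node_data.setdefault(node_id, {}).update(fields)

    def set_edge(self, edge_id, **fields):
        """Adds or overwrites fields for an edge."""
        self._edge_data.setdefault(edge_id, {}).update(fields)

    def get_node(self, node_id, field, default=None):
        """Returns one field of a node, or default if it isn't set."""
        return self._node_data.get(node_id, {}).get(field, default)

    def get_edge(self, edge_id, field, default=None):
        """Returns one field of an edge, or default if it isn't set."""
        return self._edge_data.get(edge_id, {}).get(field, default)


# Example usage with made-up IDs and values.
store = GraphDataStore()
store.set_node("contig_1", length=5000, gc_content=0.42)
store.set_node("contig_1", x=10.0, y=20.0)  # layout data added later
store.set_edge(("contig_1", "contig_2"), multiplicity=3)
```

With something like this, pattern detection and layout code could pass IDs around and look data up in one place, instead of copying attribute dictionaries between graphs and subgraphs. Whether this lives on the AssemblyGraph itself or in a separate class is still open.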


Necessary things for this system

Nice-to-haves