sbromberger / LightGraphs.jl

An optimized graphs package for the Julia programming language
Other
671 stars 185 forks source link

Other types are vertices. #340

Closed TransGirlCodes closed 8 years ago

TransGirlCodes commented 8 years ago

Hi LightGraphs people. I'm a maintainer of the BioJulia org and am building the Phylogenetics module for BioJulia/Bio.jl.

Up until now I've been building the phylogenetic types with a linked-node like structure, but it's proving problematic having one type which contains both the phylogenetic structure, and the relationship information. To this end I'd like to investigate using LightGraph types to represent the tree structure, and some types I define to encode biological and evolutionary information in the trees.

For this end it would be awesome if vertices in LightGraph types could be arbitrary types that could contain information like species name, species divergence times and such that I would define. Does this seem like a reasonable thing the LightGraphs types could do and still function as designed? Say instead of:

type DiGraph
    vertices::UnitRange{Int}
    ne::Int
    fadjlist::Vector{Vector{Int}} # [src]: (dst, dst, dst)
    badjlist::Vector{Vector{Int}} # [dst]: (src, src, src)
end

Where the vertices are Integers, instead they could be parametric T?

Thanks for your time, and patience if I've asked a massively stupid thing.

Best, Ben.

TransGirlCodes commented 8 years ago

I've just found Networks.jl, but saw there was still an issue discussing it's design and how it might involve changes in LightGraphs abstraction?

sbromberger commented 8 years ago

Hi @Ward9250 -

The issue with arbitrary vertices is one of time and space. That is, right now, vertices are represented as an integer range, which means that they take almost no space (ranges are represented by a maximum of 128 bits). Also, indexing into a range is constant time.

If you're looking for metadata that can be attached to edges and vertices, there are a couple of things you might want to check out. First, #336 might be of interest. If that doesn't work, then you may have better luck with the Graphs.jl package. It isn't actively maintained, I don't think, but it does have the capability to define your own vertex data structures. (Of course, this will come with associated performance and memory penalties, but that's computing for ya.)

One way to do this within LightGraphs is to store your vertex information in an (external to LightGraphs) array, and then when you need to access the info, just pull it out of that array. It will be efficient to retrieve it; not so efficient to store it. Edges get a bit more complicated since they're not indexed, but you can use an associative array / map / dict to handle edge information.

Post a response here with your thoughts / add'l questions and someone should be able to give further advice and guidance with respect to LG capabilities.

jpfairbanks commented 8 years ago

I mostly agree with @sbromberger. Here is my perspective:

We have discussed the issue of storing vertex and metadata in the graph structure and outside of it. While some tools try to be flexible and encourage storing everything in the graph structure (eg Networkx) the LightGraphs philosophy is that the data structure should be simple.

We have decided that supporting data in the nodes makes things too hard to optimize for the diverse applications people have. The recommended way to do what you want is to represent the graph structure as a LightGraph.DiGraph and then have the data you want stored in an external data structure such as an Array{T} or Dict{Int, T}. This way you can choose the external data structure best for your use case.

You want to run queries on the graph structure where the information on the vertices becomes important, then you are getting into "Graph Database" territory. The two straightfoward ways to do that with LG at the moment is taking subgraphs and custom iterators.

If the query is a filter, like "give me BFS only taking edges that go between vertices with this value", then you can make the subgraph explicitly and then call our standard BFS routines.

If the query is more complicated, then you will have to write your own iterators. We are happy to help by explaining the relevant interfaces. The file src/traversal.jl is a good place to start on this.

Does this help? Feel free to ask more questions.

TransGirlCodes commented 8 years ago

Thanks guys, I've started a new PR over at Bio.jl https://github.com/BioJulia/Bio.jl/pull/157 which will have my progress. I'm looking at having Phylo assume that for a phylogeny with leaves 1..t, the vertices of the graph that represent them are vertices 1..t. The root vertex for the root is always vertex t + 1, and the internal nodes are everything after t + 1.

sbromberger commented 8 years ago

If you can structure your data so that a 1..t representation is meaningful then it's better suited for LG. No translation table needed.

sbromberger commented 8 years ago

Hey @Ward9250 - are we good here, or is there something else you'd like to discuss?

TransGirlCodes commented 8 years ago

@sbromberger I think we're good, I've made a start on Making Bio.Phylo use LightGraph types, and have come up with some indexing types that were originally designed for Genotype Count Arrays I've been building for the same project, which will help with associating data and so on. Thanks everyone. If I have more questions as I go along, I'll swing by the gitter!

sbromberger commented 8 years ago

Very cool. Feel free to open up other issues here if you're stuck on something LG-related.

arekfu commented 8 years ago

Sorry if I jump in on this closed issue. In my LightGraphs-based projects (depgraph), I implemented a type called LabelledDiGraph, which essentially is a DiGraph + a vector of metadata (strings) for vertices:

type LabelledDiGraph
  graph :: DiGraph
  labels :: Vector{AbstractString}
end

This essentially follows the recommendations above. However, I feel it was non-trivial to make LabelledDiGraph play nicely with functions such as induced_subgraph or condensate. One must make sure to correctly update the indexing of the labels over the vertices. So I had to write methods like

# induced_subgraph method
function induced_subgraph(g::LabelledDiGraph, iter)
  graph′ = induced_subgraph(g.graph, iter)
  g′ = LabelledDiGraph(graph′, g.labels[iter])
end

# strongly_connected_components
function strongly_connected_components(graph::LabelledDiGraph)
  scc = strongly_connected_components(graph.graph)
  scc_labels = condense_labels(graph, scc)
  scc, scc_labels
end

# condensation
function condensation(graph::LabelledDiGraph)
  cond_graph = condensation(graph.graph)
  scc, scc_labels = strongly_connected_components(graph)
  LabelledDiGraph(cond_graph, scc_labels)
end

Would you accept a pull request providing an extra type such as LabelledDiGraph? That would lift the burden of juggling the indices off the shoulders of the user.

CarloLucibello commented 8 years ago

it is highly unlikely that LG will support other graph types, this has been strongly debated in #227

To fill this gap https://github.com/JuliaGraphs/Networks.jl was recently created, with the scope to leverage LightGraphs capabilities on more complex data structures, but it never got out of its embrional phase. We could surely accomodate LabelledDiGraph over there, along any other contribution you can imagine regarding alternative graph types

TransGirlCodes commented 8 years ago

I recently started phylogenetics in Bio.jl branch called phylo_rework, which associates data with graph vertices and edges through a simple Indexing data type which associates Labels with indices, if you wanted to look at that.

On Fri, May 13, 2016 at 10:01 AM, Carlo Lucibello notifications@github.com wrote:

it is highly unlikely that LG will support other graph types, this has been strongly debated in #227 https://github.com/JuliaGraphs/LightGraphs.jl/pull/227

To fill this gap https://github.com/JuliaGraphs/Networks.jl was recently created, with the scope to leverage LightGraphs capabilities on more complex data structures, but it never got out of its embrional phase. We could surely accomodate LabelledDiGraph over there, along any other contribution you can imagine regarding alternative graph types

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/JuliaGraphs/LightGraphs.jl/issues/340#issuecomment-218989415