sbromberger / LightGraphs.jl

An optimized graphs package for the Julia programming language
Other
671 stars 185 forks source link

Should I create a FlowDescriptor type? #182

Closed pranavtbhat closed 8 years ago

pranavtbhat commented 8 years ago

I was implementing the Push-Relabel algorithm for maximum flows. The problem is that there are a large number of arrays that need to be passed to each function. For example the relabel function needs 7 arguments.

function relabel!{T<:Number}(
    flow_graph::LightGraphs.DiGraph, 
    u::Int,                               
    capacity_matrix::AbstractArray{T, 2}, 
    flow_matrix::AbstractArray{T,2},   
    height::AbstractArray{Int,1},    
    count::AbstractArray{Int,1},      
    Q::AbstractArray{Int,1}        
    )
    count[height[u]]--;
    height[u] = 2*nv(flow_graph);

    for v in fadj(flow_graph, u)
        if capacity_matrix[u,v] > flow_matrix[u,v]
            height[u] = min(height[u], height[v]+1)
        end
    end
    count[height[u]]++
    unshift!(Q, u)
end

I could create a new FlowDescriptor type, something like

type FlowDescriptor{T<:Number}
    flow_graph::LightGraphs.DiGraph                  
    capacity_matrix::AbstractArray{T, 2}
    flow_matrix::AbstractArray{T,2}   
    height::AbstractArray{Int,1}    
    count::AbstractArray{Int,1}      
    Q::AbstractArray{Int,1}   
end

and pass it to functions. Will doing this have any effect on performance/memory?

jpfairbanks commented 8 years ago

That's a good idea. It should not be a negative impact on performance.

sbromberger commented 8 years ago

Yup. Also have a look at DefaultDistance() (in distance.jl) for any arrays that can take a default value. If there's a method messing for it, we can always add it.

sbromberger commented 8 years ago

One other thought. What are the sizes of these arrays? If they're intended to be analogous to edge distances (that is, nv x nv), then this will obviously limit the utility of the function to smaller graphs.

jpfairbanks commented 8 years ago

flow_matrix and capacity_matrix should have nonzero values only for the edges in g.

pranavtbhat commented 8 years ago

The alternatives to 2-D matrices are dictionaries referenced by edges. I guess we could also use Vectors of length |E|. Lookups will be more expensive though.

sbromberger commented 8 years ago

For the record, I'm not advocating any particular structure here - just sharing my experience that anything that relies on dense arrays to store values related to edges won't be scalable, and using other structures won't be performant. (One of the reasons it's so difficult to store edge weights in graph structures without some serious tradeoffs.)

If you anticipate using flow functions for smallish graphs, then by all means use dense arrays which are more performant.

jpfairbanks commented 8 years ago

I prefer using AbstractArray{T,2} or parameterizing over it. one problem with using Dict{Edge,T} is that there is no chance of leveraging locality. Suppose you need to access all edges incident to v. With an AbstractArray{T,2} there is a chance that the caches will help you out and make this faster than random access. With a Dict{Edge,T}, the memory locations of (v, x) and (v,y) are not necessarily close. If the AbstractArray{T,2} is actually a SparseMatrixCSC you will leverage cache locality.

sbromberger commented 8 years ago

@pranavtbhat - was this taken care of by https://github.com/JuliaGraphs/LightGraphs.jl/pull/201?

pranavtbhat commented 8 years ago

No I didn't actually use this type. It almost doubled the amount of ram consumed and also increased the computation time.

jpfairbanks commented 8 years ago

Just gathering all the arguments into a struct doubled memory and increased runtime? That's weird. Was a copy being made somewhere?

pranavtbhat commented 8 years ago

The constructor may have been maintaining copies of all the arrays. Referencing the type's members could also have introduced delays.

jpfairbanks commented 8 years ago

So you are satisfied with the performance and usability of passing abunch of similar arguments around? Should we close the issue?

pranavtbhat commented 8 years ago

The performance is definitely better. The usability, however, is a problem. The code seems bloated and unnecessarily complex. I considered using global variables, but heard that Julia doesn't optimize well when global variables are involved. But I guess we can close this issue.

jpfairbanks commented 8 years ago

Yeah global variables are definitely not the answer.