sbromberger / LightGraphs.jl

An optimized graphs package for the Julia programming language
Other
673 stars 185 forks source link

discussion: API changes #665

Closed sbromberger closed 6 years ago

sbromberger commented 7 years ago

Proposal:

This would simplify the API to the following functions:

All changes would move through deprecation for at least one minor version.

cc @jpfairbanks @juliohm

jpfairbanks commented 7 years ago

This came out of a discussion at JuliaCon about what if any difference there is between fadj/badj and in_neighbors/out_neighbors. Seth proposes that fadj/badj is an iterable of the edges that leave/enter the vertex, while in_neighbors/out_neighbors is an iterable over the set of neighbors which ignores the multiplicity in a multigraph.

Since we have never supported Multigraphs, this semantic debate was never an issue. Almost none of our algorithms would correctly handle multigraphs if we just dropped them in without modifying the algorithms.

Does anyone have strong opinions about these names? /bikeshed.

I agree that we should have different names for these concepts. A quick google search shows that "Forward Adjacencies" has been used in RFC4655. The phrase appears once in this paragraph:

There is benefit in knowing which TE LSPs exist, and their routing, to support such applications as placing a high-priority TE LSP in a crowded network such that it preempts as few other TE LSPs as possible (also known as the "minimal perturbation" problem). Note that preempting based on the minimum number of links might not result in the smallest number of TE LSPs being disrupted. Another application concerns the construction and maintenance of a Virtual Network Topology [MLN]. It is also helpful to understand which other TE LSPs exist in the network in order to decide how to manage the forward adjacencies that exist or need to be set up. The cost- benefit of stateful PCE computation would be helpful to determine if the benefit in path computation is sufficient to offset the additional drain on the network and computational resources.

And in this comment in BGL's implementation of Parallel Brandes' Algorithm:

// Mark forward adjacencies in DAG of shortest paths

I think that neighbors is the more fundamental concept and the set of outgoing edges in a multigraph is the secondary concept with a representation that depends on the implementation details of the multigraph.

sbromberger commented 7 years ago

So actually, I just took a look at the code, and I realize that I misspoke yesterday. fadj and badj are not in interface.jl. They're not even exported from simplegraphs, just used as accessors. Therefore, the correct contract is already with in_neighbors and out_neighbors.

So I'll check off that first idea as "already done" - there is no need for every graph type to support fadj and badj. Just out_neighbors and in_neighbors, but with the caveat that they should not be altered (they should be treated as immutable).

I would still appreciate some discussion on the remaining issues.

nickeubank commented 7 years ago

First, a propos #642, what exactly is distinction between the API and core? I assume by API you mean "if you make a custom graph Type, it must offer those functions". Does that imply that in moving into the core, we moving these functions inside SimpleGraphs, and people making their own Types don't have to offer those functions?

(If I follow that right, then) I think those basic modifications are so core to network analysis they belong in the API. I suppose one can measure descriptives without them, but things like clique percolation, robustness measures, and things like Louvain algorithm all require graph modifications. Never mind just building up graphs piecemeal.

jpfairbanks commented 7 years ago

Yeah the idea is that the interface is the minimal amount of functionality your type must provide in order to get the rest of the library functions to work on your type. There can be additional functions that you have to get in order to achieve optimal performance, but we don't really have a concrete plan for that yet.

sbromberger commented 7 years ago

To elaborate a bit further, this is how I think about it as a three-tiered system:

API - stuff that must be implemented in order for basic functionality tests to pass - these functions are the building blocks for everything else. These include accessors for the graph structures.

Core - functions that should be implemented (and/or optimized) in order to enable more complex features. Things like indegree and outdegree are included here. These functions typically use API directly as their building blocks.

Everything else - functions that use core or API to do some other stuff, and that don't necessarily serve as building blocks themselves. Examples would be the centrality measures, and a counter-example to the latter criterion would be dijkstra_shortest_paths.

jpfairbanks commented 7 years ago

Yes the Core functionality should be stuff that is defined in terms of the API but might need to be reimplemented by a type to improve performance. and the Everything else should all be completely oblivious to the fact that there are multiple types of graphs underneath.

So things like merge would be Core because it is not necessary that every graph type support efficient merging of vertices, but community detection is in the Everything else category building on Core operations. If your graph type doesn't have an efficient merge, then it won't have an efficient community detection either, but that is ok, we need to either find a way to implement merge efficiently or use a different data structure.

sbromberger commented 7 years ago

So things like merge would be Core

I agree with this (and realize it's a change from my original position of having this in operators).

nickeubank commented 7 years ago

@sbromberger that's a very helpful summary, thanks. The README.md has a single category called "Core API" so I was getting a little confused.

In that case, my inclination is to still have the constructors/modifiers as API, but I understand if they get shifted to core.

sbromberger commented 7 years ago

The key for me is this: if we make it part of the API, then we're saying that every single graph type must implement this function. This includes immutable graphs and ones that might be backended by an external datastore. Immutable graphs can't / won't have any modifiers by definition, which is why I think these belong in core.

You're 100% right about the README being ambiguous. We need to fix that.

nickeubank commented 7 years ago

Yup makes sense. Hadn't thought about immutable graphs (especially backed by a database or something) as a thing someone would want to implement. On Tue, Jul 11, 2017 at 7:58 PM Seth Bromberger notifications@github.com wrote:

The key for me is this: if we make it part of the API, then we're saying that every single graph type must implement this function. This includes immutable graphs and ones that might be backended by an external datastore. Immutable graphs can't / won't have any modifiers by definition, which is why I think these belong in core.

You're 100% right about the README being ambiguous. We need to fix that.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/JuliaGraphs/LightGraphs.jl/issues/665#issuecomment-314622429, or mute the thread https://github.com/notifications/unsubscribe-auth/AJPC7TBeLUfjXaUeodEPjkVERZVlA7v1ks5sNCg2gaJpZM4ODumG .

sbromberger commented 6 years ago

Coming back to this. I don't think I like in_degree replacing indegree since we have indegree_centrality and in_degree_centrality seems awkward.

I think I'd prefer renaming in_neighbors and out_neighbors to inneighbors and outneighbors for consistency, or maybe just making neighbors(...; dir=:in) instead. (Though this might have performance implications.)

@jpfairbanks - thoughts?

jpfairbanks commented 6 years ago

I would prefer neighbors(g, v, :in) as long as it is fast enough. inneighbors has 1 too many ns and a keyword arg is a pain. For a real C style api neighbors(g, v, DIR_IN) neighbors(g, v, DIR_OUT)

sbromberger commented 6 years ago

Here's my latest idea (hat tip @ararslan):

neighbors(g, v, ::Val{:in}) = ...
neighbors(g, v, ::Val{:out}) = ...
neighbors(g, v, ::Val{:both}) = ...
neighbors(g, v, d::Symbol=:out) = neighbors(g, v, Val(d))

similarly,

degree(g, v, ::Val{:in}) = ...
degree(g, v, ::Val{:out}) = ...
degree(g, v, ::Val{:both}) = ...
degree(g, v, d::Symbol=:out) = degree(g, v, Val(d))

@jpfairbanks - thoughts?

(see also https://github.com/JuliaLang/julia/issues/25715)

jpfairbanks commented 6 years ago

I like it!

matbesancon commented 6 years ago

other possibility is using empty structs

abstract type DegreeDirection end
struct InDegree <: DegreeDirection end
...
degree(g, v, ::InDegree) = ...
...

This would come to the same design as maximum_flow which takes a FlowAlgorithm argument I think the compile-time dispatch is still guaranteed. Which do you think would fit best for these?

sbromberger commented 6 years ago

This would come to the same design as maximum_flow which takes a FlowAlgorithm argument

Yes. The history behind using structs for dispatch in LightGraphs is twofold: first, Graphs.jl used them, and a lot of our code came from that package. second, Val wasn't an option.

On the flip side: where we're using empty structs, we should evaluate whether Val could be used instead. GraphIO.jl could benefit.

sbromberger commented 6 years ago

degree(g, v, ::InDegree) = ...

This bothers me, because it's redundant, and we'd need to have InNeighbors, and In* for other things.

jpfairbanks commented 6 years ago

This is a problem for all functions that currently take a neighborfunction those functions currently would require you to make an anonymous function. Or "lift" the :in/:out vals up to their argument lists.

sbromberger commented 6 years ago

This is a problem for all functions that currently take a neighborfunction those functions currently would require you to make an anonymous function.

This is a key point. We do this throughout the code for the degree functions. Alright, I think we're back to just renaming them in_degree and out_degree.

jpfairbanks commented 6 years ago

Yeah, and thinking for metagraphs you might have weighted_degree property_degree` which count the degree according to some other definition than the number of edges such as sum of edge weights or number of edges that satisfy some predicate.

sbromberger commented 6 years ago

I pulled 0.13.0 because I'd really like to get this in first.