sbromberger / LightGraphs.jl

An optimized graphs package for the Julia programming language
Other
671 stars 185 forks source link

Graph edit distance #415

Closed juliohm closed 7 years ago

juliohm commented 8 years ago

Is anyone working with graph edit distance? https://en.wikipedia.org/wiki/Graph_edit_distance

Where should this function go in the project tree? Can I create a new file in the source directory for distances between graphs?

CarloLucibello commented 8 years ago

really nice addition!

Maybe we should create a distances folder, move distance.jl to distances/distance.jl and add distances/graphedit.jl

juliohm commented 8 years ago

I'll research the algorithms out there and if I get something useful, I'll add to LG.

juliohm commented 8 years ago

@CarloLucibello can I rename distance.jl to morphology.jl instead?

I think it is more intuitive to have graph edit distances in distance.jl and eccentricity, radius, diameter, periphery in morphology.jl.

juliohm commented 8 years ago

Or perhaps geometry.jl

-Júlio

CarloLucibello commented 8 years ago

I like topology.jl more. Geometry.jl is my second choice

juliohm commented 8 years ago

Perfect, topology.jl it is.

-Júlio

sbromberger commented 8 years ago

erm, hold on a sec. I need to think about this a bit.

(The reason is that to me, eccentricity, radius, diameters, etc. are all distance properties of graphs and it's not apparent [again, at least to me] why we'd want to split them up / rename them.)

juliohm commented 8 years ago

Hi Seth, what is your opinion on the naming? When I think of topology.jl I remember of simplicial complexes, epsilon balls and very specific calculations on graphs that have no much connection to diameter , eccentricity and so on. My opinion is that geometry.jl or morphology.jl are more adequate names for these operations.

Do we all agree that graph edit distances and spectral distances should be put in distances.jl?

-Júlio

sbromberger commented 8 years ago

@juliohm sorry, I may have misunderstood your proposal. Could you restate it? (Are you proposing to a) rename distance.jl, b) create a new topology.jl to hold your graph edit distance code + whatever else is appropriate, or c) something else?)

juliohm commented 8 years ago

I'm proposing 1) the rename of distance.jl into geometry.jl and 2) the addition of distances between graphs in distances.jl. Is it ok?

-Júlio

sbromberger commented 8 years ago

perhaps. One thought: we're going to be creating a linalg/ directory that will hold spectral stuff as well as the things that deal primarily with networks / matrices. Perhaps spectral distances go in there?

Give me a little bit to think about this and let me know what your opinion is on the above.

sbromberger commented 8 years ago

PS: not that I know a whole lot about GED, but doesn't this require isomorphism detection?

jpfairbanks commented 8 years ago

Am I right to understand the core of the issue is that we want to separate distances between graphs and distances within a single graph?

So that stuff like eccentricity and diameter which relate to metrics on the vertex set of a single graph are separated from properties like edit distance and spectral distance which are metrics on the set of all graphs?

I agree that topology is not a great name for this. I think we might want to save the name geometry for things related to graphs from a geometric context like planar graphs, planar embeddings, tilling, triangulation, and meshes.

Is it easy enough to keep the metrics within a single graph in distances.jl and put the metrics on the set of graphs in metrics.jl?

sbromberger commented 8 years ago

Thanks, @jpfairbanks. If all we're doing is bikeshedding on names, then my preference is to keep distance.jl as is and pick a suitable alternative for the new stuff. No point in changing two things if it's just a name.

juliohm commented 8 years ago

When I think of a directory called linalg/, I remember of linear solvers and numerical analysis, the name is too broad in my opinion. I'd say spectral/ or something more specific? I personally avoid adding directories in the source tree unless they are really necessary (lots of files). Directories add an extra layer of complexity when it comes to adding new code in the tree. Where should the code go? Should we split it in separate directories? Categorizing items always turns out as a headache for new contributors.

Directories remind me of Python modules and all those conventions to import code with relative paths. I love the fact that Julia puts everything in the same bucket and handles namespaces at the source level only.

Regarding the spectral distances, I'd put them together with the other distances, but anything works, just let me know.

I'll submit a PR with a simple distance as a starter.

-Júlio

sbromberger commented 8 years ago

@juliohm thanks. I sort of understand the overbroad nature of LinAlg, but James and I had a chance to catch up in person with some other graph folks last week and this was the best we could come up with. "Spectral" doesn't really capture all the stuff we're going to put in there. Perhaps "network"?

I also hear you re: directories, but I like segregating the code this way. It makes it easy to figure out where things live - especially since we're now at 45 source files. I think that I want to keep this hierarchy until we hit some sort of limitation or disadvantage that overrides the convenience - and I don't think we're there yet.

Anyway, open to proposed names and ESPECIALLY PR's. Thanks for the help.

juliohm commented 8 years ago

Thanks @sbromberger, you guys are doing a great job with LightGraphs.jl, it is being extremely useful in my research.

I really don't mind about the project tree, whatever you guys decide I will follow. Let me open a PR with a simple spectral distance to get it going. I will put spectral_distance in spectral.jl and will create a new file graphdist.jl with the GED algorithms if I come up with something useful.

sbromberger commented 8 years ago

Thanks for your flexibility (and PRs!), @juliohm . Don't get settled on the file names - it's entirely possible that we'll change things around. For now, do what makes the most sense (that is, don't change any existing names!) and we'll figure out what's best in the long term once we get the most important thing done, which is increased functionality.

That said: does the GED stuff imply isomorphism tests? Because that'd be cool as heck.

jpfairbanks commented 8 years ago

We were thinking of the name linalg for capturing everything that uses Graphs as Matrices. This goes with the flow of Base.LinAlg which contains the linear solvers and eigensolvers. The name spectral came from the fact that I was doing spectral partitioning. This uses the eigenvectors and eigenvalues since we are planning to add some more linear solver based features, it makes sense to put everything regarding linear solvers and eigenvalue solvers into a common area of Linear Algebra.

LinAlg could include:

I think that graph spectral distances makes more sense to categorize with other metrics on the set of all graphs because that is a higher level of distinction.

juliohm commented 8 years ago

There are many definitions out there for exact and error-tolerant matching, some of the distances I saw rely on finding the maximum common subgraph (MCS) between the two graphs and all that NP-hard stuff, I am not sure of how far these algorithms can get, but I am curious to give them a try.

In those definitions, finding the MCS is like testing for isomorphism.

sbromberger commented 7 years ago

@juliohm is this issue / discussion still valid now that we've merged #418? If so, no worries. If not, can we close it out?

juliohm commented 7 years ago

Hi Seth, I have many other distances on the way to add to LG. I'll close the issue right after we merge other future PRs.

-Júlio

sbromberger commented 7 years ago

Awesome - looking forward to more!

jpfairbanks commented 7 years ago

@juliohm Do you want to make a checklist of the distances that you intend to add? That will give us a scope for this issue and we can close it when the checklist is completed

juliohm commented 7 years ago

@jpfairbanks I will send a PR with the graph edit distance that I coded last week and close the issue. I will experiment with other distances depending on my research directions, I don't have a checklist in mind right now, sorry.

sbromberger commented 7 years ago

So, what I'd like to do is close this issue out since there's no problem to be resolved or specific gameplan for enhancements. @juliohm - please feel free to open up new issues/PRs as you need them to discuss your upcoming work. I'm really looking forward to seeing it!