thomasp85 / tidygraph

A tidy API for graph manipulation
https://tidygraph.data-imaginist.com

support dagitty objects #183

Closed: grasshoppermouse closed this pull request 10 months ago

grasshoppermouse commented 10 months ago

resolve #179

I'm an anthropologist, not a software engineer, so I hope I'm doing this right.

thomasp85 commented 10 months ago

Do dagitty objects not contain any node or edge attributes? (Asking in earnest, as I'm not familiar with the package.)

grasshoppermouse commented 10 months ago

I'm just learning about the package myself. dagitty is used to specify causal relationships among variables to aid the design of scientific studies. It is an R wrapper around a JavaScript library. Graphs are specified using the Graphviz DOT language, where node and edge attributes are set in square brackets. Ideally, some R function would already exist to parse a DOT string and convert it to an igraph object, but I couldn't find one.

As far as I can tell, dagitty nodes can optionally have one of these three attributes: "exposure", "outcome", "latent". There are getter functions for those. Nodes can also optionally have x and y coordinates for layout, and there is a function to get those.

Edges have 3 types: "->", "<->", "--" (directed, bidirectional, undirected), and there is a function to get those. But in tidygraph/igraph, don't all edges have to be either directed or undirected?
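For readers unfamiliar with dagitty, the getters mentioned above can be tried on a small graph. This is a minimal sketch, assuming the dagitty package is installed; the three-node graph is made up for illustration, and the column names of the edges() result (v, w, e) reflect my understanding of the current dagitty API rather than anything stated in this thread.

```r
library(dagitty)

# A made-up three-node DAG: A is the exposure, Y the outcome, U a latent
# confounder; pos="x,y" sets the optional layout coordinates.
g <- dagitty('dag {
  A [exposure, pos="0,0"]
  Y [outcome, pos="2,0"]
  U [latent, pos="1,1"]
  U -> A
  U -> Y
  A -> Y
}')

exposures(g)    # one getter per node role
outcomes(g)
latents(g)
coordinates(g)  # list with named numeric vectors x and y
edges(g)        # data frame: v (tail), w (head), e (edge symbol such as "->")
```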

Edges can optionally have a "beta" attribute that sets the strength of the causal relationship. As far as I can tell, there is no function to get those values. However, there are internal functions, dagitty:::.vertexAttributes(g, a) and dagitty:::.edgeAttributes(g, a), where "g" is a dagitty object and "a" is any user-specified attribute.

So, I could add the node attributes exposure/outcome/latent, if present; the node x and y coordinates, if present; and the edge direction attributes. I could also add node_attribute and edge_attribute arguments to get user-specified attributes, with defaults node_attribute = NULL and edge_attribute = "beta".
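The node side of this proposal amounts to turning the getter results into columns of a node table. A minimal base-R sketch: the helper name dag_node_table() is hypothetical, and plain vectors stand in for the output of dagitty's exposures(), outcomes(), latents() and coordinates(), so the sketch runs without dagitty installed.

```r
# Hypothetical helper: build a node table carrying dagitty-style node roles
# and optional layout coordinates.
dag_node_table <- function(nodes, exposures = character(), outcomes = character(),
                           latents = character(), x = NULL, y = NULL) {
  tbl <- data.frame(
    name     = nodes,
    exposure = nodes %in% exposures,  # TRUE for nodes marked as exposure
    outcome  = nodes %in% outcomes,
    latent   = nodes %in% latents,
    stringsAsFactors = FALSE
  )
  # Coordinates are optional; look them up by node name, NA when absent
  tbl$x <- if (is.null(x)) NA_real_ else unname(x[nodes])
  tbl$y <- if (is.null(y)) NA_real_ else unname(y[nodes])
  tbl
}

nodes <- dag_node_table(
  c("A", "U", "Y"),
  exposures = "A", outcomes = "Y", latents = "U",
  x = c(A = 0, U = 1, Y = 2), y = c(A = 0, U = 1, Y = 0)
)
```

Logical role columns are one possible design; a single character "role" column would work equally well, since the thread describes each node as having at most one of the three roles.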

How does that sound?

szhorvat commented 10 months ago

This is up to dagitty, of course, but I wanted to say that DOT is a rather poor choice for data exchange. It is a format specific to a single piece of software, Graphviz, and has features that make sense only for that software. I once looked into how easy it would be to write a robust parser for this format for igraph. It would not be easy, and most likely it will never happen. The solution for these sorts of problems is to use a format that was designed for data exchange, not the data description language of one specific piece of software. GraphML is a good choice for this, as are various JSON-based formats.

thomasp85 commented 10 months ago

Yeah, we should get as much info into the tbl_graph as possible. It is true that tidygraph/igraph don't support bidirectional edges, and edges can't be undirected in a directed graph. Maybe it is best to add a "type" attribute to edges that encodes whether each edge is undirected, directed, or bidirectional, since we can't really capture that information otherwise.
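The "type" attribute idea can be sketched in base R. This is a minimal illustration, not code from the PR: the helper name edge_type_table() and the labels "directed", "bidirected" and "undirected" are hypothetical choices, and the v, w, e arguments mirror a dagitty-style edge list (tail, head, edge symbol).

```r
# Hypothetical helper: map dagitty edge symbols to a "type" edge attribute.
edge_type_table <- function(v, w, e) {
  e <- as.character(e)  # guard against factors, which would index by code
  types <- c("->" = "directed", "<->" = "bidirected", "--" = "undirected")
  if (!all(e %in% names(types))) {
    stop("unknown edge symbol(s): ", paste(setdiff(e, names(types)), collapse = ", "))
  }
  # Every edge keeps the orientation it was written with (v to w); for
  # "<->" and "--" that orientation carries no meaning, the type column does.
  data.frame(from = v, to = w, type = unname(types[e]), stringsAsFactors = FALSE)
}

edge_tbl <- edge_type_table(
  v = c("U", "U", "A"),
  w = c("A", "Y", "Y"),
  e = c("<->", "--", "->")
)
```

Together with a node table whose name column matches from/to, such a data frame could be passed to tidygraph::tbl_graph(nodes, edges, directed = TRUE); the graph itself would then be directed, with the type column recording which edges are meant otherwise.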

It seems weird that there are attributes in the structure that the user cannot get to, but if that is so, we shouldn't extract them, as we don't want to take on a dependency on a package's internal functions, which could break without notice.

As for their choice of DOT, there's not much we can do about that. The whole point of tidygraph is, to some extent, to save people from relying on packages with questionable structure when possible :-)

szhorvat commented 10 months ago

That comment was also partly an explanation of why igraph is unlikely to ever support reading DOT files. As for writing DOT files, that's already supported, and I consider that essential. We want to make it easy to plot igraph graphs with Graphviz.

grasshoppermouse commented 10 months ago

I pushed an update that adds the node attributes accessible with dagitty functions, as well as one user-defined attribute for nodes and one for edges; the edge attribute defaults to "beta", which is an important, commonly used edge attribute.

thomasp85 commented 10 months ago

Is there only ever one user-defined attribute for nodes and edges?

grasshoppermouse commented 10 months ago

In principle, I believe there can be arbitrary numbers of user-defined node and edge attributes. In practice, I'm not sure. The attributes are meant to specify aspects relevant to the theory of causal diagrams, and almost all have dedicated functions in dagitty (I think beta might be the only exception).

grasshoppermouse commented 10 months ago

I found this fairly extensive package, which adds a basic as_tbl_graph method for dagitty objects, but doesn't add node or edge attributes to the tbl_graph object. It also converts dagitty objects to its tidy_dag objects, which preserve some attributes:

https://github.com/r-causal/ggdag

thomasp85 commented 10 months ago

So, would you think there is still need for a method in tidygraph proper?

jtextor commented 10 months ago

Hi all, interesting discussion here. I'm certainly willing to update the package to export the edge attributes; I was planning to do this anyway.

Regarding my "poor choice" to use the dot syntax, I want to give a few arguments why I did this:

So for input by a human, I still feel it's really useful. I agree that it's less suitable as an exchange format and is complicated to parse (I had to write my own parser).

However, dagitty can already export to various other formats, and it would be quite simple to add a GraphML exporter. There is a function "toString" in the dagitty package that already implements various output formats; I could just add GraphML as a further option. The only obstacle is that, as far as I can tell, GraphML only supports directed and undirected edges, so I would have to add some custom nonstandard attributes.

Does your package support edges other than undirected and directed?

grasshoppermouse commented 10 months ago

> So, would you think there is still need for a method in tidygraph proper?

I'm curious to get @jtextor's opinion, since he knows the dagitty ecosystem best: where should an as_tbl_graph.dagitty method live? In dagitty, in ggdag (which already has a barebones one), or in tidygraph?

jtextor commented 10 months ago

I have no strong opinion on this. But having now read the entire discussion, I think the easiest solution is to just export to GraphML, which can then be read by this package if the developers are open to it. But then we'd have to agree on a way to encode the different types of edges, which is not part of the GraphML standard.

grasshoppermouse commented 10 months ago

@jtextor, would you be willing to add an as.igraph.dagitty method to dagitty? I see you already suggest igraph, which has a huge number of useful functions, and tidygraph is based on igraph.

thomasp85 commented 10 months ago

If dagitty were to get an as.igraph() method, it would work out of the box in tidygraph.
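The "out of the box" behaviour rests on S3 dispatch: tidygraph's default as_tbl_graph() method tries igraph's as.igraph() generic on objects it has no specific method for. A minimal sketch, assuming tidygraph (and therefore igraph) is installed; the class "toy_dag" is a made-up stand-in for a dagitty object, not part of either package.

```r
library(tidygraph)

# Hypothetical stand-in for a dagitty object: a classed list holding an
# edge list.
toy <- structure(
  list(edges = data.frame(from = c("U", "A"), to = c("A", "Y"),
                          stringsAsFactors = FALSE)),
  class = "toy_dag"
)

# An as.igraph() method for the class. igraph's as.igraph() is an S3
# generic, so defining a toy_dag method is enough for dispatch to find it.
as.igraph.toy_dag <- function(x, ...) {
  igraph::graph_from_data_frame(x$edges, directed = TRUE)
}

# There is no as_tbl_graph() method for toy_dag, so tidygraph falls back
# to its default method, which calls as.igraph() on the object.
tg <- as_tbl_graph(toy)
tg
```

In a real package the method would be registered in the NAMESPACE file rather than defined in the global environment.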

jtextor commented 10 months ago

To some extent, this is already available. The R package "causaleffect" is based on igraph as well, and there is a function that converts to an igraph object that can be understood by causaleffect. E.g.,

dagitty::convert(dagitty::getExample("Shrier"),"causaleffect")

However, the way in which <-> edges are represented in causaleffect is not straightforward: they are converted to two directed edges, both of which are given a special attribute.
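The two-directed-edges encoding can be sketched in base R to show what a consumer would have to undo. This is a hedged illustration only: the helper name expand_bidirected() and the marker column bidirected are hypothetical, and causaleffect's actual attribute name and values are not reproduced here.

```r
# Hypothetical helper: replace each bidirected edge with two directed edges
# (one per direction), both flagged; all other edges are kept once, unflagged.
expand_bidirected <- function(edges) {
  bi   <- edges$type == "bidirected"
  keep <- edges[!bi, c("from", "to")]
  fwd  <- edges[bi, c("from", "to")]
  # Swap the endpoints to get the reverse copy of each bidirected edge
  back <- data.frame(from = fwd$to, to = fwd$from, stringsAsFactors = FALSE)
  keep$bidirected <- rep(FALSE, nrow(keep))
  fwd$bidirected  <- rep(TRUE, nrow(fwd))
  back$bidirected <- rep(TRUE, nrow(back))
  out <- rbind(keep, fwd, back)
  rownames(out) <- NULL
  out
}

dag_edges <- data.frame(
  from = c("A", "U"), to = c("Y", "Y"),
  type = c("directed", "bidirected"),
  stringsAsFactors = FALSE
)
expanded <- expand_bidirected(dag_edges)
```

Recovering the original graph requires pairing the flagged edges back up, which is one reason a single edge carrying an explicit type is easier to work with.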

I looked through the igraph manual, and it now seems to have a somewhat standard way to represent mixed graphs with directed, undirected, and bidirected edges. This covers most of what the causal inference community needs, so I'll add an "as.igraph" method that follows that convention.

thomasp85 commented 10 months ago

@jtextor thank you! As this solves support for as_tbl_graph() completely I'll be closing this PR. Still, thank you @grasshoppermouse for pushing this forward 🙏