Open jaron-lee opened 1 year ago
In a more general case, we also have the case of an ADMG and a MAG. In the case of a MAG, the difference is that there is only one edge allowed among nodes, and that there are no primitive inducing paths among non-adjacent nodes.
So the way i would imagine it is we have a base directed-undirected edge graph class, and then constructors for each type of graph that enforce whatever requirements the specific graph requires. I'll take a stab at this some time this week.
So the way i would imagine it is we have a base directed-undirected edge graph class, and then constructors for each type of graph that enforce whatever requirements the specific graph requires. I'll take a stab at this some time this week.
Do we need a constructor for each type of graph necessarily? Perhaps just assume user knows what they are doing and then provide a function to check validity? Or is that too expensive...
Do you think we should also have a base directed-undirected-bidirected-circle graph class then as well? This could lead to a PAG, ADMG (causal DAG w/ bidirectional edges) and a MAG? We just need functional checks to make sure the corresponding PAG, ADMG and MAG are valid?
This is sort of how ananke is designed - we have a base graph class that has all the edge types and then every kind of graph we actually use is a subclass of that. If we had a directed-undirected-bidirected-circle graph class then everything could just be constructed from that class I suppose. I imagine that for e.g. ADMGs you would just set the undirected and circle parts to empty graphs by default?
@adam2392 I just reviewed the PAG code and there is a function
is_valid_mec_graph
that appears to enforce the validity check in thePAG
class. Is that understanding correct?I believe this only enforces the validity when adding edges. Probably poorly named. I think we can refactor these names once we decide on a proper class for:
* mixed-edge graphs that support Chain Graphs, CPDAG, PDAGs and their extensions * mixed-edge graphs that support ADMG/DAGs and their extensions * mixed-edge graphs that support PAGs and their extensions
What are some good generic names for the directed/undirected graph and the directed/bidirected graph and the directed/bidirected/circle/undirected graph?
Referencing discussion in #67.
As far as names we could always go DUGraph, DBGraph, DBCUGraph. I am still of the opinion that we should create an API so that users don't have to call these graph types directly, but that we have better organization and less duplication of code in the implementation. Best of both worlds, I suppose. Consequently the naming shouldn't be that important.
So we have 4 types of edges - directed, bidirected, circle, and undirected. We could in theory create all 4 choose k subtypes for k > 0 (starting with the three that you mentioned above). But if we are creating e.g. a directed-bidirected graph, and then setting the bidirected argument to be empty, it seems somewhat arbitrary? Why would we not take the directed-bidirected-circle-undirected, and then just set all but directed to be empty grpahs?
It seems like we could create the one DBCUGraph with all four edge types and then for each kind of graph we actualyl want to consider (PAG, ADMG, DAG, CPDAG etc.) we create a wrapper around DBCUGraph that zeroes out the appropriate edge graphs.
If we are seriously following the networkx style of detaching the graphical representation from the operations on the graph (including checks such as validity) then this implementation should suffice, seeing as there should not be much need for any inheritance behaviors.
I propose to start with the following, and will open a PR/RFC to this effect:
Future proposals:
As far as names we could always go DUGraph, DBGraph, DBCUGraph. I am still of the opinion that we should create an API so that users don't have to call these graph types directly, but that we have better organization and less duplication of code in the implementation. Best of both worlds, I suppose. Consequently the naming shouldn't be that important. It seems like we could create the one DBCUGraph with all four edge types and then for each kind of graph we actualyl want to consider (PAG, ADMG, DAG, CPDAG etc.) we create a wrapper around DBCUGraph that zeroes out the appropriate edge graphs.
To summarize and clarify, is your proposal open to the idea of either:
? I don't have a great reason to support one over the other currently :p. But see below on some cons of supporting explicitly the class types ADMG, PAG, DAG, CPDAG, etc. The graph classes are not guaranteed without a check to be a "valid ADMG/PAG/CPDAG/etc.". This is an issue that was raised in #67.
If we are seriously following the networkx style of detaching the graphical representation from the operations on the graph (including checks such as validity) then this implementation should suffice, seeing as there should not be much need for any inheritance behaviors.
The reason the networkx design is "nice" imo (but not ideal) is that checking for validity of a certain "type of graph" is quite expensive. E.g. if we want to ensure that a constructed nx.DiGraph
is a DAG, then running nx.is_directed_acyclic(self)
every single time an edge is added/removed is not good (note this is what they do in pgmpy's implementation, which is why I'm not a fan). The same logic extends to our cases.
As an aside, seeing the implementation of CG and CPDAG as a subclass of DUGraph, I'm pondering the tradeoffs here:
If we can't arrive at a consensus here, we can discuss next week, or something?
Apologies for taking a while to get back to this. This would be worth talking about actually, if you have some time tomorrow after the usual meeting. Or happy to discuss another time if that doesn't work.
Regarding the proposal of DUGraph + all variants versus DBCUGraph, I am actually not super sure myself. I do think not supporting explicit graph types ADMG or CPDAG would be weird interface-wise, but perhaps the right way to think about this if we don't implement any algorithms inside the graphs themselves.
For 1., maybe we can lazily evaluate the validity of the graph (e.g. algorithms don't assume it's a valid instance of the class)? I think you're right that e.g. checking validity at each step is probably not going to be efficient or very tedious to
I think 2. makes the most sense design wise. Perhaps we can talk about potential API designs?
cc: @robertness @bloebp @kunwuz
Discussion with @jaron-lee we have decided the following strategy:
MixedEdgeGraph
that enables all causal-type graphs: DiBiGraph, DiUnGraph, DiBiUnGraph, DiBiUnCiGraphis_valid_cpdag(...)
, is_valid_pag(...)
, which are still TBD implementedIn the longer-term, we can consider removing all the specific causal graph class implementations, so that way it's more like networkx, but in the short-term, we can just rely on this approach to be internally lean, and rely on user-feedback to guide this decision.
Feel free to add anything I missed.
Check bnlearn, and Meek for the CPDAG definition.
Is your feature request related to a problem? Please describe. At the moment there is a graph class
pywhy_graphs.CPDAG
that handles graphs with directed and undirected edges. However, such graphs appear in other contexts throughout causal inference; for example, chain graphs. These graphs have the same representation but have a distinct and separate meaning.However, simply using the
CPDAG
class to represent chain graphs could be confusing as these concepts are distinct in the literature.Describe the solution you'd like A simple way to address this is to rename the CPDAG class to something more generic. Unfortunately the correct term (Acyclic Directed Mixed Graph) already applies to graphs with bidirected edges in place of undirected edges; thus another term like "acyclic directed undirected graph" might be considered.
CPDAGs and chain graphs would use this new class to implement their graphical representations. The onus would be on the user to pass this to downstream functions in a way that makes sense (e.g. it would be up to the user to not take output from the PC algorithm and fit a chain graph model to it, for example).
Describe alternatives you've considered Implement a separate chain graph class, with potential refactoring of the reusable parts for both into a parent class. This would leave recognizable API graph classes that might help users who are familiar with the literature to use the software.