ENH: Customizable plate labels

wd60622 commented 1 month ago

Before

# Currently would have to create graphviz instance
graph = pm.model_to_graphviz(model)
# Save output to .dot file and manually modify subgraph labels
print(graph)
# Resave as pdf using dot cli

After

# Might be similar to that of where the entry point to the change would be in `pm.model_to_graphviz`

def named_vars_plate_label(
    var: TensorVariable, 
    named_vars_to_dims: dict[str, Sequence[str]], 
    dim_lengths: dict[str, int],
) -> str: 
    dims = named_vars_to_dims[var.name]
    return " x ".join(str(dim) for dim in dims)

plate_formatters = {
    "Named Vars": named_vars_plate_label,
}
pm.model_to_graphviz(model, plate_formatters=plate_formatters)

Context for the issue:

I am interested in making graphviz diagraphs with a bit more customization of the plate labels than currently supported. Specifically, I would like to allow for labels without the shape size to provide a more abstract representation of the model. My specific example might look like this: example-graph

I am willing to take up this issue. Wondering if there are any preferences on the implementation details? My idea would involve extracting this logic here and providing some logic based on:

var : TensorVariable
named_vars_to_dims : var.name mapping to coord names
dim_lengths : coord names mapping to either TensorVariable or pre-evaluated shapes (addressing this TODO)

Related to #6716

ricardoV94 commented 1 month ago

How does this differ from https://github.com/pymc-devs/pymc/pull/7302 ?

wd60622 commented 1 month ago

How does this differ from https://github.com/pymc-devs/pymc/pull/7302 ?

7302 is styling of the nodes and this is name of the labels for each plate (label in bottom of each rectangle)

In the above example it would show the dim size. Ie obs (52) x covariate (5)

ricardoV94 commented 1 month ago

I feel that may be going too far? Can it be done after a canonical graph was created by PyMC from the outside?

Or anyway that a user can have that flexibility without we having to handle it

ricardoV94 commented 1 month ago

Regarding shape size specifically we can disable that with a boolean flag, if that's all you need?

wd60622 commented 1 month ago

I feel that may be going too far? Can it be done after a canonical graph was created by PyMC from the outside? Or anyway that a user can have that flexibility without we having to handle it

Yes, fair. I agree. Ideally that'd be something to process afterwards. However, based on current understanding of graphviz package is that would not be possible. I could be wrong but it seems like much of the information is stored in pre-formatted strings and pieced into larger dot format. reference

I suspect it might be possible through the networkx objects. i.e. pymc model -> networkx -> modifications -> graphviz

Regarding shape size specifically we can disable that with a boolean flag, if that's all you need?

Yes, I think that that is a fine middle ground. You are thinking a single boolean? For example,

# Default resulting in plate labels like obs (52) x covariates (3)
pm.model_to_graphviz(model)
# Optional boolean to result in plate labels without number in parentheses. i.e. obs x covariates
pm.model_to_graphviz(model, include_shape_size=False)

ricardoV94 commented 1 month ago

The networkx path may not be too crazy?

wd60622 commented 1 month ago

The networkx path may not be too crazy?

Will give it a try and report back with findings.

I trying to avoid parsing and reconstruction of the strings that we create

wd60622 commented 1 week ago

I explore this a bit and I wasn't able to get the result that I wanted.

The cons of it:

networkx converts to pygraphviz instead of graphviz which requires an additional dependency
Plate information seems to be lost after it is converted via networkx
Also, parsing of labels would have to happen regardless

Hard to say there were any pros :laughing:

I think an include_shape_size boolean flag would be the easiest way to go. What are your thoughts on this? @ricardoV94

ricardoV94 commented 1 week ago

Ideally we would have a very bare bones utility that gives users the model tree structure and meta info and let them do whatever they want with it.

Our graphviz gen could use this utility, but the code would be simple enough for someone to just copy and tweak without we having to offer customization for all the visual elements that accidentally became the default?

If that feels like too much work, that's fair and we can add one more kwarg. But since people are becoming interested in this I worry about the next round of feature requests.

WDYT?

wd60622 commented 1 week ago

Ideally we would have a very bare bones utility that gives users the model tree structure and meta info and let them do whatever they want with it.

Our graphviz gen could use this utility, but the code would be simple enough for someone to just copy and tweak without we having to offer customization for all the visual elements that accidentally became the default?

If that feels like too much work, that's fair and we can add one more kwarg. But since people are becoming interested in this I worry about the next round of feature requests.

WDYT?

Yeah, this is a bit of what I had in mind with the initial solution and where I was headed with the #7302 PR. That there would be some underlying generator(s) that could just provide the people with the information that they'd like. Also, that would be used to support the current methods (seems like the graphviz and networkx methods are a bit intertwined and might have some consolidation / reliance on this underlying utility).

I can think about the implementation a bit. Do you have any thoughts on how you'd like it implemented?

ricardoV94 commented 1 week ago

I can think about the implementation a bit. Do you have any thoughts on how you'd like it implemented?

No, but very curious what you can come up with!

pymc-devs / pymc