Open bayersglassey-zesty opened 2 years ago
There is a workaround: you can wrap your labels in graphviz.quoting.nohtml
.
I guess this issue is basically the reason for that function's existence, so I take it this will be marked "wontfix".
But it would be really nice if things were the other way around: by default, we assume that strings are not HTML, and if you want to mark one as being HTML, then maybe you wrap it in graphviz.quoting.html
.
Thanks for the detailed report.
This is indeed working as documented, see https://graphviz.readthedocs.io/en/stable/manual.html#quoting-and-html-like-labels
Hint: In case you are want to pass arbitrary user input (e.g. for labels), you might want to consider using graphviz.escape()
instead of graphviz.nohtml()
, which also disables backslash-escapes: https://graphviz.readthedocs.io/en/stable/manual.html#backslash-escapes (unless you want to be able use them).
Side note: use the official public API names (which are also simpler): https://graphviz.readthedocs.io/en/stable/api.html#quoting-escaping (all directly under graphviz
).
And here is the hacky regex the
graphviz
library is using to detect such labels, which incorrectly matches e.g.'<lambda> <map>'
... in~/.venvs/3.9/prefect/lib/python3.9/site-packages/graphviz/quoting.py
:
This interpretation of <...>
as HTML-like label comes from upstream Graphviz, see https://graphviz.org/doc/info/lang.html:
import graphviz
graphviz.Source('''
digraph {
a [label=<lambda> <map>]
}
''').render()
Error: <stdin>: syntax error in line 3 near ']'
IIUC you might be implying that graphviz.quoting.quote()
should rather detect that lambda> <map
is not valid XML by trying to parse the input input as XML and adding quotes in case that fails. IMHO that would be un-Pythonic ('In the face of ambiguity, refuse the temptation to guess.'). AFAIU, it would not resolve the ambiguity because we still need a way to render something that is angled-bracketed and inside looks like valid XML literally instead of rendering it as an HTML-like label (e.g. graphviz.nohtml('<>')
).
I completely agree that this is something for users to stumble upon and it would probably better for HTML-like labels to be opt-in. The same reasoning might apply to backslash-escapes, should they be opt-in as well? #53 is another wart that probably requires a backwards-incompatible behavioural change or maybe duplicating some arguments or function names to provide for the different use cases (treat strings literally vs. let them be interpreted).
My plan/proposal is to address all these shortcomings that might involve subtle API/behavioural changes in a single version jump that might be allowed break backwards-compatibility, e.g. for the 1.0
or maybe even 2.0
version of this project. As you see, there is a lot to consider. Sorry for the inconvenience.
Thank you for all the links to documentation! I learned some things about DOT today... Graphviz "record shapes" are pretty interesting. O_o I would never have imagined that such a feature existed!
The same reasoning might apply to backslash-escapes, should they be opt-in as well?
I think so!
IUC you might be implying that
graphviz.quoting.quote()
should rather detect thatlambda> <map
is not valid XML by trying to parse the input input as XML and adding quotes in case that fails.
No, actually I was surprised to see a regex looking for HTML in the first place! In all the time I've used the graphviz
library, I had assumed it was doing something very simple with labels, like "if it contains special characters, quote it, otherwise don't bother".
I came to use the graphviz
library as a Python programmer who wanted to render simple graphs as quickly as possible. The graphviz
library is the solution commonly suggested online. I started using it without knowing anything about Graphviz and DOT; later, I learned enough to understand that graphviz
is just building output to feed to the dot
command, and how to inspect the .gv files generated by Digraph.render
.
It's fairly easy to generate DOT oneself, but quoting rules can get tricky, so I had assumed that one of the main benefits of using graphviz
was for it to take care of that for you. So I trusted it to take any string I gave it, and quote it so that it became a node/edge name.
Now that I know about the idea of id:port:compass, which is very fancy, I think it would have been nice if port and compass had been kwargs, as opposed to things parsed out of the name; or even if name was allowed to be any of: 'id'
, ('id,)
, ('id', 'port')
, or ('id', 'port', 'compass')
.
(Edit: in fact, I now realize that I have run into this before!.. I was trying to use URLs as node IDs, and of course the colon in "http://" was resulting in everything after http:
being quoted, which was about the most confusing behaviour I had ever seen!)
As for HTML labels... that's a cool feature (especially when you see the results you can get with "record shapes"), but I bet most "Python programmers who want to render simple graphs as quick as possible" are never going to suspect that such a things exists, until it randomly bites them (as seems to have been the case with the Prefect library).
My plan/proposal is to address all these shortcomings that might involve subtle API/behavioural changes in a single version jump that might be allowed break backwards-compatibility, e.g. for the
1.0
or maybe even2.0
version of this project. As you see, there is a lot to consider. Sorry for the inconvenience.
That sounds excellent. It's good to know you were already thinking about this stuff; and I'm sorry for the inconvenience of having people like me who use the library for ages without reading through all of the documentation first. :)
Actually, I think it might be nice to mention graphviz.escape
in a little "information bubble" in this section of the docs: https://graphviz.readthedocs.io/en/stable/manual.html#basic-usage
...I see that backslash-escaping and HTML-like labels are mentioned a bit further down, but I believe a casual "Python programmer who wants to render simple graphs as quickly as possible" is going to see those sections and actively ignore them, believing them to be too fancy for their needs.
So the first code example one sees on that page is:
dot = graphviz.Digraph('round-table', comment='The Round Table')
...and for most people, I think the takeaway is "I can pass an arbitrary string as the name, e.g. 'round-table'
".
But if there was an example there showing escape
in action, people might think "ah ok, I probably want to always use escape
until I get to the point where I want to learn about fancier features".
dot = graphviz.Digraph(graphviz.escape('round-table'), comment='The Round Table')
Thanks, and +1 to your proposal to add a better visible warning about possivle quoting/escaping gotchas. See 25043ed2b59bfdc1ac60db10a2d526132ed2a4ce.
I learned enough to understand that
graphviz
is just building output to feed to thedot
command [...] quoting rules can get tricky, so I had assumed that one of the main benefits of using graphviz was for it to take care of that for you
Yeah, this is a very simple library (only around a thousand lines) and +1 the main thing it helps with is quoting/escaping. Because I did not manage to get that completely right on the first try and at least some users might depend on the ability to interpret e.g. backslash escapes, we need to be careful with fixing that (major version change).
if name was allowed to be any of:
'id'
,('id,)
,('id', 'port')
, or('id', 'port', 'compass')
.
+1. I have thought about that earlier: add support for tuples as a backwards-compatible first step towards fixing #53, so we can change the behaviour in a second step and require the use of tuples for port/compass later. Feel free to add that proposal to the issue thread (with the introduction of type annotations, Python provides nicer ways to document that by now as well). Let me know in case you would like to contribute on this feature.
See https://github.com/xflr6/graphviz/commit/25043ed2b59bfdc1ac60db10a2d526132ed2a4ce.
Ah, that's exactly what I had in mind! Thanks for that. (There is one typo I noticed, "wraping" should be "wrapping".)
I have thought about that earlier: add support for tuples as a backwards-compatible first step towards fixing https://github.com/xflr6/graphviz/issues/53, so we can change the behaviour in a second step and require the use of tuples for port/compass later. Feel free to add that proposal to the issue thread (with the introduction of type annotations, Python provides nicer ways to document that by now as well). Let me know in case you would like to contribute on this feature.
Okay -- I'll mention it in there, and actually yes I would be interested in seeing if I can add tuple support as a first step :)
Thanks, fixed the typo in 4aa4d2441552d72925388e7ce2232c818e91d751.
I'm seconding the call for more documentation, and for documentation in the places it is most critical. I came here to file an issue because as far as what I could tell from the API reference G.node(name: str, label: Optional[str] = None...)
suggests that this is just a python string, and the behavior I was seeing looked like a bug. I would personally agree that assuming HTML parsing by default is a "bug" in the sense that assuming that angle-brackets around a string necessarily means it is HTML is a foolhardy assumption given how many other meanings they have. But well-placed documentation would at least help.
There's nothing there that hints at the possibility that there is a Quoting and HTML-like labels kind of functionality that reacts differently to different strings. If I was a novice reading the documentation from beginning to end, I might have known that, but an experienced programmer often just goes to the API documentation and looks for examples of how to invoke the functions.
Thus I had no idea why the first of these node labels had no quotes (and was of course syntactic garbage that led the renderer to barf) while the second one did have quotes:
n23 [label=<&T as core::fmt::Display>]
n24 [label="<&T as core::fmt::Display>::fmt"]
There's nothing there that hints at the possibility that there is a Quoting and HTML-like labels kind of functionality that reacts differently to different strings.
There is a warning at the end of README.md
and https://graphviz.readthedocs.io/en/stable/manual.html#basic-usage (added in 25043ed2b59bfdc1ac60db10a2d526132ed2a4ce). See https://github.com/xflr6/graphviz/issues/168#issuecomment-1100677538 on backwards compatibility.
I'm not saying it's not documented; once I found this thread I was able to find the documentation (hence link it above). I'm saying that someone who bookmarks the API documentation and goes there to try to understand the behavior doesn't get any hints.
It could be as simple as editing the description for the argument to say something like:
Caption to be displayed (defaults to the node name). This string is parsed according to various patterns described in the basic usage section, which affect how it is rendered in the GraphViz file.
someone who bookmarks the API documentation
Oh okay, fair point: how about https://github.com/xflr6/graphviz/compare/master...api_docs_escape?
Added callouts:
Thanks for this!
I'm starting this based on this Prefect issue: https://github.com/PrefectHQ/prefect/issues/5656
Long story short, in graphviz library's
quoting.py
, there is this regex:...which incorrectly matches strings like
'<lambda> <map>'
.To reproduce:
...this results in:
...and the generated Digraph.gv looks like: