terrastruct / d2

D2 is a modern diagram scripting language that turns text to diagrams.
https://d2lang.com
Mozilla Public License 2.0

Proposal: d2 syntax tweak suggestions to make parsing easier #543

Open judepayne opened 1 year ago

judepayne commented 1 year ago

Hi, thanks for d2. It's terrific! I've been working on a parser and compiler for d2 to/from Clojure here. My potential use case is the production of architectural diagrams from metadata held in a large organization, with d2 and the Terrastruct GUI as the drawing layer. It's important in that use case to be able to produce d2 programmatically and also parse it back again.

Writing the compiler was no problem, but in writing the parser (still a WIP) I came up with a few suggestions for adjusting the grammar of d2 to make it a little easier to read and parse. This would better support community-supplied compilers and parsers from/to other languages, which could help the d2 ecosystem in the long term. I thought it worth mentioning these since d2 is still relatively young.

Please treat these as suggestions. Some of them might cost someone writing d2 manually a few more keystrokes (in exchange for better readability) - sorry!

  1. Container syntax

I couldn't find a way to parse d2 without the parser having to know all the d2 reserved keywords. I'd set out to not have the parser know these, as the language is under active development. I felt that a community parser, which might get developed but then not updated much, should still be able to add value even if the set of reserved keywords expanded over time (as long as the language structure didn't change). The d2 executable, which has to be used at some point anyhow, can catch keyword errors.

This came down to how attributes of containers are laid out alongside nested elements (shapes, connections, other containers) rather than having their own map. e.g.

my-ctr: Container {
  shape: person
  my-shape: Shape
}
;; parser needs to know 'shape' is special

Having the nested elements inside their own structure would make it easier to parse (and read):

my-ctr: Container { shape: person } [
  my-shape: Shape
]

I felt that the curly brackets {} were being used for two purposes - attribute maps and scope of contained elements.
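To make the point concrete, here is a minimal Python sketch of the classification step a parser faces with the current syntax. It is not taken from d2 or from the Clojure parser mentioned above: `RESERVED` is a small illustrative subset of keywords picked for this example, and `classify_entry` is a hypothetical helper.

```python
# Illustrative subset only (an assumption for this sketch); the real list of
# d2 reserved keywords is longer and may grow over time.
RESERVED = {"shape", "label", "style", "icon"}


def classify_entry(line: str) -> tuple[str, str, str]:
    """Classify one `key: value` line found inside a container body.

    Returns (kind, key, value), where kind is "attribute" or "child".
    """
    key, _, value = line.partition(":")
    key, value = key.strip(), value.strip()
    # Both kinds of entry look identical, so only the keyword lookup
    # can tell an attribute from a nested shape.
    kind = "attribute" if key in RESERVED else "child"
    return kind, key, value


body = ["shape: person", "my-shape: Shape"]
for entry in body:
    print(classify_entry(entry))
```

Both lines have the same `key: value` form, so the lookup in `RESERVED` is the only thing separating them. A parser built against an older keyword list would classify a newly reserved key as a child shape.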

  2. Labels

Two suggestions for labels. Firstly, a minor issue but when no label is supplied in the second position of an element perhaps throw an error if a colon is present. So the colon is only allowed when the label is supplied. It just makes parsing a tiny bit easier.

Secondly, I began to wonder if it's worth promoting label as a special attribute that can be promoted outside of the element's attribute map. e.g.:

my-shape {label: Shape}

is easy to read and avoids having shape elements look an awful lot like attributes to a person or parser:

my-shape: Shape
shape: Person

  3. The shortcut for style attributes is confusing and can lead to errors as noted in #416

  4. Sequence diagram

I struggled with the syntax for sequence diagram - it's hard to understand how the scope is set. e.g.

shape: sequence_diagram
 a -> b
 b <- a

is a valid sequence diagram, but say I wanted to add another connection c -> d that is not part of the sequence diagram, and I add it below. I can't seem to tell d2 that it's in a different scope without wrapping the sequence diagram in a container and using the {} to set scope. Would it not be worth having sequence diagrams explicitly be containers, like class diagrams? (The user can suppress the container label by setting it to '').

thanks!

nhooyr commented 1 year ago

@judepayne

Thank you for your kind comments and your feedback. Glad you're enjoying D2!

Language changes are on the table until we hit v1, although they will be minor and come with automated tooling for the transition.

  1. Container syntax

I'm confused by this suggestion. This seems to be for making compilation easier, not parsing. Which is fair. But D2 is designed to be easy to write and using the same scope/syntax for both children and container elements makes it easier to write D2.

  2. Labels

Firstly, a minor issue but when no label is supplied in the second position of an element perhaps throw an error if a colon is present. So the colon is only allowed when the label is supplied. It just makes parsing a tiny bit easier.

It's not clear to me what you mean by this. Could you show an example of the syntax now vs what you propose?

Like you want x { y: ok } to work? If so, it already does, d2 fmt just adds the colon afterwards for consistency reasons. This is a matter of subjective taste and something we can later add an option to d2 fmt to control.

Secondly, I began to wonder if it's worth promoting label as a special attribute that can be promoted outside of the element's attribute map. e.g: my-shape {label: Shape}

It's easy to read for sure, but not easy to write. D2's not a programming language where errors can be catastrophic, so our goal has been to make it easy to write and iterate on. That's why we have unquoted strings everywhere, for example. And d2 fmt.

The shortcut for style attributes is confusing and can lead to errors as noted in https://github.com/terrastruct/d2/issues/416

There is no shortcut, see the response from @alixander in that issue, it's a bug.

  4. Sequence diagram

I struggled with the syntax for sequence diagram - it's hard to understand how the scope is set. e.g.

You can do this:

shape: sequence_diagram
a -> b
b <- a
_.c -> _.p

Would it not be worth having sequence diagrams explicitly be containers, like class diagrams?

They are, I don't understand what you're suggesting. Can you show an example before/after your suggested change?

Suggestion

Why don't you just use our Go parser/compiler? It's all open source and easily usable. See https://github.com/terrastruct/d2/tree/master/docs/examples/lib for examples. You'll have a much easier time keeping up to date and won't need to worry about the specific grammar, for which there is no spec and which lives only in my head right now.

judepayne commented 1 year ago

Hi

Thanks for your comments. Let me better explain my premise.

I'm a domain architect in a large organization. We have thousands of applications connected by tens if not hundreds of thousands of data flows. Producing accurate, up-to-date current-state architectural diagrams at that complexity is challenging, to say the least! We describe all of that complexity with metadata, which is gathered in various ways, from automatic to manual with periodic attestation. In that scenario, we need to store the metadata in a database so that it's useful. It struck me that d2/Terrastruct could be a great solution for diagramming selections of that estate. For that to work, we'd need to get the metadata out of our database and convert it into d2. Then, maybe in the Terrastruct GUI, a development manager who is, say, attesting to the set of data flows into/out of their application would update them in the GUI, which updates the d2 (the bidirectional nature of the GUI editing is cool). Then that d2 needs to be converted back into data, ultimately to update the organisation's metadata database. Other existing processes beyond diagramming use the metadata, so the native store format can't be d2.
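As a rough illustration of the first half of that round trip, the Python sketch below renders application and data-flow records as d2 text. The record shapes, the function name `to_d2`, and the sample data are all assumptions made up for this example. A real metadata store will look different, and labels containing special characters would need quoting that this sketch does not attempt.

```python
def to_d2(apps: dict[str, str], flows: list[tuple[str, str, str]]) -> str:
    """Render apps (key -> label) and flows (source, target, label) as d2 text."""
    # One shape per application: `key: Label`
    lines = [f"{key}: {label}" for key, label in apps.items()]
    # One connection per data flow: `source -> target: Label`
    lines += [f"{src} -> {dst}: {label}" for src, dst, label in flows]
    return "\n".join(lines) + "\n"


# Hypothetical sample records, standing in for rows from a metadata database.
apps = {"billing": "Billing", "ledger": "General Ledger"}
flows = [("billing", "ledger", "daily postings")]
print(to_d2(apps, flows), end="")
```

Going in this direction is the easy part. The reverse step, reading edited d2 back into records, is where the parsing questions in this issue come up.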

In that scenario, maybe we can't use your Go parser/compiler; perhaps we don't have any Go developers, or the Go language is not on the prescribed list. The metadata database is not Oracle, for which you have an adapter. There are always all sorts of constraints in large organizations! There are already a number of vendor architecture tools for large organisations, but diagramming in my experience is always a weakness. Those tools could never produce the quality of diagram that a person would produce themselves and be happy with, so I think there's a great case for having a separate, focused architecture diagramming tool like d2 with the language open-sourced.

Therefore, I thought that it would be great to support the community in writing compilers and parsers in (lots of) other languages. That would enrich the d2 ecosystem by making it more generally accessible and creating additional buzz around it. Outside large organizations, maybe some developer wants to manipulate their d2 in C or Python or whatever.

So rather than having users only write d2 by hand, I'm suggesting that for large organisations the metadata already exists in some other form, and therefore d2 should be easy to write and read programmatically as well. Your Go parser would be one of several in a richer ecosystem.

The tweaks I've suggested were from my own experience of writing a parser for d2. Just suggestions.

Let me try to clarify where you've asked.

  1. Container syntax

Writing attributes and contained elements next to each other is definitely easier to write, as in slightly fewer keystrokes, but it is not easier to read, either programmatically or by a person, since you need knowledge of the current set of d2 reserved keywords. That means community-written parsers can easily stop working as you expand the set of keywords. So the suggestion is to consider trading off a little bit of the ease of writing to improve the ease of reading by making the difference explicit.

  2. Labels (fairly minor points)

x: {y: ok}

should throw an error because no label is present.

x {y:ok}

should be fine. Just a minor point to make parsing a tiny bit simpler.
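For what it's worth, the rule is simple enough to sketch as a line-based check in Python. This is only a heuristic written for illustration: the pattern and the helper name `has_colon_without_label` are assumptions of this sketch, it ignores quoted strings, and it says nothing about how d2 itself tokenizes input.

```python
import re

# A key, then a colon, then (optional whitespace and) an opening brace:
# that is, a colon with no label between it and the attribute map.
COLON_WITHOUT_LABEL = re.compile(r"^\s*[^:{}\s][^:{}]*:\s*\{")


def has_colon_without_label(line: str) -> bool:
    """Return True if the line has a colon directly followed by `{`."""
    return COLON_WITHOUT_LABEL.match(line) is not None


for sample in ["x: {y: ok}", "x {y:ok}", "x: Label {y: ok}"]:
    print(sample, "->", has_colon_without_label(sample))
```

Under the proposed rule, only the first sample would be rejected; the other two either omit the colon or supply a label after it.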

On my second point about not promoting label as a special value outside the attribute map, it is subjective I agree - again it's just the same trade off of increasing (programmatic) readability at the cost maybe of a little bit of the ease of (manual) writing.

  4. Sequence diagrams

Maybe I'm not right on this one, but I found the example of:

shape: sequence_diagram
a -> b
b <- a

confusing because, with shapes and containers, the object is always explicit, has a key, and has its scope in the d2 file explicitly obvious (terminated either by a ';' or newline in the case of a shape, or by '}' in the case of a container). With a sequence diagram, however, it's not obvious what the 'shape: sequence_diagram' is referring to, unless, as you say, it's implicitly setting up a container. But if so, whereas other containers use '{' and '}' to delimit their scope, this one does not. That seems inconsistent.

Actually, maybe this is one instance of a slightly wider observation: you can have sort of orphaned attributes where it's not obvious what they refer to. e.g.

shape: person
a

This is valid d2 but it's not obvious what shape: person refers to, and indeed it does nothing, unlike in the sequence diagram example above. So maybe the answer is to prevent 'orphaned' attributes altogether because the scope of what they apply to is unclear, so for example you could force a sequence diagram to be wrapped in a container.

SD {
    label: ''
    shape: sequence_diagram
    a -> b
    b <- a
}
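When d2 is produced programmatically, emitting the wrapped form is straightforward. The Python sketch below is one possible way to do it. The function name `sequence_diagram`, its parameters, and the sample messages are assumptions of this example rather than anything defined by d2.

```python
def sequence_diagram(key: str, messages: list[tuple[str, str]]) -> str:
    """Render (source, target) messages as a d2 sequence diagram in a container."""
    # The attributes come first, then one connection per message.
    body = ["label: ''", "shape: sequence_diagram"]
    body += [f"{src} -> {dst}" for src, dst in messages]
    inner = "\n".join("    " + line for line in body)
    # The braces make the scope of `shape: sequence_diagram` explicit.
    return f"{key} {{\n{inner}\n}}\n"


d2_text = sequence_diagram("SD", [("a", "b"), ("b", "a")])
# A connection appended after the closing brace is outside the diagram's scope.
d2_text += "c -> d\n"
print(d2_text, end="")
```

Because the container is closed before `c -> d` is appended, there is no ambiguity about which connections belong to the sequence diagram.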

I'd love to see d2 continue to succeed, and to do that, I think that more than just the official Go d2 parser/compiler will need to exist. These suggestions are for your consideration, to make the programmatic reading of d2 a bit easier.

Overall, I think the language is really good! At the right time, it would be great to have a proper spec for the grammar. If you have a look here, there's an EBNF description of it that works!