pdml-lang / pdml-lang.github.io

PDML website (including all docs)
https://pdml-lang.dev/
GNU General Public License v2.0

Whitespace Open Rules and Editors' Syntaxes #6

Open tajmone opened 2 years ago

tajmone commented 2 years ago

@pdml-lang, I've started working on a PDML syntax for Sublime Text:

https://github.com/tajmone/Sublime-PDML

The repo is just a skeleton, and I'm working on the actual syntax locally for now.

I have some practical questions to help me clarify some parsing issues which are not clear to me. Since this repository has no Discussions, I'm creating an Issue (might be worth enabling Discussions though, since it's a standard/spec, and people might like to share their implementation, etc.).

Whitespace Open Rules

In Basic PDML Specification » Whitespace it says:

There are no whitespace handling rules defined in basic PDML.

Whitespace is preserved when a PDML document is parsed.

and:

Applications reading PDML documents (or customized PDML parsers) are free to implement any appropriate whitespace handling rules, such as:

  • skip whitespace nodes
  • trim leading and/or trailing whitespace in text nodes
  • replace whitespace sequences with a single space (similar to HTML)

Which, if I've understood correctly, is a way to keep the standard open to different applications, according to domain-specific needs; i.e. the Basic PDML Spec lays the foundations for a versatile data and markup scheme, leaving some details open on purpose in order not to preclude any specific use.

Editor Syntaxes vs Open Rules

But when implementing an editor syntax, whitespace can indeed become a problem, since the syntax needs whitespace handling rules in order to work.

Let me illustrate the problem using actual examples from the PDML website...

Whitespace Is Preserved

Again, quoting Basic PDML Specification » Whitespace:

Whitespace is preserved when a PDML document is parsed.

Consider the following PDML snippet:

[a  foo   [b]
    2 [c] [d]
]

According to the above example, node [a is parsed as:

{space}foo{space}{space}{space}

which indicates that leading and trailing spaces in a text node are preserved (separator space excluded).

Whitespace Is Ignored

In PDML Examples » Tree we find the following example:

[config
    [color orange]
    [size
        [width 1618]
        [height 1000]
    ]
]

where it says:

Node config has two child nodes: color and size. Node size has two child nodes too: width and height.

According to this example, all whitespace between parent and child nodes is simply ignored.

This seems to indicate that there is indeed an implicit whitespace handling rule at play here, i.e. that a text node can't be whitespace only!

If that weren't the case, the above example would also produce various text nodes, since any whitespace between the parent nodes and their children would be interpreted as text nodes consisting of whitespace only.

Do you see my point? According to the EBNF grammar and the railroad diagrams, a separator consists of a single whitespace entity (optional in some cases, like [node1[node2), and everything else is to be considered a text node — which the first of the two examples seems to confirm.
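To make the point concrete, here is a minimal sketch (in Python) of a parser that follows the rule as I read it: one whitespace character after the name is the separator, and everything else is text. The names `Node` and `parse_node` are hypothetical, the rule that a name ends at whitespace or a bracket is my assumption, and there is no error handling for malformed input, so take it as an illustration rather than a conformant implementation.

```python
from dataclasses import dataclass, field

WHITESPACE = " \t\r\n"


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # items are Node or str (text)


def parse_node(src: str, pos: int = 0):
    """Parse one node starting at src[pos] == '['; return (Node, next_pos)."""
    assert src[pos] == "[", "a node must start with '['"
    pos += 1
    start = pos
    # Assumption: a name runs until whitespace, '[' or ']'.
    while src[pos] not in WHITESPACE + "[]":
        pos += 1
    node = Node(src[start:pos])
    # The separator is exactly ONE whitespace character; the rest is text.
    if src[pos] in WHITESPACE:
        pos += 1
    text = ""
    while src[pos] != "]":
        if src[pos] == "[":
            if text:  # flush pending text before the child node
                node.children.append(text)
                text = ""
            child, pos = parse_node(src, pos)
            node.children.append(child)
        elif src[pos] == "\\":
            text += src[pos + 1]  # escaped '[', ']' or '\'
            pos += 2
        else:
            text += src[pos]
            pos += 1
    if text:
        node.children.append(text)
    return node, pos + 1  # skip the closing ']'


TREE = """[config
    [color orange]
    [size
        [width 1618]
        [height 1000]
    ]
]"""

root, _ = parse_node(TREE)
for child in root.children:
    print(repr(child) if isinstance(child, str) else f"<node {child.name}>")
```

With this reading, `config` ends up with five children: the `color` and `size` nodes plus three whitespace-only text nodes around them.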

From the standpoint of an editor syntax (based on RegEx definitions) we need a clear rule regarding this.

Impressions…

I get the impression that the second example wishes to illustrate how trees with various levels of nodes nesting can be prettified by using indentation to provide visual cues to structure.

On the other hand, the first example seems to emphasize how PDML parsing is rules-free when it comes to handling literal text, which allows its use for handling raw string snippets in a lossless manner, e.g. when handling code-styled listings, where indentation matters.

But you can see that there's a grey area in between these two extremes when it comes to whitespace-only nodes (e.g. indentation snippets, or sequences of newlines, etc.): we can either honour them or ignore them, but we can't do both at the same time.

Even though the principle of openness stands (i.e. that these details should be handled by the specific applications), editor support needs an explicit rule for this, lest it become an opinionated implementation of the Basic PDML syntax.

Proposals

Personally, I think it makes sense to treat whitespace in-between nodes as noise, unless there's some text, in which case we should preserve trailing and leading space entities (mandatory separator excluded), and then leave it to the user to decide whether to trim the spaces on the edges or not, either in the editor or in the final application.

One could add extra functionality to the package, e.g. to trim all leading and trailing whitespace from text nodes, but this also means that prettifying PDML trees can only be done safely when dealing with trees where text nodes don't span multiple lines (and when the proposed whitespace-only rule is in place).

What's your advice and guidance on this? How should the syntax handle the whitespace problem?

Also, do you see two potentially different types of trees when dealing with pretty-formatting (as in JSON) as compared to text nodes spanning multiple lines?

I guess these types of problems are solved through extensions of the Basic PDML, e.g. by introducing specific operators to handle multi-line strings and their whitespace (like for HereDocs, or YAML's use of the pipe | in string blocks).

But right now that we're trying to build some basic editors' support for PDML, we need to address the question in practical terms.

pdml-lang commented 2 years ago

I've started working on a PDML syntax for Sublime Text

Great! Thank you.

Since this repository has no Discussions

Sorry for that. Discussions are now activated. However, I won't move this thread to Discussions, because the PDML docs need to be enhanced (as explained below).

if I've understood correctly, is a way to keep the standard open to different applications, according to domain-specific needs

Yes, exactly.

In PDML Examples "Tree" we find the following example ... where it says: "Node config has two child nodes: color and size. Node size has two child nodes too: width and height."

This statement in the PDML doc is actually misleading, and I have to clarify this in the doc.

The statement is correct from a logical point of view when we look at the relevant content of the config data. But a basic PDML parser would indeed parse the whitespace between nodes as text containing whitespace.

However, a dedicated PDML parser for config files would certainly apply the "skip whitespace nodes" rule, so that the parsed AST would no longer contain the whitespace between nodes.

Do you see my point? According to the EBNF grammar and the railroad diagrams ...

Yes. The EBNF grammar and the railroad diagrams are correct, it's just the text of the 'tree' example that needs to be corrected.

I get the impression that the second example wishes to illustrate how trees with various levels of nodes nesting can be prettified by using indentation to provide visual cues to structure.

Yes

What's your advice and guidance on this? how should the syntax handle the whitespace problem.

I would suggest that editor plugins do exactly the same as a basic PDML parser: preserve whitespace (don't handle it).

You can't know the domain in which an editor plugin is used, so any 'standard' rule would work well in some domains, but not in others. Domain specific editor plugins could later be added to cover specific needs (e.g. ignore whitespace nodes in the case of config data).
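As a rough sketch of what such a domain-specific layer could look like, the "skip whitespace nodes" rule can be applied as a separate pass over a tree that a basic parser has produced with all whitespace preserved. The tuple representation `(name, children)` and the function name `skip_whitespace_nodes` are assumptions made for this illustration only.

```python
def skip_whitespace_nodes(node):
    """Return a copy of (name, children) without whitespace-only text children."""
    name, children = node
    kept = []
    for child in children:
        if isinstance(child, str):
            if child.strip():  # keep text that has visible content, untouched
                kept.append(child)
        else:
            kept.append(skip_whitespace_nodes(child))  # recurse into child nodes
    return (name, kept)


# Tree as a whitespace-preserving basic parser might deliver it.
basic_tree = (
    "config",
    [
        "    ",
        ("color", ["orange"]),
        "\n    ",
        ("size", ["        ", ("width", ["1618"]), "\n        ", ("height", ["1000"]), "\n    "]),
        "\n",
    ],
)

config_view = skip_whitespace_nodes(basic_tree)
print(config_view)
```

The basic tree stays untouched, so an editor plugin can keep working on the preserved form while a config-oriented tool works on the filtered view.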

Also, do you see two potentially different types of the trees when dealing with pretty-formatting (as in JSON) as compared to text nodes spanning across multiple lines?

This depends on the domain. In some cases it also depends on the parent node's nature (e.g. a domain-specific parser might handle whitespace differently in nodes foo and bar).

I guess these type of problems are solved through extension of the Basic PDML, e.g. by introducing specific operators to handle multi-line strings and their whitespace

Yes. Consider, for example, whitespace in a PML's code block:

[code
    """
    if the_sun_shines then
        write "Anton is happy!"
    .
    """
]

The first 4 whitespace characters in the write ... line are ignored, the subsequent 4 spaces are part of the source code. You can have a look at the Java source code to handle this here and here (look at method readBlockWithDelimiter).
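The indentation rule described above can be sketched roughly as follows. This is not a port of the Java code: the function name `extract_delimited_block` is hypothetical, and it assumes that the indentation of the opening delimiter line is the amount of leading whitespace that gets ignored on every line of the block.

```python
def extract_delimited_block(source: str, delimiter: str = '"""') -> str:
    """Return the text between two delimiter lines, minus the delimiter's indent."""
    lines = source.split("\n")
    # Locate the opening and closing delimiter lines.
    fence = [i for i, line in enumerate(lines) if line.strip() == delimiter]
    if len(fence) < 2:
        raise ValueError("expected an opening and a closing delimiter line")
    first, last = fence[0], fence[1]
    # Assumption: the opening delimiter's indentation is what gets ignored.
    indent = len(lines[first]) - len(lines[first].lstrip())
    body = []
    for line in lines[first + 1:last]:
        prefix = line[:indent]
        # Strip the prefix only when it is pure whitespace;
        # lines indented less than the delimiter are kept as they are.
        body.append(line[indent:] if prefix.strip() == "" else line)
    return "\n".join(body)


PML_SNIPPET = '''[code
    """
    if the_sun_shines then
        write "Anton is happy!"
    .
    """
]'''

print(extract_delimited_block(PML_SNIPPET))
```

For the snippet above, the `write` line keeps four of its eight leading spaces, which matches the behaviour described for the code block.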

tajmone commented 2 years ago

Versioning the PDML Spec

I don't move this thread to discussions, because the PDML docs need to be enhanced (as explained below).

I strongly suggest you start adding a versioning scheme to the PDML Spec, because if you don't and people start to create their own implementations or extensions, you'll soon be facing chaos and compatibility issues, and there isn't really any going back at that point. This is especially true if PDML is successful: it could then become something like the "Markdown Saga", where the lack of a well-defined spec led to a proliferation of variants and compatibility issues.

I would suggest SemVer as the versioning scheme, but if you do adopt it you'll have to stick strictly to its dictates: no breaking changes within the same MAJOR version!

PML uses SemVer, but we've seen breaking syntax changes within the same MAJOR version at some point (when the chevrons (angle brackets) around parameters were dropped in favour of another pair of delimiters, IIRC). This should never happen in a SemVer-versioned tool, because it can break the whole ecosystem built around it. This rule is so strict that package managers can ban packages that violate it, because end users need to rely on the scheme for guaranteed backward compatibility.
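To illustrate the rule, here is a deliberately simplified sketch of a check that a spec release pipeline could run before publishing a new version. The helper names `parse_version` and `is_valid_bump` are hypothetical, only plain MAJOR.MINOR.PATCH strings are handled (no pre-release or build metadata), and the check does not verify that MINOR and PATCH are reset correctly.

```python
import re

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")  # core MAJOR.MINOR.PATCH only


def parse_version(text: str):
    """Turn '1.4.0' into the tuple (1, 4, 0)."""
    match = SEMVER.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_valid_bump(old: str, new: str, breaking: bool) -> bool:
    """True if going from old to new is an allowed bump for this kind of change."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:  # versions must always increase
        return False
    # SemVer makes no stability promise while MAJOR is 0 (initial development).
    if breaking and old_v[0] > 0:
        return new_v[0] > old_v[0]
    return True


print(is_valid_bump("1.4.0", "1.5.0", breaking=True))
print(is_valid_bump("1.4.0", "2.0.0", breaking=True))
```

The first call is rejected and the second accepted, which is exactly the case of a breaking syntax change slipping into the same MAJOR version.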

Also, the PDML Specification lends itself well to SemVer, since we're dealing with a Spec, not a tool (where different factors are at play), so the semantics of SemVer should be clearly referring to the specification only (i.e. not the Java parser implementation, or any particular implementation).

For example, SemVer worked really well for the TOML configuration format (a YAML-like format, but simpler), since TOML stuck to it from the very Alpha and Beta stages (v0.1.0 onward). This allowed third-party implementers to create TOML libraries from the very beginning, which helped TOML's popularity: by the time it reached v1.0.0 it was already widespread across many languages, and the bump from Beta to Stable was a mere formality, since the spec had been driven by real-life implementers' feedback.

Challenges of an Open Spec

if I've understood correctly, is a way to keep the standard open to different applications, according to domain-specific needs

Yes, exactly.

This open aspect of the PDML Spec can be challenging in many respects.

I suggest making a clear separation between the actual Spec and what are (or could be) domain-specific implementations. Any implementation-specific choices could bar the options for other domains, so it's better to keep the Basic/"core" Spec as open as possible.

I'm not sure at this point what role the Java implementation plays in all of this, is it just a reference sample implementation, or is it THE reference implementation?

The statement is correct from a logical point of view when we look at the relevant content of the config data. But a basic PDML parser would indeed parse the whitespace between nodes as text containing whitespace.

However, a dedicated PDML parser for config files would certainly apply the "skip whitespace nodes" rule, so that the parsed AST would no more contain the whitespace between nodes.

It's hard to reconcile these two points. If the former statement is true for a basic PDML parser, then the latter implementation would be in violation of the Basic Spec.

You could either keep the Basic Spec totally implementation-agnostic and provide general guidelines for domain-specific implementations, i.e. clarifying that whitespace handling rules are delegated to each application's specific needs and are not part of the Spec, or you could include derivative sub-Specs for domain-specific applications in the Basic Spec.

I'm not sure what would work best, but I'm pretty sure that anyone implementing a PDML parser would benefit from being able to adhere to official Specs, which will guarantee that the library/tool is conformant to its users' expectations.

In any case, since PDML is extensible by design, you'll have to tackle the issue of how the original specs might branch according to domain-specific needs. Ignoring vs honouring whitespace during parsing is a good example of how two different domains might adopt different stances on the Basic Spec: most likely, data-driven applications will ignore whitespace as noise, whereas markup-oriented tools will honour it.

Specs, Examples, EBNF, et Co.

Yes. The EBNF grammar and the railroad diagrams are correct, it's just the text of the 'tree' example that needs to be corrected.

Maybe the PML Spec should become more of a terse technical document, where examples are reduced to the minimum required to illustrate the points being presented, and all practical usage examples should be moved into another document, a sort of User Guide, to avoid confusion.

It's inevitable that whenever you introduce a new concept, or provide an example, some user coming from a specific domain will see more to it than originally intended — e.g. because in his/her specific domain there are additional constraints, or more advanced categories at stake, etc.

When it comes to the Spec, there should be one and only one official reference, i.e. the PDML Specification; the EBNF grammar and the railroad diagrams should be just auxiliary assets to help users better contextualize the Spec, if these look more familiar to them. But IMO in no case should the EBNF or the railroad diagrams take precedence over the Spec, otherwise it's chaos, because some users might build their tools using different references. Ultimately you'd be burdened with the task of maintaining three different official references, ensuring that they always mirror each other with each PDML update, without possible conflicts.

BNF grammars in particular are not a good and reliable reference in this case, as the history of language engineering has shown that BNF, EBNF and ABNF grammars have rarely been able to represent real languages in their "pure forms", i.e. without tweaks and additions to handle edge cases (from meaningful whitespace in indentation-sensitive languages, to additional notations to handle operator associativity, etc.).

Furthermore, the current PDML EBNF grammar is not a valid EBNF document either, so it shouldn't be proposed as an official PDML reference over the PDML Spec document, IMO.

When you come to think about it, there's nothing that prevents you from including into the PDML Spec document side notes, proposals, and whatever asides might contribute to clarify specific points, even if these points are not yet embraced by the Spec but only being considered for future implementations.

You could either adopt an existing Specs model, e.g. based on how the various standards are being developed (ISO, etc.), or come up with your own model, described elsewhere. After all, this very Issue and its ongoing discussion is de facto already part of the decisional process that will shape the next update of the PDML Spec — except that it's an informal process, and we have no guidelines as to how and when the Spec will be updated.

Surely, we can't pretend we're an international organization like ISO or the W3C, where there's an elected (and possibly remunerated) board of experts that regularly meets and deliberates on the Spec. But this doesn't mean that the process can't be formalized at all. Adoption of SemVer would be a first formal step. Establishing some criteria on what goes into the Basic Spec and what belongs to Extended Specs could be another. And so on.

What matters most in a Spec (any Spec) is that its users can rely on it. So much so that even if a bad decision is made it's preferable to stick to it and live with it, rather than rewriting history and breaking linear development. The history of computer standards and languages is full of examples of rushed choices which had a long-term negative impact on users worldwide, especially when it comes to USA-centric choices that make programming life a real pain for non-English users, from the adoption of symbols not available on non-US keyboards (tilde & friends), up to string management choices that leave little room for certain languages and/or alphabets.

I think that at this stage of the Spec, the rule that

Applications reading PDML documents (or customized PDML parsers) are free to implement any appropriate whitespace handling rules, [...]

is perfectly OK, as long as this is associated with a Spec version, so that any implementation can claim to be conformant to a given PDML Spec.

Obviously, Specs evolve in time, but as long as they are versioned, this should not be a problem — unless a Spec grows chaotically, in a manner that makes it hard to keep up with.

Right now, we're not even sure whether domain specific applications might desire whitespace to remain an open issue, and whether taking a stance on this might bar entire domains of application.

Extensibility also makes it conceivable that the Spec might branch in different directions, if these reflect real-life use cases; again, something that might be embodied in the official Spec if this is beneficial to domain specific implementers. In all cases, I think that the PDML Spec document should remain a single official document, to avoid confusion.

Right now, it's better to have different users create multiple editor plugins according to their own needs (all of them being compliant with the standard) rather than have end users violate the Spec in order to bend it to their practical needs.

Foreseeable Extensibility Needs

Looking at how PML has evolved, it's reasonable to deduce that most extensions will rely on similar semantics to fulfil common needs, such as introducing comment delimiters ([-/-]), preprocessor directives ([!), and so on.

The official Spec could embody some official guidelines when it comes to the notation for implementing these extensions. It would make more sense for all extensions to use the same delimiter for comments (like PML's [-/-]) and preprocessor directives (like PML's [!) rather than seeing a proliferation of different arbitrary nodes.

The challenge is how to embody these in the Spec without adding confusion regarding the Spec version number. I'm not sure what the solution might be, but it could be something like:

and so on. I.e. the Spec imposes some restrictions on the freedom of choice when it comes to adding new elements. This would allow the Spec to be inspired by real implementations in the wild, and e.g. adopt any good example as part of the official spec, in order to keep development on track and prevent dilution of efforts by excessive proliferation of conflicting arbitrary choices.

This would also mean that, if in real life we're faced with dozens of third-party implementers of PDML parsers or extensions, coming from the Python world, who happen to agree on using certain conventions for implementing new specific features, then PDML might just as well make these majority choices the official way of doing it, just in order to sustain a well-trodden path found in the real world, as opposed to embracing "school wars", where fans of a specific language would rather stick to their familiar conventions than follow the mainstream of the PDML community.

Sometimes the "tyranny of the majority" simply works, and it's better than leaving the door open to absurd flame wars in the name of "democracy", unless you enjoy seeing people scrapping over "spaces vs tabs" forever. If I were in the business of selling popcorn and sodas to audiences, I could benefit economically from decision-making meetings that turn into hooligan fights; but since I'm not, I'd rather have a spec that takes a clear stance (albeit arbitrary) on such divides, favouring whatever most implementers chose to use, rather than going all philosophical about the eschatological subtleties of arbitrary choices. But then again, "benevolent dictatorship" is more common in the computer world as a development model than majority-driven decision making, and my guess is that this is not random but experience-based. If fighting over tabs vs spaces might have been somewhat acceptable three decades ago, now that some of the original contenders are getting on in years these fist-fights are becoming a somewhat embarrassing show (there's a limit even to the lengths that the worst entertainment-driven audiences are willing to go, and geriatric fist-fighting is beyond it).

In any case, when it comes to any standard you have to live with the fact that every decision is going to make some people happy and others discontent; there is no way of getting around this. If this weren't the case, we wouldn't have all these different standards, where many are just variations (offshoots, or "forks") of each other.

So, ultimately the choice is yours (your Spec, your vision).

Yes. Consider, for example, whitespace in a PML's code block:

Indeed, that's a good example since it touches on how block delimiters might be implemented. Does PDML propose a specific notation for this? Or is it entirely up to the implementer?

It seems to me that the challenges here are very similar to those which PP had to face, i.e.:

E.g. in a PDML parser implementation I could arbitrarily choose to implement various string delimiters to resolve the whitespace problem:

etc., where each of these is an arbitrary choice, inspired by the way some language implements different string types. But I could also adopt different conventions, and e.g. emulate the way Ruby handles strings, which is quite elegant and covers many different use cases.

The problem is that these are all arbitrary choices, none is better than the rest, although depending on the domain of use some might make more sense than others. Should any of these be covered by the Spec? There are pros and cons in this.

If the Spec indicates an official way to implement string delimiters, the pro is that most implementations will be compatible with each other (which is useful when dealing with libraries), but the con is that this would be a limitation for some users.

There should be a line separating what needs to be officially covered and what should be left open. I believe that some details of extensions deserve official definitions from the Spec, but these should not become limiting constraints.

One could argue that ignoring whitespace within nodes could be a better default than having parsers deal with it, since string delimiters can be introduced to preserve needed whitespace, a choice that probably makes life easier for those who wish to use PDML as a tool to handle key-values. But I'm sure there are arguments to the contrary.

None of the above means that the standard can't remain open. But some aspects of this openness need to be clarified, to avoid confusion regarding its use, and especially in respect to the direction PDML might be taking in its evolution. An important aspect is to be clear about which points are definitive and which are still open to change in the future.

celtic-coder commented 2 years ago

Hi Tristano (@tajmone),

That is a well thought-out and comprehensive piece of writing! Lots to think about going forward!

Kind Regards, Liam

pdml-lang commented 2 years ago

That is a well thought-out and comprehensive piece of writing! Lots to think about going forward!

YES !!!

I'll re-read it, add items to my to-do list, and keep you updated. Thank you, Tristano (@tajmone).

pdml-lang commented 2 years ago

@tajmone: Great suggestions. I agree with everything.

I strongly suggest you start adding a versioning scheme to the PDML Spec

It's done now.

I would suggest SemVer as the versioning scheme

Yes.

PML uses SemVer, but we've seen breaking syntax changes within the same MAJOR version at some point

You're right! I have to be more careful in the future, when I assign a new version.

the semantics of SemVer should be clearly referring to the specification only (i.e. not the Java parser implementation, or any particular implementation)

Yes.

I'm not sure at this point what role the Java implementation plays in all of this, is it just a reference sample implementation, or is it THE reference implementation?

I would call it "a work in progress that will eventually be the reference implementation for Extended PDML". Let me explain.

I created the current Java implementation (CJI) to cover the needs of PML. However, CJI does not contain any code that is specific to PML. Everything related to PML is achieved by just configuring the CJI through its standard extensions mechanism.

While I have published a specification for Basic/Core PDML, there is currently no specification for Extended PDML, covering comments, attributes, extension nodes, types, etc. I intend to write it after some work I want to finish in PML. I do not intend to later add any extension to basic PDML, not even features that are frequently useful (e.g. comments, !ins-file, etc.). Basic PDML will always remain simple and just cover the "absolute minimum needed to store data".

The CJI will support all features of the official Extended PDML Specification. I also intend to publish a separate reference implementation for Basic/Core PDML. That will be easy to do, because it will just be a much stripped-down version of CJI. Having two separate reference implementations (one for basic PDML, and one for extended PDML) will be useful, because implementing extended PDML (covering all features) is MUCH more challenging than implementing basic PDML.

a dedicated PDML parser for config files would certainly apply the "skip whitespace nodes" rule ... It's hard to reconcile these two points.

I've updated the tree example to avoid the confusion.

Maybe the PML Spec should become more of a terse technical document, where examples are reduced to the minimum required to illustrate the points being presented, and all practical usage examples should be moved into another document, a sort of User Guide, to avoid confusion.

This is exactly what I tried to achieve. The spec. contains a minimum of very simple examples, while more practical usage examples are shown in Examples. I'm open for any suggestions to improve.

there should be one and only one official reference, i.e. the PDML Specification — the EBNF grammar and the railroad diagrams should be just auxiliary assets to help users better contextualize the Spec, if these look more familiar to them.

Yes! I changed the text and added remarks in the specification, as well as in the EBNF and railroad diagrams, to make this clear.

the current PDML EBNF grammar is not a valid EBNF document either

Could you please explain what you mean by "not a valid EBNF document"?

the history of computer standards and languages if full of examples of rushed choices which had long-term negative impact on world users

Oh yes!

The official Spec could embody some official guidelines when it comes to the notation for implementing these extensions. It would make more sense that all extensions use a same delimiter for comments (like PML's [-/-]) and preprocessor directives (like PML's [!) rather than seeing a proliferation of different arbitrary nodes.

I agree. These choices will be part of the official Extended PDML Specification.

Sometimes the "tyranny of the majority" simply works, and it's better than leaving the door open to absurd flame wars in the name of "democracy" — unless you enjoy seeing people scrapping over "spaces vs tabs" for ever.

True. The challenge is to find the right balance between rules and freedom of choice. At the beginning it's certainly better to err on the side of being too restrictive, and carefully loosen the rules later, based on the wishes of the majority. Otherwise we risk frustrating users with backwards-incompatibility nightmares and/or ourselves with maintenance nightmares.

If I was in the business of selling pop-corn and sodas to audiences, I could benefit economically from decisional meetings that turn into hooligans fights

LOL. Well said.

Yes. Consider, for example, whitespace in a PML's code block ... Indeed, that's a good example since it touches on how block delimiters might be implemented. Does PDML propose a specific notation for this? or is it entirely upon the implementer?

This will be specified in the Extended PDML Specification for a standard PDML type named text-block. Implementers are free to create other domain-specific types with different names and customized rules. Therefore there should be a naming convention (or rule) to avoid clashes of type names. A simple distinction could be a specified suffix (such as an underscore) for domain-specific type names and extension nodes (e.g. foo is a standard PDML type, foo_ is a domain-specific type). A better approach to avoid name clashes would be to support namespaces (as in XML), but this might better be the subject of a dedicated discussion.

tajmone commented 2 years ago

PDML Spec License

License CC BY-SA 4.0

The lack of the ND clause means people can create derivative license texts while keeping the Basic PDML Specification title. I'm not sure whether this is what you want, or if you'd prefer that there's one and only one official Basic PDML Specification, and that derivative Specs adopt a different document naming.

You could achieve this by enforcing the ND clause in the Creative Commons license, but this would imply that no derivative works whatsoever would be allowed, i.e. the Spec could be ported to other formats, but its contents couldn't be edited, which includes: no translations, no typo fixing in a forked repository ... nothing that involves text editing.

So that might be too extreme a measure (especially not being able to translate it). Alternative solutions to this problem (which is also found in license texts, where you often see a clause stating that any changes to the license text itself require renaming the license to something else) might require an additional hand-written clause, but I'm not sure whether this can be done with Creative Commons licenses (I need to read the legal fine print).

For example, the SIL Open Font License (OFL) includes the following clause:

3) No Modified Version of the Font Software may use the Reserved Font Name(s)
   unless explicit written permission is granted by the corresponding
   Copyright Holder. This restriction only applies to the primary font name
   as presented to the users.

This is one of the most commonly used font licenses, which still allows users to modify the fonts, but by doing so they are forced to adopt a new font name (which can't be just a slight change to the original name, e.g. replacing an 'o' with a '0', but an entirely different name that doesn't contain the original font name).

This clause was added to prevent a proliferation of modified fonts that would create confusion as to what users are actually installing, especially in cases where the derivative font is of lesser quality, but also because the original font author would like end users to get identical results when using his/her original font (on screen, on printed material, etc.).

PDML Spec vs Java Implementation

I do not intend to later add any extension to basic PML, even not features that are frequently useful (e.g. comments, !ins-file, etc.). Basic PDML will always remain simple and just cover the "absolute minimum needed to store data".

Right. Makes sense.

EBNF

the current PDML EBNF grammar is not a valid EBNF document either

Could you please explain what you mean by "not a valid EBNF document"?

I mean that the PDML EBNF found on the website wouldn't pass EBNF grammar validation, since it doesn't follow the conventional EBNF notation as defined in ISO/IEC 14977.

It's not very easy to provide all the examples, since the online version is presented via a table, which breaks the syntax rules into cells, instead of a single text document, but here are some examples:

non_empty_node  =   "[" name separator child_node + "]"

strictly speaking should be (extra spaces added for clarity):

non_empty_node  =   "[" , name , separator , child_node , { child_node } , "]" ;

Also, I'm not sure this rule conforms:

text_char = any Unicode character,
            except "[", "]", and "\"
          | "\["
          | "\]"
          | "\\"

where (to the best of my limited knowledge) "except" is usually expressed as -; also I couldn't find "any Unicode character" in the ISO/IEC 14977 spec (nor any other Unicode reference really, since it's from 1996).
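Since an editor syntax ends up expressing this rule as a RegEx anyway, here is a sketch of how I read the intent of text_char, written with Python's `re` module. The names `TEXT_CHAR`, `is_valid_text` and `unescape` are hypothetical, and the pattern reflects my interpretation of the rule (any character except the two brackets and the backslash, or one of the three escape sequences), not an official definition.

```python
import re

# One text character: anything except '[', ']' and '\',
# or one of the three escape sequences \[  \]  \\
TEXT_CHAR = r"(?:[^\[\]\\]|\\[\[\]\\])"
TEXT = re.compile(TEXT_CHAR + r"+")


def is_valid_text(text: str) -> bool:
    """True if the whole string is a non-empty sequence of text characters."""
    return TEXT.fullmatch(text) is not None


def unescape(text: str) -> str:
    """Replace each escape sequence with the character it stands for."""
    return re.sub(r"\\([\[\]\\])", r"\1", text)


print(is_valid_text(r"see \[1\] for details"))
print(is_valid_text("unbalanced [ bracket"))
print(unescape(r"see \[1\] for details"))
```

Note that the negated character class also matches newlines and other whitespace, in line with the idea that whitespace is ordinary text in basic PDML.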

I've tried to reconstruct the PDML EBNF into a single unified text file, without the side comments and examples (or with comments included within (* ... *)), but some parts of the grammar simply don't validate.

I know that BNF and EBNF grammars tend to diverge in practical use cases (much more than ABNF, which is stricter), and many EBNF-driven tools allow extra freedom to circumvent the various limitations of the strict classic EBNF rules, but the PDML EBNF seems to freely mix natural language and EBNF notation.

The reason I brought up the point was to emphasize the need to keep the EBNF grammar and RR-Diagrams separate from the official PDML Spec: if the grammar were to be considered officially sanctioned, then it would have to be presented as a plaintext EBNF document that passes any EBNF validation tool, so that it might actually be used with software tools, e.g. a lexer/parser generator that accepts EBNF grammars.

pdml-lang commented 2 years ago

people can create derivative license texts while keeping the Basic PDML Specification title. I'm not sure whether this is what you want, or if you'd prefer that there's one and only one official Basic PDML Specification

There should be only one official "Basic PDML Specification", to avoid confusion, proliferation, and chaos. The ND clause is now used. Thanks a lot for pointing this out.

not being able to translate it

I added chapter License with the following clause: "Permission is granted to create verbatim translations of this specification into other human languages."

BTW: I also added chapter Versioning

I mean that the PDML EBNF found on the website wouldn't pass EBNF grammar validation

I see. I replaced EBNF with EBNF-like on the EBNF page.

Later the EBNF should be adapted to conform to the EBNF notation as defined in ISO/IEC 14977. And it should be available in a plain text file, so that ...

it might actually be used with software tools, e.g. a lexer/parser generator that accept EBNF grammars.