invisibleXML / ixml

Invisible XML



Simple Namespaces Proposal #66

Closed yamahito closed 2 years ago

yamahito commented 2 years ago

Principles/assumptions

Prefixed non-terminals will represent nodes in a declared namespace.
Un-prefixed non-terminals will represent nodes in no namespace.
There will be no default namespace declarations; all namespace declarations shall be associated with a prefix in the grammar.
Prefixes can be associated with at most one namespace.
Namespace associations in the grammar will not change.
You never need to add anything to your grammar in order not to use namespaces, and you must add something to your grammar if you do want to use namespaces.
Those who have no need of namespaces in their XML never need to consider the rules for using them in ixml.

Grammar Implementation

It makes sense to me for all namespaces to be declared at the beginning of a grammar, so I suggest an optional prolog:

         ixml: s, prolog?, rule+RS, s.

The prolog may contain a number of namespace declarations (if we find other uses for a prolog later, those may be added here); prolog entries can be disambiguated from rules by the presence of a keyword ("namespace") and mandatory whitespace:

       prolog: nsdecl*RS.
       nsdecl: "namespace", RS, @prefix, s, -["=:"], RS, @nshref, -".".
       nshref: -string.

Each namespace declaration pairs a prefix with a namespace value, nshref, which we treat as a string. In order to allow colons in non-terminal names, we make the whitespace after a rule declaration = or : character mandatory:

         rule: (mark, s)?, name, s, -["=:"], RS, -alts, -".".

and, of course, we need to add optional prefixes to non-terminal names:

        @name: (prefix, ":")?, namestart, namefollower*.
      -prefix: namestart, namefollower*.
   -namestart: ["_"; L].
-namefollower: namestart; ["-.·‿⁀"; Nd; Mn].

The above is a suggestion as a starting point: I am sure there are likely to be some errors or areas for further consideration by the group!

ndw commented 2 years ago

I have expressed ambivalence about namespaces in ixml, but Tom has made persuasive (to me) arguments about the reality that XML users will be surprised that an important XML technology ships with no namespace support. It would feel odd to bolt it on later, I don't think any serious proposal would do anything substantially different than what Tom proposes, and this proposal makes no demands on users who aren't interested in, don't need, or don't want namespaces.

I support adding this to 1.0. I don't think it's a burden on users, or implementors, or the specification.

Technically, I would like to propose one small, but significant amendment. I think the nsdecl nonterminal should use "namespace" not "xmlns" to identify the namespace. As proposed, the XML serialization of an ixml grammar would contain an <xmlns .../> element and, technically, element names beginning "xml" are reserved. Also, "namespace" is just a lot more user friendly and it's consistent with what other text-based formats, like the RELAX NG Compact Syntax use.

ndw commented 2 years ago

Also, in practice, like my proposal for a version declaration, this proposal also means that the dchar and schar rules have to have explicit - marks.

yamahito commented 2 years ago

> Technically, I would like to propose one small, but significant amendment. I think the nsdecl nonterminal should use "namespace" not "xmlns" to identify the namespace. As proposed, the XML serialization of an ixml grammar would contain an <xmlns .../> element and, technically, element names beginning "xml" are reserved. Also, "namespace" is just a lot more user friendly and it's consistent with what other text-based formats, like the RELAX NG Compact Syntax use.

Thanks, Norm; I've made that change as suggested.

> Also, in practice, like my proposal for a version declaration, this proposal also means that the dchar and schar rules have to have explicit - marks.

Noted and agreed.

ndw commented 2 years ago

I implemented a grammar that included the namespaces proposal and I made a few different choices. I don't think they're substantive, but in the interest of keeping everything in one place, here they are:

       nsdecl: -"namespace", S, prefix, s, -["=:"], s, uri, -"." .
      @prefix: -ncname.
         @uri: -string.

I've renamed nshref to uri. I'm tempted to suggest that we rename nsdecl to namespace, but for the moment, I've left that unchanged.

I took a slightly different approach to the name production:

        @name: (ncname, ':')?, ncname.
      -ncname: namestart, namefollower*.

It makes no actual difference, but I think it's a little easier to read.

The proposal says:

> In order to allow colons in non-terminal names, we make the whitespace after a rule declaration = or : character mandatory:

I don't believe that's necessary, so I don't think we should do it. Unlike the situation we found ourselves in with ".", there's nothing ambiguous in the grammar about a:b:c:d.. That's a rule defining the nonterminal a:b that has c:d on its right hand side. There's no other possible interpretation.

cmsmcq commented 2 years ago

But the rule a:b:c. does appear to be ambiguous: is it a:b = c. or a = b:c. ?
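A minimal Python sketch makes the ambiguity concrete. It assumes a simplified, ASCII-only notion of a name (the real namestart and namefollower classes are Unicode-based and also allow "."), and it only considers rules of the form lhs:rhs. with a single name on the right-hand side. The function name readings is made up for illustration:

```python
import re

# Simplified NCName: an ASCII-only stand-in for namestart, namefollower*.
NCNAME = r"[A-Za-z_][A-Za-z0-9_\-]*"
# A name is an NCName with at most one NCName prefix.
NAME = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")


def readings(rule: str) -> list[tuple[str, str]]:
    """Return every (lhs, rhs) reading of 'lhs:rhs.' when no whitespace is required."""
    body = rule.rstrip(".")
    found = []
    for i, ch in enumerate(body):
        if ch != ":":
            continue
        # Try this colon as the rule separator.
        lhs, rhs = body[:i], body[i + 1:]
        if NAME.fullmatch(lhs) and NAME.fullmatch(rhs):
            found.append((lhs, rhs))
    return found


print(readings("a:b:c:d."))  # one reading
print(readings("a:b:c."))    # two readings
```

Under these assumptions a:b:c:d. has a single reading, while a:b:c. has two, which is the case that mandatory whitespace after the separator rules out.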

ndw commented 2 years ago

Oh. Indeed. Well boo.

yamahito commented 2 years ago

Editing the description to use RS for required space rather than S in line with issue #62

spemberton commented 2 years ago

Three things:

I miss the rationale, or requirements, so that the design can be compared against what is stated as required. One of the stated requirements for ixml was that it wasn't necessary to produce any particular XML markup, since there are already tools for that. It was an initial step to get input into XML, which then could be further transformed.

Is this a change to that requirement? What is the user-need or use-case that this change addresses?

There is absolutely no need for the prologue part, since the following:

namespace ex: "http://example.org/NS".
data: ex:this, ex:that, ex:theOther.

can just as easily be done using existing insertions, without adding new syntax:

data: @xmlns:ex, ex:this, ex:that, ex:theOther.
xmlns:ex: ^"http://example.org/NS".

With added benefits such as namespaces not being just global.

That reduces this issue to the request "allow ':' as a namefollower".

In which case I would propose the syntax

name: namestart, nametail**("."; ":").
nametail: namefollower+.

(and removing "." from namefollowers) which then ensures that a name can contain but not end with a colon or dot.

ndw commented 2 years ago

I feel like Tom has attempted to motivate the need for namespaces on several occasions. I'm not going to try to recapitulate them here.

Your "wild west" proposal to leave the entire question up to the user with insertions for namespace declarations and simply adding ":" to namefollower is...uhm...interesting. On the one hand, it would allow ixml to simply wash its hands of namespaces and say "it's the author's problem to get them right" and "it's the implementor's problem to produce namespace well-formed XML."

But I think we'd regret that in the long run. I can't even begin to predict the kinds of confusion that would cause for either authors or implementors. It would also mean that if we wanted to use namespaces in ixml in some principled way in the future, we'd have to add a declaration mechanism on top of the wild west that we'd have opened up. That just screams "backwards incompatibility" and "usability nightmare".

The proposal to allow users to declare them once, globally, lets both the user and the implementation manage the namespaces in a coherent manner. It may not satisfy every possible use case, but it certainly satisfies more than 80% of them, I expect.

spemberton commented 2 years ago

> I feel like Tom has attempted to motivate the need for namespaces on several occasions. I'm not going to try to recapitulate them here.

Then I may have missed them, because all I remember is "Because I need them", or "For obvious reasons".

> Your "wild west" proposal to leave the entire question up to the user with insertions for namespace declarations and simply adding ":" to namefollower is...uhm...interesting. On the one hand, it would allow ixml to simply wash its hands of namespaces and say "it's the author's problem to get them right" and "it's the implementor's problem to produce namespace well-formed XML."

I'm all for it.

> But I think we'd regret that in the long run. I can't even begin to predict the kinds of confusion that would cause for either authors or implementors. It would also mean that if we wanted to use namespaces in ixml in some principled way in the future, we'd have to add a declaration mechanism on top of the wild west that we'd have opened up. That just screams "backwards incompatibility" and "usability nightmare".

All the more reason to leave namespaces out completely then.

Steven


ndw commented 2 years ago

The wild west solution is utterly untenable. Namespaces are not attributes and pretending that they are is just going to make us look silly. Or worse.

We may think namespaces were a bad idea. We may think that namespaces were poorly conceived in XML. We may wish we could make them go away or do them over in some better way, but we can't and any attempt to do so would be foolish.

I don't know of a single, modern XML vocabulary that doesn't use namespaces. I haven't written anything larger than a toy example that didn't have at least a default namespace in twenty or more years. I think Tom would be the first to say that developers who came to XML after we all got through inventing it are just used to namespaces, expect namespaces, and won't understand why we didn't implement them.

I think there are two ways forward:

1. Invisible XML doesn't do namespaces. This is the status quo. In this case, I think that we must explicitly forbid the construction of an attribute named "xmlns". That's just forbidden in modern usage. The "@" mark serializes a nonterminal as an attribute; "xmlns" is not an attribute in any modern data model.
2. Invisible XML adopts a proposal to do namespaces in some way that's consistent with how namespaces actually work in modern XML. The simple namespaces proposal seems to fit the bill, but I'm certainly open to other suggestions.

If we stick with the status quo, and if we're wrong about namespaces, if users really do need and want them, and if they're unable to use ixml without them, I observe that choice 1 will have consequences for interoperability. Every implementor will have to invent a non-standard, non-interoperable mechanism for providing namespace support.

yamahito commented 2 years ago

> One of the stated requirements for ixml was that it wasn't necessary to produce any particular XML markup, since there are already tools for that. It was an initial step to get input into XML, which then could be further transformed.

Although it isn't necessary to produce a particular XML markup, I hold that it also isn't necessary to forbid such a production, in the absence of a technical barrier.

iXML doesn't (and shouldn't) allow for (e.g.) arbitrary re-ordering of elements. It is therefore right that iXML does not guarantee a given output. It is right that iXML isn't repurposed as some tree-to-tree transformation language.

However, I don't think that is an argument for deliberately forbidding a particular XML markup from being produced, if there are no technical limitations for doing so. Excluding namespaces would be one way of making the lives of our users more difficult, and I struggle to see the reason for it.

If you insist that iXML must have incomplete namespace support because of the philosophical ideal that the only thing it must do would be to create XML for post processing, then I must reiterate the argument that in that case, @ and - marks must also be removed for the same reason.

> Then I may have missed them, because all I remember is "Because I need them", or "For obvious reasons".

I deliberately, explicitly wanted to keep this a technical 'how to do this', separate from a 'why we should do this', because I feel we've been around the issue so many times. Here are some reasons; there may be others:

1. User experience: I think we should be asking what requirements justify restricting what users can reasonably expect to do (which IMO, and in the opinion of many of them, includes namespaced elements), rather than what requirements justify meeting their expectations.
2. Namespace allocation belongs in the grammar: it describes the semantic identification of the representation. Consider examples such as xml:id, xml:lang, xlink, rdfa, etc.
3. It allows recognition of mixed mark-up in the produced XML. That is useful for post processing and shared code.
4. It allows mixing namespaced production rules in grammars. I think that's a potentially interesting way to do language detection, and would be worth exploring.
5. If we don't introduce a namespace mechanism, it will be done unofficially, and we will lose control of interoperability and standardisation.

I invite other members of the group to add any reasons I may have missed.

I believe that the decision to exclude namespaces ought to be considered on the same basis as the decision to include them: please could you give similar reasons and rationalisations as to why we should exclude namespaces from iXML?

Thanks, Tom


LinguaCelta commented 2 years ago

I’ve flip-flopped about on this issue, but I’ve finally come to agree with Tom on the basic point that we should include namespace support.

I have Thoughts and Opinions about the theoretical status of namespaces in ixml, but I don’t think that my theoretical ideals outweigh practicalities. Namespaces are an integral part of many people’s use of XML. Like them or not, agree with how they were implemented or not, they’re there and they’re essential for some use cases.

I think that Tom’s reasons 1) and 5) are the most compelling: users who are used to XML will likely be confused, annoyed, and (ultimately) put off if there’s no namespace support. Enterprising users and implementors will probably decide to kludge something together. There’ll be anarchy. Human sacrifice. Dogs and cats living together. MASS HYSTERIA.

Okay, maybe not quite. But there will be users who give up on ixml because it apparently can’t support one of the central features of XML.

My view is this:

It would not be technically difficult to support namespaces.
Supporting namespaces would introduce no new complexity to users who don’t need or want to use namespaces.
Supporting namespaces would add value to users for whom namespaces are a preferred or required part of their workflow.
Supporting namespaces would allow us to control a standardized mechanism for them, as well as establishing useful infrastructure for any future additions of this sort (thus, I idealistically imagine, discouraging kludging).
Supporting namespaces would allow us to demonstrate commitment to user experience and usability. I believe this is one of the most important things we can demonstrate when we first introduce ixml to a broad user base.

I therefore think we should add namespace support to ixml.

BTW

Dr. Bethan Tovey-Walsh Myfyrwraig PhD | PhD Student CorCenCC http://www.corcencc.org/ Prifysgol Abertawe | Swansea University LinkedIn https://www.linkedin.com/in/linguacelta Croeso i chi ysgrifennu ataf yn y Gymraeg.


ndw commented 2 years ago

I hadn't previously considered xml:id or xml:lang or rdf:about or xlink:href, etc. Thank you, Tom. I think that's another mark in the "very desirable in 1.0" column.

There are a couple of nice features of this proposal, IMHO.

You get exactly one set of global declarations at the top of the grammar. There's no attempt being made to manage namespace declarations, redeclarations, or undeclarations. It's the simplest possible thing that can work.
Using namespaces is completely optional. If you only need to produce XML without namespaces, you simply don't use the namespace declarations or colons in nonterminal names. All it adds is enough syntax to make it possible to generate names in namespaces, and semantics that, when you do, are consistent with modern XML practice.

It does seem slightly hard to explain why I can write a grammar:

variablelist = varlistentry+ .
varlistentry = id, term, listitem .
@id = ~['/']+, -'/' .
term = ~[#d | #a]+, nl .
listitem = para .
para = -#9, ~[#d | #a]+, nl .
-nl = -#d?, -#a .

that will turn

paris/Paris
    The capitol of France.
london/London
    The capitol of the United Kingdom.
prague/Prague
    The capitol of Czechia.

into

<variablelist>
   <varlistentry id="paris">
      <term>Paris</term>
      <listitem>
         <para>The capitol of France.</para>
      </listitem>
   </varlistentry>
   <varlistentry id="london">
      <term>London</term>
      <listitem>
         <para>The capitol of the United Kingdom.</para>
      </listitem>
   </varlistentry>
   <varlistentry id="prague">
      <term>Prague</term>
      <listitem>
         <para>The capitol of Czechia.</para>
      </listitem>
   </varlistentry>
</variablelist>

but not

<variablelist xmlns='http://docbook.org/ns/docbook'>
   <varlistentry xml:id="paris">
      <term>Paris</term>
      <listitem>
         <para>The capitol of France.</para>
      </listitem>
   </varlistentry>
   <varlistentry xml:id="london">
      <term>London</term>
      <listitem>
         <para>The capitol of the United Kingdom.</para>
      </listitem>
   </varlistentry>
   <varlistentry xml:id="prague">
      <term>Prague</term>
      <listitem>
         <para>The capitol of Czechia.</para>
      </listitem>
   </varlistentry>
</variablelist>

when the changes necessary to the grammar are both easy to use and easy to understand:

default namespace = "http://docbook.org/ns/docbook" .

variablelist = varlistentry+ .
varlistentry = xml:id, term, listitem .
@xml:id = ~['/']+, -'/' .
term = ~[#d | #a]+, nl .
listitem = para .
para = -#9, ~[#d | #a]+, nl .
-nl = -#d?, -#a .

It's hard for me to imagine that anyone familiar with XML is going to find the namespaced version profoundly more difficult to understand.

As Tom says, there's no question that we want to avoid turning ixml into some kind of transformation language, but using namespaces doesn't feel transformational to me, it just feels like using the real names for things. And telling users they have to run a transformation to add the namespace and rename id to xml:id is certainly going to make some users look at us like we're mad.
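As a rough illustration of what "the real names" means to downstream tools, the following Python sketch parses a cut-down version of the namespaced output with the standard library's xml.etree.ElementTree. The sample document is abbreviated from the example above, and nothing here is part of ixml itself:

```python
import xml.etree.ElementTree as ET

# Abbreviated from the namespaced DocBook output shown earlier.
DOC = """<variablelist xmlns='http://docbook.org/ns/docbook'>
   <varlistentry xml:id="paris">
      <term>Paris</term>
   </varlistentry>
</variablelist>"""

root = ET.fromstring(DOC)
entry = root[0]

# Element names are reported as {namespace-uri}local-name.
print(root.tag)
print(entry.tag)

# The xmlns declaration is not reported as an attribute of the element.
print(root.attrib)

# xml:id is an attribute in the predefined XML namespace.
print(entry.attrib)
```

A namespace-aware consumer sees expanded names on every element and sees xml:id in the XML namespace, while the xmlns declaration itself does not appear among the attributes.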

cmsmcq commented 2 years ago

Steven Pemberton writes:

> Three things:

> I miss the rationale, or requirements, so that the design can be compared against what is stated as required.

> One of the stated requirements for ixml was that it wasn't necessary to produce any particular XML markup, since there are already tools for that. It was an initial step to get input into XML, which then could be further transformed.

> Is this a change to that requirement? What is the user-need or use-case that this change addresses?

If the argument is that we should not support namespaces because users can get data into XML without namespaces, and for no other reason, then I look forward to eliminating all the complexity of marking nonterminals and terminals, since it is equally true that users can get data into XML without using attributes and without hiding any nonterminals.

The last time I looked, the rules for markings were motivated by the observation that they make it convenient to get XML that one finds more intuitive, more attractive, or more useful. I am not in a position to offer quotations, so my memory may have betrayed me. I have always interpreted ixml's facilities for markings as a simple application of a quite general rule: if you can make a spec better (more convenient, more powerful, easier to use in practice, ...) at low cost, then you weigh the pluses and the minuses, and "that's not a requirement" is at most a reason to take seriously the possibility of not doing whatever "it" is.

I think that implicitly there is a requirement that ixml take XML seriously and fit in with how XML is used. Otherwise, it's a toy.

That requirement entails that ixml support namespaces or have a really good story about why it doesn't. "You don't need it" is not a good enough story, because the answer from any user is likely to be "sez you!".

> There is absolutely no need for the prologue part, since the following:

> namespace ex: "http://example.org/NS".
> data: ex:this, ex:that, ex:theOther.

> can just as easily be done using existing insertions, without adding new syntax:

> data: @xmlns:ex, ex:this, ex:that, ex:theOther.
> xmlns:ex: ^"http://example.org/NS".

Need?

Please give the other members of the group credit for knowing something about how to write grammars, will you?

My reason for supporting the introduction of a prolog is not that I don't know how to do things without it. It's because I see it as an obvious improvement; if you don't, I am surprised at your blind spot.

> With added benefits such as namespaces not being just global.

In what sense is that a benefit?

> That reduces this issue to the request "allow ':' as a namefollower".

> In which case I would propose the syntax

> name: namestart, nametail**("."; ":").
> nametail: namefollower+.

> (and removing "." from namefollowers) which then ensures that a name can contain but not end with a colon or dot.

You must be joking.

-- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com

PieterLamers commented 2 years ago

Interesting discussions! I was wondering, what happens to the interchangeability of JSON and XML once you admit namespaces? Is output to JSON still possible then? I remember someone saying that the name ixml was poorly chosen in that it is not so much about xml as it is about converting information into machine-readable data. I am not sure I want to burn my hands choosing sides in the discussion at hand: I respect you all too much. I am sensitive to the argument of keeping things simple (and staying out of the ns waters), but then again: at Declarative Amsterdam, Erik Siegel asked why a new RegExp syntax was introduced, rather than using the one available in XPDL, and Steven answered that the ixml one is more powerful. If potential adoption is relevant, it may be desirable to leave things out of a grammar, or, as Tom argues, to put them in. Good luck!

ndw commented 2 years ago

My analysis runs along these lines:

An XML technology that doesn’t support namespaces looks funny from the outset. Invariably, users will want to know how they make namespaces work, even if they don’t need them; they’re just part of the XML ecosystem. An ixml processor isn’t required to be able to produce JSON, but I think in practice, it will almost always be just fine.

IMHO, a hypothesis: most grammars, let’s say 80% just so we have a number, won’t use namespaces at all. 99% of the grammars that do use namespaces will use a single default namespace. So 99.8% of all grammars will convert to JSON in exactly the same way whether or not ixml supports namespaces. It is true that the remaining 0.2% might not have an obvious, direct mapping, but to be fair, they’re already using namespaces in some explicit way so why would one expect a JSON conversion to work automatically?

For the 19.8% of grammars that just need a single default namespace declaration, it seems disingenuous to say that they have to put the step in an XProc pipeline or write a (slightly tricky) XSLT transformation just to add a namespace when namespaces are a feature of all modern XML vocabularies.
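The arithmetic behind those figures can be checked with a few lines of Python. The 80% and 99% inputs are the hypothetical numbers from the estimate above, not measurements, and the variable names are chosen only for this sketch:

```python
from fractions import Fraction

# Assumed inputs, taken from the estimate above.
no_namespaces = Fraction(80, 100)       # grammars that use no namespaces at all
default_share = Fraction(99, 100)       # of the rest, those with one default namespace

uses_namespaces = 1 - no_namespaces                 # the other 20%
default_only = uses_namespaces * default_share      # single default namespace only
same_json = no_namespaces + default_only            # JSON conversion unaffected
no_obvious_mapping = 1 - same_json                  # explicit, prefixed namespaces


def pct(share: Fraction) -> str:
    """Format a share of all grammars as a percentage with one decimal place."""
    return f"{float(share * 100):.1f}%"


print(pct(default_only))        # 19.8%
print(pct(same_json))           # 99.8%
print(pct(no_obvious_mapping))  # 0.2%
```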

At this point, I think, “okay, we don’t need namespaces, but if we don’t support this particular, common XML feature, ~20% of our users are going to be at least a little bit inconvenienced”. Most of them could just get by with some kind of processor option that puts the output in a default namespace. Except there’s no interoperability in that solution and there’s no interoperable mechanism to allow the grammar to identify that namespace.

So how hard would it be to provide some simple namespace support: a single set of global declarations? We have no use case that suggests we need any of the complexities of undeclaration or redeclaration; effectively, we just need a way to define a set of global prefixes on the root element.

It isn’t hard at all. It’s a very small change to the grammar.
If you don’t need to use namespaces, there are exactly zero changes to the ixml grammar, the XML serialization of that grammar, and the XML serialization of the result.

So we should add them to ixml.

(I’m completely confused by the remark about XPDL and RegExp syntax. There aren’t any regular expressions in Invisible XML. There are a couple of constructs that allow repetition and there’s a character class that’s a little bit like a subset of a regular expression syntax, but neither of those look like things you’d want to replace with a regular expression syntax of any kind.)

ndw commented 2 years ago

The CG failed to achieve consensus; namespace support will not be available in ixml 1.0.