qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Concise syntax for map construction #1070

Closed michaelhkay closed 6 months ago

michaelhkay commented 6 months ago

It has been suggested that we should allow a "bare braces" syntax for map construction. This would reduce visual clutter especially when defining options arguments, as in

serialize($result, map{"method": "adaptive", "indent": true()})

I believe there are no syntactic obstacles to dropping the "map" keyword. The main reason it is there is that there was competition for the construct from people working on so-called scripting extensions, who wanted "bare braces" to represent blocks of statements.

Allowing {"method": "adaptive"} would align with JSON.

But I think we should go a step further and drop the quotes:

{method: "adaptive"}

except that we could allow a string literal if the key isn't an NCName, as with record type syntax.

Could we do this and still allow computed or non-string keys? I don't think we need to, the existing syntax remains available.

So I propose we allow:

serialize($result, {method: "adaptive", indent: true()})

While we're about it, is there any enthusiasm for allowing

serialize($result, {method: "adaptive", indent: ✅}) (: U+2705 :)

ChristianGruen commented 6 months ago

I believe there are no syntactic obstacles to dropping the "map" keyword.

…would be great.

{method: "adaptive"}

My preference would be to consider map {} and {} as completely interchangeable. It’s a nice property of the existing constructors to be fully composable – all of [ x ], array { x } and map { x: y } are currently legal – and I think we shouldn’t drop it for the sake of brevity. Next, it could lead to confusion if $xml ! { x: y } created a key x with the y child elements of $xml.

If we aim for a more compact map syntax, I would rather suggest reviving @line-o’s proposal in #147 and parsing variable references as key-value combinations:

let $method := 'adaptive'
(: eq: serialize($node, map { 'method': $method }) :)
return serialize($node, { $method }) 

While we're about it, is there any enthusiasm for allowing serialize($result, {method: "adaptive", indent: ✅}) (: U+2705 :)

Why not this?… ;)

serialize($result, { indent: ✓ }) (: U+2713 :)

I believe I would be enthusiastic about literals for true() and false(), but ideally limited to the ASCII range, as long as developers are left who don’t exclusively code with Copy’n’Paste, StackOverflow and ChatGPT. For non-nerds, it can already be challenging to type in backslashes and curly braces on German keyboards (but that’s the old question of who our target audience will be).

benibela commented 6 months ago

My preference would be to consider map {} and {} as completely interchangeable. It’s a nice property of the existing constructors to be fully composable – all of [ x ], array { x } and map { x: y } are currently legal – and I think we shouldn’t drop it for the sake of brevity.

makes sense

I believe I would be enthusiastic about literals for true() and false(),

JSONiq had true meaning XPath's true(), and ./true meaning XPath's true

ChristianGruen commented 6 months ago

JSONiq had true meaning XPath's true(), and ./true meaning XPath's true

Good to know, thanks. An unused 4.0 syntax could be .true and .false (or .TRUE and .FALSE).

MarkNicholls commented 6 months ago

Allowing {"method": "adaptive"} would align with JSON.

But I think we should go a step further and drop the quotes:

{method: "adaptive"}

One of the USPs of XSLT (and XML), and one of the nice things about Python (and JSON), is that you can paste an example output and work backwards with minimal syntactic massaging.

So whilst I instinctively prefer dropping the quotes from the key, I think it is more painful in practice; being able to paste JSON is a very attractive feature.

line-o commented 6 months ago

@MarkNicholls you would certainly be able to still use quotes for map keys.

I am actually concerned how we can decide if a key is a QName or "just" a string key.

Given the input {a :1}, is a the string "a" or the path ./a?

line-o commented 6 months ago

The same applies to values without quotes: {1: a} could be either "a" or ./a

rhdunn commented 6 months ago

See the note in https://www.w3.org/TR/xquery-31/#id-map-constructors for interpreting QNames as key names. The main thing to observe is that QNames don't have spaces and EBNF applies longest match so {a:b:c} is { a:b : c }.
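
The longest-match point can be illustrated with a small sketch. It is written in Python rather than XQuery so that it runs standalone; the NCNAME pattern and the tokenize function are simplified inventions for illustration (ASCII names only), not the tokenizer of any real processor.

```python
import re

# Simplified NCName: ASCII letters, digits, '_', '-', '.' only (an assumption;
# real NCNames allow a much larger character set).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"

# The prefixed QName alternative comes first, so "a:b" is preferred over "a"
# whenever no whitespace separates the name from the colon.
TOKEN = re.compile(
    rf"\s*(?:(?P<qname>{NCNAME}:{NCNAME})|(?P<ncname>{NCNAME})|(?P<number>\d+)|(?P<punct>[{{}}:,]))"
)

def tokenize(text):
    """Split text into (kind, value) tokens, skipping whitespace between them."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

print(tokenize("{a:b:c}"))
# [('punct', '{'), ('qname', 'a:b'), ('punct', ':'), ('ncname', 'c'), ('punct', '}')]
print(tokenize("{a :1}"))
# [('punct', '{'), ('ncname', 'a'), ('punct', ':'), ('number', '1'), ('punct', '}')]
```

In this model, whitespace before the colon is what turns a into a plain name followed by a separator, which matches the reading of {a:b:c} as { a:b : c }.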

joewiz commented 6 months ago

Would this cause any trouble when using enclosed expressions like <x>{1}</x> or <x>{"a"}</x>? Or would parsers be fine and have no problem distinguishing between these "bare braced map constructors" and enclosed expressions, e.g., <x>{{1: "a"}?1}</x>?

ChristianGruen commented 6 months ago

Would this cause any trouble when using enclosed expressions like <x>{1}</x> or <x>{"a"}</x>? Or would parsers be fine and have no problem distinguishing between these "bare braced map constructors" and enclosed expressions, e.g., <x>{{1: "a"}?1}</x>?

@joewiz As far as I can judge, the grammar will be unambiguous, as there are no cases in which either a map constructor or an enclosed expression is expected. The example you mentioned…

<x>{{1: "a"}?1}</x>

…is a good one, though: It will lead to a syntax error, as the two opening curly braces result in a single brace (via the CommonContent rule), and the subsequent closing single curly brace is rejected. Instead, you’ll need to add whitespace after the first curly brace:

<x>{ {1: "a"}?1 }</x>

A note in the preview of Michael’s PR refers to this special case:

In some contexts it may be necessary to separate two adjacent left brace ({) or right brace (}) characters with whitespace to avoid the doubled brace being interpreted as an escaped single brace. This situation rarely arises in practice.
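
The doubled-brace behaviour can be sketched as follows. This is a toy model in Python, not the XQuery grammar itself: scan_element_content is a hypothetical helper that only reads literal element content up to the first enclosed expression and treats {{ and }} as escapes.

```python
def scan_element_content(content):
    """Scan direct element content up to the first enclosed expression.

    Returns (literal_text, rest): the literal characters read so far and the
    remainder starting at an unescaped '{' ('' if none is found).
    Raises SyntaxError on a lone '}'.
    """
    literal, i = [], 0
    while i < len(content):
        two = content[i:i + 2]
        if two in ("{{", "}}"):        # longest match: a doubled brace is an escape
            literal.append(two[0])
            i += 2
        elif content[i] == "{":        # a single '{' starts an enclosed expression
            return "".join(literal), content[i:]
        elif content[i] == "}":
            raise SyntaxError(f"unescaped '}}' in element content at offset {i}")
        else:
            literal.append(content[i])
            i += 1
    return "".join(literal), ""

# With whitespace after the first brace, an enclosed expression begins at once.
print(scan_element_content('{ {1: "a"}?1 }'))   # ('', '{ {1: "a"}?1 }')

# Without it, '{{' is consumed as an escaped brace and the later '}' is rejected.
try:
    scan_element_content('{{1: "a"}?1}')
except SyntaxError as err:
    print(err)                                   # unescaped '}' in element content at offset 8
```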

rhdunn commented 6 months ago

Stand-alone block expressions were only previously allowed in scripting expressions, e.g. while ($x gt 1) { ... }.

This has since been extended in XPath 4.0 to if/else expressions, e.g. if ($x gt 1) { ... }.

This means that if ($x gt 1) { value: $x } would be an error -- likely something like "QName value: missing a local name" due to the other parts being valid up to that point.

This also means that if ($x gt 1) then { value: $x } else () would be valid, but only due to it being the simplified map.

rhdunn commented 6 months ago

I wonder if it is possible to unify the use of a block expression and simplified map construction. I.e. allowing both anywhere a single expression can be. For if/else, the simplified form could require a BracedExprSingle which would be:

BracedExprSingle ::= BlockExpr | SimplifiedMapExpr

This could work as I don't think (StringLiteral | EQName) ":" is a valid start of an ExprSingle as : in that context is reserved for EQNames.

The map constructor note around EQNames and :s would apply more generally to braced constructs that are possibly maps. That is, {a:b} defines a path expression over a:b (it matches a single expression as an EQName, not a map). Likewise, {a:b:c} would be a single-entry map with key a:b and value the result of the path expression c. Overall, I think this isn't too bad.

ChristianGruen commented 6 months ago

I wonder if it is possible to unify the use of a block expression and simplified map construction. I.e. allowing both anywhere a single expression can be. For if/else, the simplified form could require a BracedExprSingle which would be:

@rhdunn I’m not sure where to find the BlockExpr. Has it already been added somewhere in the spec or in a PR?

Currently, the following syntax is supported:

declare function local:f() {};

We’ll need to ensure that it won’t construct an empty map. But I assume your suggestion rather refers to EQNames?

rhdunn commented 6 months ago

And as MapKeyExpr is an ExprSingle, the grammar for Expr and MapConstructorEntry combine nicely:

Expr ::= ExprSingle ( "," ExprSingle )*
MapConstructorEntry ::= MapKeyExpr ":" MapValueExpr
MapKeyExpr ::= ExprSingle
MapValueExpr ::= ExprSingle

So enclosed expressions and simplified map expressions are compatible at the grammar level:

EnclosedExpr ::= "{" Expr? "}"
SimplifiedMapConstructor ::= "{" (MapConstructorEntry ( "," MapConstructorEntry)* )? "}"

There is a case where these are ambiguous, which is for {} -- enclosed expression results in an empty sequence whereas the map constructor results in an empty map. I'm not sure specifically how to resolve this.

rhdunn commented 6 months ago

@ChristianGruen I was mixing up the scripting extension naming and the XPath/XQuery naming. I should have said EnclosedExpr as above.

rhdunn commented 6 months ago

A possible solution would be to forbid/disallow empty maps for the simplified map expressions. I.e.:

SimplifiedMapConstructor ::= "{" MapConstructorEntry ( "," MapConstructorEntry)* "}"

That way {} is an enclosed expression resulting in an empty sequence, and if you want an empty map you can always use map {}.

I think this is the best compromise as you are most often going to use the simplified syntax with key/value pairs.
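
A rough sketch of that disambiguation rule, in Python for the sake of a runnable example. It assumes a lexer has already split the text between one pair of braces into token strings; classify_braced and its input format are hypothetical, and the "top-level colon" test is a simplification of the real grammar.

```python
def classify_braced(tokens):
    """Classify the tokens found between one pair of braces.

    tokens: list of token strings produced by some lexer (hypothetical format).
    """
    if not tokens:
        return "enclosed-expr"         # '{}' yields an empty sequence under this rule
    depth = 0
    for tok in tokens:
        if tok in ("{", "(", "["):
            depth += 1
        elif tok in ("}", ")", "]"):
            depth -= 1
        elif tok == ":" and depth == 0:
            return "map-constructor"   # a top-level ':' separates key from value
    return "enclosed-expr"

print(classify_braced([]))                                         # enclosed-expr
print(classify_braced(['"x"', ":", "1"]))                          # map-constructor
print(classify_braced(["$a", ",", "$b"]))                          # enclosed-expr
print(classify_braced(["f", "(", "{", '"k"', ":", "1", "}", ")"])) # enclosed-expr
```

The last call shows why nesting depth matters: a colon inside a nested map passed to a function does not make the outer braces a map constructor.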

rhdunn commented 6 months ago

I think it is possible to at least get this part in and then have a separate issue/investigation into whether we can get {} working for creating empty maps.

ChristianGruen commented 6 months ago

There is a case where these are ambiguous, which is for {} -- enclosed expression results in an empty sequence whereas the map constructor results in an empty map. I'm not sure specifically how to resolve this.

As I’m too lazy to investigate ;)… Do you have insight into which enclosed expressions don’t result in an empty sequence?

Maybe it would be better to unify those and return an empty sequence, too. – In general, I like compact syntax a lot, but it could possibly lead to confusion in practice, in particular for classical XML users who don’t work with maps and arrays at all.

rhdunn commented 6 months ago

I already thought of that -- https://www.w3.org/Bugs/Public/show_bug.cgi?id=29989 :D.

I've noticed that SquareArrayConstructor can be simplified to "[" Expr? "]". I'm not sure why it is defined in terms of ExprSingles.

Searching for exact matches of "{" Expr "}", I find the following:

  1. CompElemConstructor
  2. CompAttrConstructor
  3. CompPIConstructor
  4. ValidateExpr

For the computed element constructors, these are the names; as empty names are invalid, this is correct. For validate expressions, it doesn't make sense to validate nothing.

Searching for exact matches of "{" Expr? "}", I find the following:

  1. EnclosedExpr
  2. ExtensionExpr

So ExtensionExpr can be simplified.

michaelhkay commented 6 months ago

The terminology EnclosedExpr is unfortunate because it is not actually an expression.

I think we can split the potential problems into three categories:

(a) formal syntax ambiguities

I think these only occur with doubled braces, for example A{{a:3}} or <a>{{a:3}}</a>, which are both resolved by the "longest token rule" - so the {{ is read as an escaped {. I believe no normative changes are needed to the tokenization rules in A.3, though some additional notes and examples might be helpful. Note that this problem is very unlikely to arise in practice because there is little reason to use a map constructor within string templates or element content; but if for some reason you want to write <a>{ {'a':3}?a }</a> then you need either to put whitespace between the two left curlies, or to use the "map" keyword.

I don't think that the empty case ({}) is any different from other cases where the parser is looking for an enclosed expression.

It's possible that there are other cases where a { at the start of an expression causes an ambiguity. I have only looked at it manually, I haven't applied any tools to the analysis. Perhaps the analysis done by @johnlumley for issue #1050 might reveal them. Remember that it took about five years before anyone spotted that /*/* was ambiguous in XPath 1.0.

(b) usability issues

These arise when a user uses (or starts) a map constructor accidentally, when something else was intended. The parser might go down the wrong avenue thinking it is parsing a map constructor, and ends up producing poor diagnostics when it turns out not to be.

An example might be writing

if ($condition) then {fpml:invoice} else {fpml:purchase-order}

The parser is going to treat the two braced expressions as map constructors, and the diagnostics might therefore be confusing. In some cases (I can't think of any) the user might accidentally construct an expression which is syntactically valid but is completely different from what was intended.

Conversely a user might write

if ($condition) {'a':1, 'b':2 }

Here a map constructor was probably intended, but it will not be parsed as one, because after reading the closing ')' the parser is not looking for an expression. The parser here will stumble when it hits the ':', at which point it might be difficult to give good diagnostics.

(c) conflict with XQuery extensions

It is possible that some implementors might want to provide an extended version of XQuery in which "bare braces" denote something else. @RHDunn has already referred to "block expressions" which were proposed at one time in scripting extensions. Clearly such extensions are incompatible with this change.

rhdunn commented 6 months ago

Yes, my process above is working through seeing if we can make if ($condition) {x: 1} valid. I.e. combining the enclosed expression and simplified map expression syntax.

The result of that is that it is possible provided that {} is an empty value not an empty map.

ChristianGruen commented 6 months ago

Yes, my process above is working through seeing if we can make if ($condition) {x: 1} valid. I.e. combining the enclosed expression and simplified map expression syntax.

If we supported this, people might try to do things like if ($condition) { x: 1 }[?x], which would be illegal, whereas { x: 1 }[?x] is possible… Frankly, my recommendation would be not to merge the syntax, but I’ll be interested in others’ opinions.

michaelhkay commented 6 months ago

Yes, my process above is working through seeing if we can make if ($condition) {x: 1} valid. I.e. combining the enclosed expression and simplified map expression syntax.

I think that would be terribly confusing.

rhdunn commented 6 months ago

Fair enough.