Serialization fallback.

michaelhkay commented 1 year ago

I propose that we drop some serialization errors in favour of producing a fallback representation of the supplied value.

The rationale is that (a) serialization is often used in contexts like xsl:message where the primary purpose is diagnostic, and the last thing you want when producing diagnostics is a secondary error; and (b) seeing a fallback representation of an inappropriate value often shows you much more clearly what you have done wrong than any error message can do.

Compare with the .toString() method in Java and similar languages, which always outputs something even if it's not quite what you wanted.

I'm not proposing to change the principle that the output should always be syntactically valid (e.g. well formed XML or JSON).

I think some of the specific error conditions we might drop are:

- In sequence normalization rule 7: instead of raising an error when an attribute, namespace, or function (including a map or array) is encountered, serialize that item using the adaptive output method, treat the result as a text node, and insert the text node into sequence S6.
- In the JSON output method: when a sequence of two or more items is encountered, instead of raising SERE0023, treat it as an array containing those items.
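
A minimal sketch of the second fallback, written in JavaScript purely for illustration. The function name `serializeJsonWithFallback` is hypothetical, a plain JavaScript array stands in for an XDM sequence, and the empty-sequence case assumes the existing JSON output method rule that an empty sequence serializes as `null`.

```javascript
// Hypothetical helper: `items` is a JS array standing in for an XDM sequence.
function serializeJsonWithFallback(items) {
  // Assumption: an empty sequence serializes as null, as in the current JSON output method.
  if (items.length === 0) return "null";
  // A single item is serialized as itself.
  if (items.length === 1) return JSON.stringify(items[0]);
  // Proposed fallback: two or more items become an array instead of raising SERE0023.
  return JSON.stringify(items);
}

console.log(serializeJsonWithFallback([{ A: 1 }])); // {"A":1}
console.log(serializeJsonWithFallback([1, 2, 3]));  // [1,2,3]
```

One consequence of this sketch: a sequence of two numbers and a single array holding the same two numbers produce identical output, so the fallback is diagnostic rather than round-trippable.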

Closely related, and perhaps best considered together: should the fn:string() function accept anything as input, and never raise an error?

ChristianGruen commented 1 year ago

Maybe we can also drop SERE0020…

> It is an error if a numeric value being serialized using the JSON output method cannot be represented in the JSON grammar (e.g. +INF, -INF, NaN).

…and output the values as strings (I wonder if it should be INF or Infinity?)

line-o commented 1 year ago

About serialising +INF, -INF and NaN: we are not the first to discuss this, and there are several options:

- The standards way: do not throw on special xs:float values, but serialize them as null; see RFC 4627 and ECMA-262 (section 24.5.2, JSON.stringify, NOTE 4, page 683 of the ECMA-262 PDF at last edit):

  > Finite numbers are stringified as if by calling ToString(number). NaN and Infinity regardless of sign are represented as the String null.

- The stringly way: as @ChristianGruen suggested, turn them into strings; as JSON is JavaScript Object Notation, I would be in favour of "Infinity". Problem: this also has to be dealt with when parsing JSON, and it is non-standard and application-specific. What if we encounter {"manufacturer": "Infinity"} with no context or schema to tell us that this is a literal string, not to be turned into a special xs:float?
- The hacky way: serialize negative and positive infinity as floating-point numbers with an enormously large exponent:
```
JSON.parse('{"number": -1e333}') -> { number: -Infinity }
JSON.parse('{"number": 1e333}') -> { number: Infinity }
```
Problem: that works for JSON.parse in JavaScript but might fail in other parsers. We would likely have to specify this ourselves.
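
The three options can be compared side by side in JavaScript. This is only a sketch of JavaScript's own behaviour, not of any XQuery serializer. The replacer function used for the "stringly" variant is a hypothetical illustration, and the overflow trick has no counterpart for NaN.

```javascript
const special = { a: NaN, b: Infinity, c: -Infinity };

// The standards way: JSON.stringify maps every non-finite number to null.
const asNull = JSON.stringify(special);
console.log(asNull); // {"a":null,"b":null,"c":null}

// The stringly way (hypothetical replacer): emit non-finite numbers as strings.
const asStrings = JSON.stringify(special, (key, value) =>
  typeof value === "number" && !Number.isFinite(value) ? String(value) : value
);
console.log(asStrings); // {"a":"NaN","b":"Infinity","c":"-Infinity"}

// The hacky way: an overflowing exponent parses back to an infinity in JavaScript.
const parsedBack = JSON.parse('{"pos": 1e333, "neg": -1e333}');
console.log(parsedBack.pos === Infinity, parsedBack.neg === -Infinity); // true true

// The ambiguity of the stringly way: this value is just a string to the parser.
const ambiguous = JSON.parse('{"manufacturer": "Infinity"}');
console.log(typeof ambiguous.manufacturer); // string
```

With the standards way, the distinction between NaN, +INF and -INF is lost on output; with the stringly way, it is the reader who can no longer tell a number from a literal string.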

michaelhkay commented 10 months ago

This issue overlaps with issue #576

ndw commented 3 months ago

Consensus in Prague was that an extra serialization option to turn on the look-ahead behavior might be worth doing.

ndw commented 3 months ago

Also from Prague: we should go the standards route and represent NaN and the flavors of infinity with null.

ChristianGruen commented 3 months ago

> Consensus in Prague was that an extra serialization option to turn on the look-ahead behavior might be worth doing.

What kind of look-ahead would be performed with that option?

ChristianGruen commented 3 months ago

An addendum: Thanks for the joint discussion in yesterday’s meeting. I haven’t understood all details due to audio issues, but I’d like to add that there are many existing cases in which the serialization of item sequences with the default xml method creates output that cannot be properly parsed back to the original representation. Examples:

```
(: multiple text nodes; results in `AC` :)
serialize(<a>A<b/>C</a>//text())

(: arbitrary items; results in `1 2 3` :)
serialize(1 to 3)
```

The same currently applies to all other serialization methods (html, text, etc.) except for json. I cannot see a good reason why we should pursue a different path for serializing JSON.

For the sake of completeness, this is what a newline item separator returns when two maps are serialized:

```
serialize(
  ({ "A": 1 }, { "B": 2 }),
  { 'item-separator': '&#xa;', 'method': 'adaptive' }
)

(: result :)
{"A":1}
{"B":2}
```
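
For comparison, the same newline-separated shape can be mimicked in JavaScript. This is a rough analogy rather than a model of the adaptive method: plain objects stand in for the two maps, and the sketch relies on `JSON.stringify` (called without an indent argument) never emitting a raw newline, so each line can be parsed back on its own.

```javascript
// Two plain objects standing in for the maps { "A": 1 } and { "B": 2 }.
const items = [{ A: 1 }, { B: 2 }];

// Serialize each item separately and join with a newline as the item separator.
const output = items.map((item) => JSON.stringify(item)).join("\n");
console.log(output);
// {"A":1}
// {"B":2}

// Because no item contains a raw newline, splitting on "\n" recovers the items.
const parsed = output.split("\n").map((line) => JSON.parse(line));
console.log(parsed.length); // 2
```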

qt4cg / qtspecs

Serialization fallback. #641