yaml / yaml-spec

YAML Specification
http://yaml.org/spec/

Formatted Content as a serialization detail. Isn't it a presentation detail? #322

Open lucabalsanelli opened 9 months ago

lucabalsanelli commented 9 months ago

I'm puzzled by the use of the term "formatted content" throughout chapter 3 of the spec.

Some sentences leave me unsure at which stage scalar content is supposed to become formatted content.

My understanding is that the uniqueness constraint on the key nodes of a mapping at the representation stage requires a notion of equality between scalar contents. Hence a tag must establish a canonical form for any formatted content.
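To make the equality argument concrete, here is a minimal Python sketch. It assumes, for illustration only, the boolean spellings of the YAML 1.2 core schema (`true`, `True`, `TRUE`, `false`, `False`, `FALSE`) and a canonical form of lowercase `true`/`false`; `canonical_bool` and `duplicate_keys` are hypothetical names, not part of any YAML library. Two keys that differ on the page are recognized as duplicates only after both are reduced to canonical form.

```python
# Hypothetical sketch: mapping-key uniqueness checked on canonical forms.
# Assumption: !!bool accepts the YAML 1.2 core schema spellings below and
# its canonical form is the lowercase word.
BOOL_FORMATS = {
    "true": "true", "True": "true", "TRUE": "true",
    "false": "false", "False": "false", "FALSE": "false",
}


def canonical_bool(formatted: str) -> str:
    """Map one formatted !!bool scalar to its canonical form."""
    try:
        return BOOL_FORMATS[formatted]
    except KeyError:
        raise ValueError(f"not a !!bool scalar: {formatted!r}") from None


def duplicate_keys(formatted_keys: list[str]) -> list[str]:
    """Return each canonical form that occurs more than once among the keys."""
    seen: set[str] = set()
    dupes: list[str] = []
    for key in formatted_keys:
        canon = canonical_bool(key)
        if canon in seen and canon not in dupes:
            dupes.append(canon)
        seen.add(canon)
    return dupes


# "True" and "TRUE" differ as text but carry the same content:
print(duplicate_keys(["True", "false", "TRUE"]))  # ['true']
```

Comparing the raw strings instead would accept `True` and `TRUE` as distinct keys, which is why some canonical form (or an equivalent equality rule) has to exist per tag.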

Also, the processor (the "represent" process) is in charge of mapping native data to scalars in canonical form. This is the same form to which formatted content at the presentation stage can be converted back. That is how I read this sentence:

"This form is a Unicode character string which also presents the same content…" Ref. sec. "3.2.1.3. Node Comparison"
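Read this way, every format of a scalar must reverse to one canonical form. The following minimal Python sketch shows that round trip for integers. It assumes the YAML 1.2 core schema integer formats (decimal, `0o` octal, `0x` hexadecimal) and takes plain decimal as the canonical form; both are illustrative choices, and `represent`, `present` and `canonical_int` are hypothetical names.

```python
import re

# Hypothetical integer formats, modeled on the YAML 1.2 core schema.
# Assumption: canonical form = plain decimal, no "+" and no leading zeros.
INT_FORMATS = [
    (re.compile(r"[-+]?[0-9]+"), 10, 0),     # decimal: parse whole string
    (re.compile(r"0o[0-7]+"), 8, 2),         # octal: skip "0o" prefix
    (re.compile(r"0x[0-9a-fA-F]+"), 16, 2),  # hex: skip "0x" prefix
]


def represent(value: int) -> str:
    """Native data -> scalar content in canonical form."""
    return str(value)


def present(canonical: str, fmt: str) -> str:
    """Canonical content -> one formatted spelling (a presentation choice)."""
    n = int(canonical)
    if fmt == "dec":
        return canonical
    if n < 0:
        raise ValueError("the core schema has no signed octal/hex format")
    if fmt == "oct":
        return "0o" + format(n, "o")
    if fmt == "hex":
        return "0x" + format(n, "X")
    raise ValueError(f"unknown format: {fmt}")


def canonical_int(formatted: str) -> str:
    """Formatted integer scalar -> canonical form."""
    for pattern, base, prefix in INT_FORMATS:
        if pattern.fullmatch(formatted):
            return str(int(formatted[prefix:], base))
    raise ValueError(f"not an integer scalar: {formatted!r}")


content = represent(26)
for fmt in ("dec", "oct", "hex"):
    shown = present(content, fmt)
    print(shown, "->", canonical_int(shown))
# 26 -> 26
# 0o32 -> 26
# 0x1A -> 26
```

The format chosen by `present` never survives `canonical_int`, which matches the idea that all formats "present the same content".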

The term "formatted content" is first used in sec. "3.2.1.2. Tags".

Sec. "3.2.3.2. Scalar Formats" clearly states that the format is a presentation detail. (Figure "3.1. Processing Overview" shows "formatted string values" in the Presentation box, which I take to mean the same thing.)

"Like node style, the format is a presentation detail and is not reflected in the serialization tree and representation graph." Ref. "3.2.3.2. Scalar Formats"

However, "Figure 3.2. Information Models", "Figure 3.4. Serialization Model" and "Figure 3.5. Presentation Model" all show "+Formatted Content" (the "+" marking it as a serialization detail).

I may be completely wrong, but I would expect it to be "Canonical Form/++Formatted Content". On the one hand, the "serialize" process should not introduce presentation details such as how scalar content is formatted, so I expect "Canonical Form" there; formatting is the job of the "present" process. On the other hand, the "parse" process should not convert formatted content into canonical form, so I expect "++Formatted Content" there; canonicalization is the job of the "compose" process. A "++Formatted Content" in the serialization model would mean that formatted content may be present at that stage, but is not required.
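The placement proposed in the previous paragraph can be sketched as a toy pipeline in Python. This models only the questioner's reading, not the spec's figures: a scalar event at the serialization stage carries either canonical content or formatted text. `serialize` sets only the canonical form, `present` chooses a spelling, `parse` keeps only the formatted text, and `compose` reduces it to canonical form. The `ScalarEvent` class and all stage functions are hypothetical, and `!!null` with its YAML 1.2 core schema spellings (`~`, `null`, `Null`, `NULL`, empty) stands in for any tag, with `null` assumed as its canonical form.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the questioner's proposal, not of the spec's figures.
# Assumption: !!null is written as "~", "null", "Null", "NULL" or "" and its
# canonical form is "null".
NULL_FORMATS = {"~", "null", "Null", "NULL", ""}


@dataclass
class ScalarEvent:
    """A scalar at the serialization stage: canonical form OR formatted text."""
    tag: str
    canonical: Optional[str] = None
    formatted: Optional[str] = None


def serialize(canonical: str) -> ScalarEvent:
    # serialize: no presentation detail is introduced
    return ScalarEvent("!!null", canonical=canonical)


def present(event: ScalarEvent, spelling: str) -> str:
    # present: the format (one of several spellings) is chosen only here
    if event.canonical != "null" or spelling not in NULL_FORMATS:
        raise ValueError("cannot present this event with that spelling")
    return spelling


def parse(text: str) -> ScalarEvent:
    # parse: the formatted text is kept as-is, not canonicalized
    return ScalarEvent("!!null", formatted=text)


def compose(event: ScalarEvent) -> str:
    # compose: formatted content is reduced to canonical form for the graph
    if event.canonical is not None:
        return event.canonical
    if event.formatted in NULL_FORMATS:
        return "null"
    raise ValueError(f"not a !!null scalar: {event.formatted!r}")


text = present(serialize("null"), "~")  # "~" goes into the character stream
back = parse(text)                      # formatted="~", canonical=None
print(compose(back))                    # null
```

In this model the format exists only between `present` and `compose`, which is the "may be present, but not compulsory" reading of "++Formatted Content".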

Is there actually something wrong in the spec (unlikely), or am I missing something (likely)? Can someone please enlighten me?