Unicode terminology and representation updates.

gkellogg commented 10 months ago

gkellogg commented 9 months ago

The rebased branch addresses the various points raised by @afs and @TallTed. Generally, we now avoid quotes around characters and tokens. The BNF is also updated to use a consistent quote style.

gkellogg commented 9 months ago

I accepted the last suggestions for spelling out code points for each character used as a token. However, I feel that this really disrupts the flow of the document and repeats these code point values excessively. I think a future editorial round should consider describing the character tokens explicitly in a "How to Read this Document" section, with each use of a character linking back to that list. For example:

# How to Read this Document

The following characters and tokens are used throughout this document and have specific Unicode code points:

<dl>
<dt><code id="cp-colon">:</code></dt>
<dd>"colon", code point <code class="codepoint">U+003A</code></dd>
</dl>

Then subsequent uses of that character token could be like the following:

A <span id="prefixed-name"><dfn>prefixed name</dfn></span> is a prefix label and a local part, separated by a <a href="#cp-colon"><code>:</code></a>.

This would be a pattern to add to the Editor's Guide to follow in other specifications.
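As a rough illustration of how such a list could be kept consistent, here is a minimal Python sketch that generates the `<dt>`/`<dd>` entries from the characters themselves. The function names `codepoint_entry` and `codepoint_list` are hypothetical, and the `cp-` anchor scheme for multi-word names (for example `cp-full-stop`) is an assumption extrapolated from the `cp-colon` example above, not something the Editor's Guide defines.

```python
import unicodedata
from html import escape


def codepoint_entry(ch: str) -> str:
    """Build one <dt>/<dd> pair for a single character token (hypothetical helper)."""
    name = unicodedata.name(ch).lower()        # Unicode character name, lowercased
    anchor = "cp-" + name.replace(" ", "-")    # assumed id scheme, modelled on "cp-colon"
    codepoint = f"U+{ord(ch):04X}"             # e.g. "U+003A"
    return (
        f'<dt><code id="{anchor}">{escape(ch)}</code></dt>\n'
        f'<dd>{name}, code point <code class="codepoint">{codepoint}</code></dd>'
    )


def codepoint_list(chars: str) -> str:
    """Wrap the entries for every character in `chars` in a <dl> element."""
    body = "\n".join(codepoint_entry(c) for c in chars)
    return f"<dl>\n{body}\n</dl>"


# Generate entries for the colon and the full stop.
print(codepoint_list(":."))
```

Deriving the name and code point from the character avoids hand-typed mismatches between the two, though the Unicode name may not always be the wording an editor would choose.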

TallTed commented 9 months ago

Some of the bits that seem redundant in my latest suggestions, including within the same sentence, could be rephrased. For instance --

 <p>While the <code>@prefix</code> and <code>@base</code> directives
        require a trailing <code>.</code> (full stop, <code class="codepoint">U+002E</code>) after the IRI,
        the equivalent <code>PREFIX</code> and <code>BASE</code>
        must not have a trailing <code>.</code> (full stop, <code class="codepoint">U+002E</code>) after the IRI part of the directive.
        The <code>PREFIX</code> and <code>BASE</code> are case-insensitive
        and can be written as <code>prefix</code> and <code>base</code>
        or use mixed case.

--- might work as --

 <p>While the <code>@prefix</code> and <code>@base</code> directives
        <em class="rfc2119">MUST</em>,
        the equivalent <code>PREFIX</code> and <code>BASE</code>
        <em class="rfc2119">MUST NOT</em>, have a trailing <code>.</code> (full stop,
        <code class="codepoint">U+002E</code>) after the IRI part of the directive.
        The <code>PREFIX</code> and <code>BASE</code> are case-insensitive
        and can be written as <code>prefix</code> and <code>base</code>
        or use mixed case.

I could probably live with the "How to Read this Document" section.

My concern remains that MANY of the visually "clear" and "obvious" punctuation characters are, in fact, visually ambiguous, especially but not only to readers who are more accustomed to non-Latin character sets, no matter that Latin character sets are typical for W3C and/or SDO documents.

(The EBNF presentations will remain quite problematic, but I don't have a good way to fix those, short of using names and/or code points for all such characters as I suggested previously, which I'll grant tend to make the EBNF much harder to read, which is the only reason I haven't fought for them.)

gkellogg commented 9 months ago

[@TallTed] For instance --

 <p>While the <code>@prefix</code> and <code>@base</code> directives
        require a trailing <code>.</code> (full stop, <code class="codepoint">U+002E</code>) after the IRI,
        the equivalent <code>PREFIX</code> and <code>BASE</code>
        must not have a trailing <code>.</code> (full stop, <code class="codepoint">U+002E</code>) after the IRI part of the directive.
        The <code>PREFIX</code> and <code>BASE</code> are case-insensitive
        and can be written as <code>prefix</code> and <code>base</code>
        or use mixed case.

--- might work as --

 <p>While the <code>@prefix</code> and <code>@base</code> directives
        <em class="rfc2119">MUST</em>,
        the equivalent <code>PREFIX</code> and <code>BASE</code>
        <em class="rfc2119">MUST NOT</em>, have a trailing <code>.</code> (full stop,
        <code class="codepoint">U+002E</code>) after the IRI part of the directive.
        The <code>PREFIX</code> and <code>BASE</code> are case-insensitive
        and can be written as <code>prefix</code> and <code>base</code>
        or use mixed case.

Leaving this for now; this is a Note, so normative keywords MUST and MUST NOT are not in force. Thus, the use of "must not" instead of "MUST NOT". I'll change to "do not", which I think is better informative language.

TallTed commented 9 months ago

[@gkellogg] Leaving this for now; this is a Note, so normative keywords MUST and MUST NOT are not in force. Thus, the use of "must not" instead of "MUST NOT". I'll change to "do not", which I think is better informative language.

I wish I'd seen this before you merged it. I think the same strength of language should be used on both PREFIX and @prefix, so I suggest --

<p>While the <code>@prefix</code> and <code>@base</code> directives
        require a trailing <code>.</code> (full stop, 
        <code class="codepoint">U+002E</code>) after the IRI,
        the equivalent <code>PREFIX</code> and <code>BASE</code> directives
        require that there be no trailing <code>.</code> (full stop, 
        <code class="codepoint">U+002E</code>) after the IRI.
        The <code>PREFIX</code> and <code>BASE</code> are case-insensitive
        and can be written as <code>prefix</code> and <code>base</code>
        or use mixed case.

-- which I'll make into a PR if needed.
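To make the rule in that paragraph concrete, here is a minimal Python sketch that checks a single directive line. It is not the Turtle grammar: the `IRI` and `NS` patterns are deliberately simplified assumptions, the name `is_valid_directive` is hypothetical, and treating `@prefix` and `@base` as case-sensitive follows my reading of the Turtle grammar notes rather than anything stated in this thread.

```python
import re

# Simplified stand-ins for the grammar's IRIREF and PNAME_NS productions (assumptions).
IRI = r"<[^<>\s]*>"
NS = r"(?:[A-Za-z][\w-]*)?:"   # optional prefix label followed by ":" (U+003A)

# @prefix / @base: lowercase keyword, trailing "." (full stop, U+002E) required.
TURTLE_STYLE = re.compile(rf"^@(?:prefix\s+{NS}\s*|base\s+){IRI}\s*\.$")

# PREFIX / BASE: any letter case, no trailing "." allowed.
SPARQL_STYLE = re.compile(rf"^(?:PREFIX\s+{NS}\s*|BASE\s+){IRI}$", re.IGNORECASE)


def is_valid_directive(line: str) -> bool:
    """Return True if `line` looks like a well-formed directive of either style."""
    line = line.strip()
    return bool(TURTLE_STYLE.match(line) or SPARQL_STYLE.match(line))


for example in (
    "@prefix ex: <http://example.org/> .",   # accepted: full stop present
    "@prefix ex: <http://example.org/>",     # rejected: full stop missing
    "PrEfIx ex: <http://example.org/>",      # accepted: mixed case, no full stop
    "PREFIX ex: <http://example.org/> .",    # rejected: full stop not allowed
):
    print(example, "->", is_valid_directive(example))
```

Both forms carry an equally firm requirement about the full stop, one positive and one negative, which is the symmetry the suggested wording aims for.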

w3c / rdf-turtle

Unicode terminology and representation updates. #39