Open himorin opened 1 year ago
Note that a number of the Media Types you mention are already constrained to use UTF-8 and do not require (and in some cases do not allow) a charset parameter.
Is your comment:
All non-binary formats shall have constraint of charset as UTF-8.
... meant to be a suggestion to add to the quoted paragraph?
Actually, the table has a `Constraint` column, and some of its rows specify `charset=UTF-8` explicitly.
I understand that the HTML spec (WHATWG) limits the encoding to UTF-8, and that RFC 8259 states that no charset parameter is registered for the JSON media type. But reading 6.6.1, I'm really not sure whether the current wording is appropriate or friendly to readers of the specification (e.g. it could just have a line such as 'UTF-8 is mandatory for all payloads')...
@aphillips thank you for your (and the WG's) comments during the call. I'm actually still wondering how to write the last line, but how about the edited text?
@himorin Thanks for working on this.
For the table I would change this:
| Relation-Type | Constraint | Remarks |
|---|---|---|
| service-doc | human readable documentation, supported formats are Unicode Text, markdown, HTML and PDF. |
to use the remarks more clearly:
| Relation-Type | Constraint | Remarks |
|---|---|---|
| service-doc | supported media types are: `text/plain`, `text/html`, `text/markdown` and `text/pdf` | Human readable documentation |
And I would go on to add a paragraph under the table:
The types `text/plain`, `text/html`, and `text/markdown` MUST include a `charset` parameter (for example, `text/plain;charset=utf-8`) and the linked files MUST use the UTF-8 character encoding. The type `text/pdf` uses Unicode in its encoding.
Note well: RFC2854 defines `text/html` and is not obsolete. When the `charset` parameter is missing, the default encoding is Latin-1 (and specifically `iso-8859-1`). In practice browsers treat Latin-1 as `windows-1252` and HTML5 sniffs the encoding in various ways (weighted towards trying to find UTF-8). However, it is still a good idea to use `charset=UTF-8`.
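To illustrate why an explicit `charset` label matters, here is a minimal Python sketch that decodes the same bytes under the three encodings mentioned above. The helper name `decode_as` and the sample strings are assumptions chosen only for illustration; they do not come from any specification.

```python
def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes with the given encoding label (hypothetical helper)."""
    return data.decode(encoding)


# Sample text chosen for illustration only.
utf8_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Correctly labeled: the text round-trips.
as_utf8 = decode_as(utf8_bytes, "utf-8")

# Unlabeled and treated as Latin-1: each UTF-8 byte becomes its own character.
as_latin1 = decode_as(utf8_bytes, "iso-8859-1")

# Latin-1 and windows-1252 differ in the 0x80-0x9F range.
euro_cp1252 = decode_as(b"\x80", "windows-1252")  # euro sign
ctrl_latin1 = decode_as(b"\x80", "iso-8859-1")    # C1 control character U+0080

if __name__ == "__main__":
    print(as_utf8)
    print(as_latin1)
    print(repr(euro_cp1252), repr(ctrl_latin1))
```

A consumer that falls back to a Latin-1 style default silently produces mojibake for UTF-8 content, which is the practical argument for always sending `charset=UTF-8`.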
Annoyingly, the definition for type `text/markdown` in RFC7763 is actually unhelpful: it requires a `charset` parameter but does not make UTF-8 (or any other encoding) the default because (and I quote):
[...] its syntax rules operate on characters (specifically, on punctuation) rather than code points. Many Markdown processors will get along just fine by operating on characters in the US-ASCII repertoire (specifically punctuation), blissfully oblivious to other characters or codes.
Therefore, in 6.6.2 I would include the `charset=UTF-8` on all three of the first rows. I would then add a similar paragraph to the one in 6.6.1 saying approximately:
The types `text/plain`, `text/html`, and `text/markdown` MUST include a `charset` parameter (for example, `text/plain;charset=utf-8`) and the linked files MUST use the UTF-8 character encoding. The types `application/json` and `application/ld+json` are already restricted to UTF-8. The type `text/pdf` uses Unicode in its encoding. Binary types, such as `image/jpeg` or `application/octet-stream`, do not have a character encoding associated with them or define the encoding internally.
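As a rough sketch of how a consumer might check the rule in the paragraph above, the following Python snippet parses a media type string and reports whether the `charset` requirement is met. The function names `parse_media_type` and `check_charset` and the returned messages are hypothetical; only the media type groupings come from the proposed text.

```python
# Types that, per the proposed paragraph, MUST carry charset=utf-8.
CHARSET_REQUIRED = {"text/plain", "text/html", "text/markdown"}


def parse_media_type(value: str) -> tuple:
    """Split 'type/subtype;name=value' into (type, params). Simplified parser."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep:
            # Parameter names are case-insensitive; drop optional quotes.
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def check_charset(value: str) -> str:
    """Return 'ok' or a short description of the problem (hypothetical helper)."""
    media_type, params = parse_media_type(value)
    if media_type not in CHARSET_REQUIRED:
        # JSON types are already UTF-8; binary types carry no charset.
        return "ok"
    charset = params.get("charset")
    if charset is None:
        return "missing charset parameter"
    if charset.lower() != "utf-8":
        return "charset must be utf-8"
    return "ok"


if __name__ == "__main__":
    for example in ("text/plain;charset=utf-8", "text/html", "image/jpeg"):
        print(example, "->", check_charset(example))
```

The parser is deliberately simplified (it does not handle quoted strings containing `;`), so it should be read as an illustration of the rule rather than a complete media type parser.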
@aphillips Thank you for the deep consideration. I had thought about that style of table a bit, but haven't gone in that direction since it overlaps with the next table... If we are to propose adding media types to the table of link relations, I'd rather propose merging the two, something like:
| Relation-Type | Supported Media Types | Constraints | Remarks |
|---|---|---|---|
| icon | `image/png`, `image/jpeg` | | |
| service-doc | `text/plain`, `text/html`, `text/markdown`, `text/pdf` | Linked files MUST use the UTF-8 character encoding. | Human readable documentation |
Keeping two separate tables that both contain similar information (media types) could be confusing for readers, and also makes it harder to collect the information. Attaching the last paragraph of @aphillips's comment below the integrated table seems the easier way to convey everything at once.
Ahh, in addition to making UTF-8 mandatory, do we need to change `hreflang` from optional to required for `text/plain` and `text/markdown` with `service-doc`, and leave it blank for anything else?
@aphillips how about this??
Section 6. Links is unclear and disorganized on several points:
- The `service-doc` link relation type is described as `human readable documentation, supported formats are Unicode Text, markdown, HTML and PDF.`, but the word `Unicode` is not clear. Considering the restrictions placed on the media types, it should be clearly stated that UTF-8 is mandatory for all applicable types.
- `text/plain`, `text/markdown`, and possibly on `text/html`.

We would propose rewriting this section as a single table, so that all of the constraints are clear and easy to notice, and reorganizing all of the accompanying descriptive text, something like:
| Relation-Type | Supported Media Types | Constraints | Remarks |
|---|---|---|---|
| icon | `image/png`, `image/jpeg` | | |
| service-doc | `text/plain`, `text/html`, `text/markdown`, `text/pdf` | Linked files MUST use the UTF-8 character encoding. `hreflang` is mandatory for `text/plain` and `text/markdown` | Human readable documentation. |
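One way to read the merged table is as a lookup that a validator could apply to each link. The Python sketch below is only an illustration under that assumption: the `check_link` function, the dictionary-shaped link with `rel`, `type` and `hreflang` keys, and the message strings are hypothetical, and the media types are copied verbatim from the proposed table.

```python
# Supported media types per relation type, copied from the proposed table.
SUPPORTED = {
    "icon": {"image/png", "image/jpeg"},
    "service-doc": {"text/plain", "text/html", "text/markdown", "text/pdf"},
}

# Types for which the proposal makes hreflang mandatory on service-doc links.
HREFLANG_REQUIRED = {"text/plain", "text/markdown"}


def check_link(link: dict) -> list:
    """Return a list of problems found for one link (empty list means none)."""
    problems = []
    rel = link.get("rel")
    if rel not in SUPPORTED:
        # Relation types outside the table are not checked in this sketch.
        return problems
    # Ignore parameters such as ';charset=utf-8' when matching the type.
    media_type = link.get("type", "").split(";")[0].strip().lower()
    if media_type not in SUPPORTED[rel]:
        problems.append(f"{media_type!r} is not a supported media type for {rel}")
    if (
        rel == "service-doc"
        and media_type in HREFLANG_REQUIRED
        and not link.get("hreflang")
    ):
        problems.append(f"hreflang is required for {media_type}")
    return problems


if __name__ == "__main__":
    print(check_link({"rel": "service-doc", "type": "text/markdown;charset=utf-8"}))
    print(check_link({"rel": "icon", "type": "image/png"}))
```

Keeping both constraints in one structure mirrors the argument for a single table: every rule that applies to a relation type can be found in one place.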
Hi @aphillips, could you kindly take some time to have a look at this?
This is a tracker issue. Only discuss things here if they are i18n WG internal meta-discussions about the issue. Contribute to the actual discussion at the following link:
§ https://github.com/w3c/wot-profile/issues/386