Declaring character encodings in HTML, 27 Feb 2023

r12a commented 1 year ago

https://w3c.github.io/i18n-drafts/questions/qa-html-encoding-declarations.en.html https://w3c.github.io/i18n-drafts/questions/qa-html-encoding-declarations.fr.html

Translator: Gwendoline Clavé https://lists.w3.org/Archives/Public/w3c-translators/2023JanMar/0017.html

Source code: https://raw.githubusercontent.com/w3c/i18n-drafts/gh-pages/questions/qa-html-encoding-declarations.en.html

GH: —

clavoline commented 1 year ago

@r12a I hope you won't mind me asking questions here; if you do, please let me know.

1) In the following paragraph:

If you have a UTF-8 byte-order mark (BOM) at the start of your file then recent browser versions other than Internet Explorer 10 or 11 will use that to determine that the encoding of your page is UTF-8.

Would you like to remove the mention of Internet Explorer 10 and 11 since they aren't so "recent" anymore?

2) In the following info note:

Although these are normally called charset names, in reality they refer to the encodings, not the character sets. For example, the Unicode character set or 'repertoire' can be encoded in three different encoding schemes.

which refers to the following paragraph:

Using UTF-8 not only simplifies authoring of pages, it avoids unexpected results on form submission and URL encodings, which use the document's character encoding by default. If you really can't avoid using a non-UTF-8 character encoding you will need to choose from a limited set of encoding names to ensure maximum interoperability and the longest possible term of readability for your content.

The term "encoding scheme" isn't defined in the document. Do you think its meaning is transparent for readers (including non-native English speakers)? Otherwise, would you like to replace it with a more transparent term, or rather define it somewhere? I'm asking because the relation between the two sentences in the info note isn't obvious to me.
According to the Unicode glossary, "encoding scheme" should be translated as "mécanisme de sérialisation". I'm afraid that this term can't be understood without an explanation. Defining each term in its respective version would solve this issue, but there might be a more transparent synonym. What do you think?

Thank you, Gwen

r12a commented 1 year ago

I hope you won't mind me asking questions here

This is a great place to raise the issues, thanks.

Would you like to remove the mention of Internet Explorer 10 and 11 since they aren't so "recent" anymore?

Good idea. We could perhaps replace "then recent browser versions other than Internet Explorer 10 or 11 will use that" with "then modern browsers will use that".

The term "encoding scheme" isn't defined in the document.

This term is used twice in the document, and I agree that its meaning is obscure.

I suggest that for the first usage (in the side note) we could replace 'encoding schemes' with just 'encodings', since the current wording is over-precise.

For the second use ("The byte-order mark at the beginning of your file will indicate whether the encoding scheme is little-endian or big-endian.") the usage is correct. We could link to the definition in the Unicode Standard by changing the source from 'encoding scheme' to <a class="termref" href="https://www.unicode.org/glossary/#character_encoding_scheme">encoding scheme</a>.
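To make the BOM point concrete, here is a minimal Python sketch, assuming the three byte sequences checked by the "BOM sniff" step of the WHATWG Encoding Standard (UTF-8, UTF-16BE, UTF-16LE). The helper name `sniff_bom` is hypothetical; browsers do this internally and expose no such function. It shows how the first bytes identify UTF-8 and, for UTF-16, whether the encoding scheme is big-endian or little-endian.

```python
from typing import Optional, Tuple

# BOM byte sequences recognized by the Encoding Standard's "BOM sniff" step.
# None of these prefixes overlap, so the order of checks does not matter.
_BOMS = (
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),  # big-endian: most significant byte first
    (b"\xff\xfe", "UTF-16LE"),  # little-endian: least significant byte first
)


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding name, BOM length in bytes) if data starts with a BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


if __name__ == "__main__":
    page = "\ufeff<p>café</p>".encode("utf-8")       # UTF-8 file with a leading BOM
    print(sniff_bom(page))                           # ('UTF-8', 3)
    print(sniff_bom("\ufeffA".encode("utf-16-be")))  # ('UTF-16BE', 2)
    print(sniff_bom("\ufeffA".encode("utf-16-le")))  # ('UTF-16LE', 2)
    print(sniff_bom(b"<p>no BOM</p>"))               # None
```

The returned length tells the caller how many bytes to skip before decoding the rest of the file.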

I noticed another issue:

The new Encoding specification now provides a list that has been tested against actual browser implementations. You can find the list in the table in the section called Encodings. It is best to use the names in the left column of that table.

The Encoding spec is no longer 'new', so we could drop that word.

The link to the Encodings section no longer points to the right place. I think we need to replace "in the section called <a href="http://encoding.spec.whatwg.org/#encodings">Encodings</a>" with "in the section called <a href="http://encoding.spec.whatwg.org/#names-and-labels">Names and labels</a>".

I would also be inclined to change "It is best to use the names in the left column of that table." to "It is best to use the names in the left column of the table in that section."

Would you be willing to make those changes, too?

I'll skim through the article for other possible problems...

clavoline commented 1 year ago

Thank you for your replies @r12a :)

I'll take care of those changes in the French and English versions:

[x] Good idea. We could perhaps replace "then recent browser versions other than Internet Explorer 10 or 11 will use that" with "then modern browsers will use that".
[x] I suggest that for the first usage (in the side note) we could replace 'encoding schemes' with just 'encodings', since the current wording is over-precise.
[x] For the second use ("The byte-order mark at the beginning of your file will indicate whether the encoding scheme is little-endian or big-endian.") the usage is correct. We could link to the definition in the Unicode Standard by changing the source from 'encoding scheme' to <a class="termref" href="https://www.unicode.org/glossary/#character_encoding_scheme">encoding scheme</a>.
[x] The Encoding spec is no longer 'new', so we could drop that word.

Re: "encoding scheme", can I link to http://hapax.qc.ca/glossaire.htm#mecanisme_de_serialisation_de_caracteres in the French version?

[x] The link to the Encodings section no longer points to the right place. I think we need to replace "in the section called <a href="http://encoding.spec.whatwg.org/#encodings">Encodings</a>" with "in the section called <a href="http://encoding.spec.whatwg.org/#names-and-labels">Names and labels</a>".

Thank you. I actually updated those links in the French and English versions of the other article I submitted yesterday, but forgot to do the same in this one. I'll fix it.

I would also be inclined to change "It is best to use the names in the left column of that table." to "It is best to use the names in the left column of the table in that section."

I don't think this change is necessary since you already specified "the list in the table in the section" in the previous sentence. Do you want me to make it anyway?

I'll skim through the article for other possible problems...

Thank you. I'll submit some changes right away.

r12a commented 1 year ago

can I link to http://hapax.qc.ca/glossaire.htm#mecanisme_de_serialisation_de_caracteres in the French version?

Yes, I think so.

I don't think this change is necessary

You're right.

I didn't spot any other necessary changes.

r12a commented 1 year ago

Oh, there is one other possible change:

The IANA registry commonly includes multiple names for the same encoding. In this case you should use the name designated as 'preferred'.

It may be better to cast this further into the past with the following rewording:

The IANA registry has multiple names for the same encoding, in which case you are supposed to use the name designated as 'preferred'.

And the following paragraph, change:

The new Encoding specification now provides a list that has been tested against actual browser implementations.

to

Nowadays, you should use the Encoding specification, which provides a list that has been tested against actual browser implementations.

clavoline commented 1 year ago

If the Encoding specification is now recommended, is there a reason to mention the IANA registry at all?

If so, it could be useful to explain when the information available in the Encoding specification is enough, and when one should also look in the IANA registry.

r12a commented 1 year ago

Sorry for the delay @clavoline. I asked myself the same question, but in the end decided that the following revised wording would probably be sufficient. Some people will know about the IANA registry (and, more often than not, don't know about the Encoding spec), so I think it's useful to refer to it.

Until recently the IANA registry was the place to find names for encodings. The IANA registry commonly includes multiple names for the same encoding. In this case you should use the name designated as 'preferred'.

The new Encoding specification now provides a list that has been tested against actual browser implementations. You can find the list in the table in the section called Encodings. It is best to use the names in the left column of that table.
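The relationship between labels and the names in the left column can be sketched in Python. This assumes the Encoding Standard's rule for getting an encoding from a label (strip leading and trailing ASCII whitespace, lowercase ASCII letters, then look the label up). The table below is a small hand-copied excerpt, not the full list, and `get_encoding_name` is a hypothetical helper, not a browser or library API.

```python
from typing import Optional

# Excerpt of the Encoding Standard's label table: each label maps to the
# canonical encoding name (the left column of the table). Not exhaustive.
_LABELS = {
    # UTF-8
    "unicode-1-1-utf-8": "UTF-8",
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    # windows-1252 (note that "iso-8859-1" and "us-ascii" resolve here)
    "ascii": "windows-1252",
    "cp1252": "windows-1252",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "us-ascii": "windows-1252",
    "windows-1252": "windows-1252",
    # Shift_JIS
    "shift_jis": "Shift_JIS",
    "sjis": "Shift_JIS",
    "windows-31j": "Shift_JIS",
}

# ASCII whitespace as defined by the WHATWG Infra Standard: TAB, LF, FF, CR, SPACE.
_ASCII_WHITESPACE = "\t\n\f\r "


def _ascii_lower(s: str) -> str:
    # Lowercase only A-Z, leaving any non-ASCII characters untouched.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def get_encoding_name(label: str) -> Optional[str]:
    """Map a charset label to its canonical name, or None if the label is unknown."""
    key = _ascii_lower(label.strip(_ASCII_WHITESPACE))
    return _LABELS.get(key)


if __name__ == "__main__":
    print(get_encoding_name("  UTF8\n"))     # UTF-8
    print(get_encoding_name("ISO-8859-1"))   # windows-1252
    print(get_encoding_name("klingon"))      # None
```

In this excerpt, each canonical name also appears (lowercased) as a label, so a page that declares a left-column name resolves to that same name.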

clavoline commented 1 year ago

@r12a Thank you, I understand. In that case, I've made all the changes we discussed!

w3c / i18n-translations

Declaring character encodings in HTML, 27 Feb 2023 #55