w3c / publ-cg

EPUB 3 Community Group Repository

characters/encodings best practices #42

Open RachelComerford opened 6 years ago

RachelComerford commented 6 years ago

From BISG survey

johnlourdusamy commented 6 years ago

Character encoding in HTML

Below are some best practices for character encoding in HTML files:

  1. Always declare the encoding of your document using a meta element with a @charset attribute.
  2. The declaration should fit completely within the first 1024 bytes at the start of the file, so it's best to put it immediately after the opening head tag.
  3. You should always use the UTF-8 character encoding.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
<head>
<meta charset="utf-8" />
...
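
Points 2 and 3 above can be checked mechanically. The following is a rough Python sketch, not taken from the source articles: `declared_charset` and `META_CHARSET` are hypothetical names, and the regular expression is a simplification that only recognizes the `<meta charset="...">` form. It does not implement the full encoding-sniffing algorithm that browsers use.

```python
import re

# Simplified pattern for <meta charset="..."> (quotes optional, self-closing
# slash optional). Requiring the closing ">" means the whole element, not
# just its beginning, has to appear in the bytes we search.
META_CHARSET = re.compile(
    rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)["\']?\s*/?>',
    re.IGNORECASE,
)


def declared_charset(data, limit=1024):
    """Return the lowercased charset declared entirely within the first
    `limit` bytes of `data`, or None if no such declaration is found."""
    match = META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


if __name__ == "__main__":
    sample = b'<!DOCTYPE html>\n<html>\n<head>\n<meta charset="UTF-8" />\n</head>\n</html>'
    charset = declared_charset(sample)
    if charset != "utf-8":
        print("No UTF-8 declaration found in the first 1024 bytes")
    else:
        print("Declaration found:", charset)
```

Lowercasing the result makes the comparison with `"utf-8"` independent of how the name was typed in the file.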
  1. It doesn't matter whether you type UTF-8 or utf-8.
  2. Avoid the pragma directive form that uses the @http-equiv="content-type" and @content="text/html; charset=utf-8" attributes. The shorter meta charset form is preferred, and the HTML Living Standard does not allow the pragma form in XML (XHTML) documents.
  3. Avoid using an XML declaration in XHTML5 documents, for example <?xml version="1.0" encoding="utf-8"?>
  4. Avoid saving UTF-8 files with a byte order mark (BOM). It is known to cause display problems in some user agents and can even break PHP includes.
  5. If possible, choose an editor, or an editor setting, that does not write a BOM to UTF-8 files.
    1. For example, older versions of Notepad on Windows always add a BOM when you save a file with the UTF-8 encoding.
    2. You can find out whether a document contains a BOM, at the start or further down in the content, by using the W3C Internationalization Checker: https://validator.w3.org/i18n-checker/
    3. If you need to remove the BOM, editors such as Notepad++ on Windows and TextWrangler on the Mac let you select the encoding from a list when using the Save As function. The list has options to save as UTF-8 with or without the BOM. Choose the option without the BOM and save.
  6. HTML5 deprecated the charset attribute on the a, link, and script elements, so you should avoid using it. For example: See our <a href="/mysite/mydoc.html" charset="iso-8859-15">list of publications</a>.
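
If you would rather script the BOM check than rely on an editor, the idea can be sketched in Python. This is only an illustration under stated assumptions: the helper names (`bom_offsets`, `strip_utf8_bom`) are hypothetical, and the sketch only looks for the UTF-8 byte sequence EF BB BF (available as `codecs.BOM_UTF8` in the standard library), not for BOMs of other encodings.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def bom_offsets(data):
    """Return the byte offsets of every UTF-8 BOM sequence in `data`,
    including ones that appear further down in the content."""
    offsets = []
    start = data.find(BOM)
    while start != -1:
        offsets.append(start)
        start = data.find(BOM, start + len(BOM))
    return offsets


def strip_utf8_bom(data):
    """Remove a leading UTF-8 BOM if present; leave other bytes untouched."""
    if data.startswith(BOM):
        return data[len(BOM):]
    return data


if __name__ == "__main__":
    content = BOM + b"<!DOCTYPE html>\n<html></html>"
    print("BOM offsets:", bom_offsets(content))
    print("Starts with BOM after stripping:", strip_utf8_bom(content).startswith(BOM))
```

To clean a file, read it in binary mode, pass the bytes through `strip_utf8_bom`, and write the result back. A BOM reported at an offset other than 0 usually means files were concatenated or included, and needs to be fixed at its source file.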

Source links:
https://www.w3.org/blog/2008/03/html-charset/
https://www.w3.org/International/questions/qa-html-encoding-declarations
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta