whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

More semantic way to represent the computer language in `code` element #7869

Open IgorGilyazov opened 2 years ago

IgorGilyazov commented 2 years ago

https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-code-element

There is no formal way to indicate the language of computer code being marked up. Authors who wish to mark code elements with the language used, e.g. so that syntax highlighting scripts can use the right rules, can use the class attribute, e.g. by adding a class prefixed with "language-" to the element.

A similar problem is solved for the meter element via the title attribute:

There is no explicit way to specify units in the meter element, but the units may be specified in the title attribute in free-form text.

For consistency, we could do the same with the code element:

<pre><code title="pascal">var i: Integer;
begin
   i := 1;
end.</code></pre>
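
For comparison, the class-based convention quoted from the spec above is typically consumed by a script that scans the class attribute for a "language-" prefix. A minimal sketch, assuming a hypothetical helper named languageFromClass (not part of any spec or library) that takes the raw class attribute value as a string:

```javascript
// Hypothetical helper: return the language named by the first
// "language-*" token in a class attribute value, or null if none.
function languageFromClass(className) {
  const prefix = "language-";
  for (const token of className.split(/\s+/)) {
    // Ignore a bare "language-" token with nothing after the prefix.
    if (token.startsWith(prefix) && token.length > prefix.length) {
      return token.slice(prefix.length);
    }
  }
  return null;
}
```

In a browser this could be called as languageFromClass(codeElement.className), where codeElement is whichever code element the script is processing.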

jimmywarting commented 2 years ago

I don't think the title should be used to define the code language; the title could describe what the code does. Maybe a language attribute would be better?

<pre><code title="Script to ask user for a display name" language="javascript">
prompt('Choose a username')
</code></pre>

I have also used highlight plugins, and I also think a class name such as language-js is terrible. A better solution would have been data-language="js", which could be picked up with elm.dataset.language or something like it instead.
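
To illustrate the difference, here is a rough sketch of how a highlighter might prefer data-language and fall back to the class convention. The function name detectLanguage is hypothetical, and the fallback order is an assumption rather than the behavior of any particular plugin. It only reads the dataset and className properties, so it works on a real element or on a plain object with the same shape:

```javascript
// Hypothetical helper: prefer data-language, fall back to a
// "language-*" class token, and return null when neither is present.
function detectLanguage(el) {
  // data-language="js" is exposed on DOM elements as el.dataset.language.
  const fromData = el.dataset && el.dataset.language;
  if (fromData) return fromData;

  // Fallback (assumed order): first "language-*" token in the class value.
  const match = /(?:^|\s)language-(\S+)/.exec(el.className || "");
  return match ? match[1] : null;
}
```

In a page this might be used as detectLanguage(document.querySelector("pre > code")).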

IgorGilyazov commented 2 years ago

<...> the title could describe what the code does.

Personally, I completely agree with that. However, the title attribute can represent any sort of additional information, which is why the spec states it can be used to specify units for the meter element. Thus, for consistency, I suggest using it to specify the language for the code element.

<...> classname such as language-js is terrible

While a class name is encouraged to be informative, it is not the right tool for adding semantics. A more appropriate tool for this task is microdata. For example, consider the programmingLanguage property of the SoftwareSourceCode type:

<div itemscope itemtype="https://schema.org/SoftwareSourceCode">
  <meta itemprop="programmingLanguage" content="javascript" />
  <code>console.log("Hello, World!");</code>
</div>

It works, but the markup is too bloated for such a simple task. A distinct attribute like language would be ideal, but do we really need to add a new attribute?

<...> a better solution would have been data-language="js"

While this is a viable solution, user-defined attributes are too generic and can be used to represent anything:

<code data-language="lisp" data-author="John Doe" data-platform="gnu clisp 2.49" data-version="4.2">
(format t "Hello, World!")
</code>

gjvnq commented 2 years ago

I suggest adding a codelang attribute for this.

Example:

<pre><code codelang="pascal">var i: Integer;
begin
   i := 1;
end.</code></pre>

Sudrien commented 2 weeks ago

<code type="text/x-pascal">...</code>

A valid MIME type string is already defined, and the type attribute is already defined on other elements.