Open bahrus opened 1 year ago
Not an endorsement for this proposal, but if a unit/measurement tag were to be added I'd push for using the ECMAScript list of units, and avoiding ambiguous terms like $ for a unit type.
An alternative list of units, used by Intl.NumberFormat, found here
I’d suggest that if this were to exist, <number> (or something similar) might be better aligned with what’s being proposed than <measurement> (and would match NumberFormat). In the sentence “does the melon weigh less than 12lbs?”, there’s a numeric value with a unit dimension that could benefit from this element, but it’s seemingly not a measurement (at least as I’d generally understand the word? nothing was measured). A broader usage of the word may be legit, but I suspect it could cause some confusion about when it’s semantically appropriate to use.
Modified title to reflect feedback.
I would like to add my support for this proposal and add that semantic markup for numerals is also an accessibility issue.
The reason I came here is that I ran into problems marking up Roman numerals so that screen readers can handle them properly. Unfortunately, there is no good solution at the moment.
For example, a screen reader would have no way of knowing how to pronounce text like Part I without additional markup (is that "one" or "aye"?). And even users who did learn Roman numerals in school will need some mental effort to work out that MCMLXXXIV is 1984.
(Side note: Unicode Roman numerals are not a solution either. In fact, even the Unicode specification discourages their use outside of specific use cases.)
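To make that "mental effort" concrete, here is a minimal sketch of the subtractive rule a reader (or a user agent) has to apply to get from MCMLXXXIV to 1984. The helper name romanToInt is made up for illustration and is not part of any proposal; it also does no validation beyond rejecting unknown characters.

```javascript
// Hypothetical helper: converts a Roman numeral in standard notation to an integer.
function romanToInt(roman) {
  const values = { I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000 };
  let total = 0;
  for (let i = 0; i < roman.length; i++) {
    const current = values[roman[i]];
    if (current === undefined) {
      throw new RangeError(`Not a Roman numeral character: ${roman[i]}`);
    }
    // Past the end of the string there is no next symbol, so treat it as 0.
    const next = values[roman[i + 1]] ?? 0;
    // A smaller value in front of a larger one is subtracted (IV = 4, CM = 900).
    total += current < next ? -current : current;
  }
  return total;
}

console.log(romanToInt("MCMLXXXIV")); // 1984
console.log(romanToInt("IX")); // 9
```

Note that this only works once you already know the text is a numeral; it cannot tell whether the "I" in "Part I" is a number or a pronoun, which is exactly what the markup would settle.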
This problem only gets bigger once we talk about other number systems. For example, who knows that the Greek letters χξϛʹ can also represent a number (666 in this case)? Or look at Japanese numbers: 拾弐 can stand for 12, but apparently can also be read as words...
For more information, see the Wikipedia page List of numeral systems.
And of course, there are representations of numbers in bases other than our usual base-10 system. ABCD might well be the hexadecimal representation of the decimal number 43981, or just the beginning of the alphabet; without context, that is hard to say.
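As a quick illustration of how much depends on the assumed base, the same four characters give different results with the standard parseInt function (the variable name is just for this example):

```javascript
const ambiguousText = "ABCD";

// Read as hexadecimal, the text is a perfectly good number.
console.log(parseInt(ambiguousText, 16)); // 43981

// Read as decimal, it is not a number at all.
console.log(Number.isNaN(parseInt(ambiguousText, 10))); // true
```

Nothing in the text itself says which reading is intended, so the base has to come from the markup or from context.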
Another potential pitfall is the formatting of Arabic numerals, which varies between countries and is often mixed up during localization. If I read, e.g., 1,234 on an English page of a German website, it can be difficult to know whether that refers to 1234 or to 1.234 (in English notation): it would be the latter if German rules were applied, and the former if English rules were in use.
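The ambiguity is easy to reproduce with Intl.NumberFormat. A small sketch, assuming a runtime with locale data for both en-US and de-DE (the variable names are only for illustration):

```javascript
// Two different numbers, formatted according to two different locales.
const englishText = new Intl.NumberFormat("en-US").format(1234);
const germanText = new Intl.NumberFormat("de-DE").format(1.234);

console.log(englishText); // "1,234"
console.log(germanText); // "1,234"

// The visible text is identical although the values differ by a factor of 1000.
console.log(englishText === germanText); // true
```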
Last but not least, the pronunciation of numerals is more complex than one might think. Look again at 1984: depending on the context, this may be pronounced either as "nineteen-eighty-four" (if used as a year number), or "one-thousand nine-hundred and eighty-four" (in most other situations).
My preferred solution here would be a short, easy-to-remember tag (I'd opt for <num>, but any of the other suggestions here is fine, really), with at least a value attribute that can hold a plain, well-defined number format. A unit attribute might also be a good addition. Other attributes, like title or lang, are standard in any case.
However, most importantly, I would suggest that a pronunciation guide (e.g. in aria-label) should be supported.
If we look at existing tags, we already have the (slightly misnamed) <time> tag, which allows us to mark this up in a more semantic way:
<time datetime="1984" title="1984">MCMLXXXIV</time>
(This is unfortunately missing the possibility to add a pronunciation guide, such as aria-label="nineteen eighty-four", but that's a different issue...)
Similarly, it would help a lot if we could mark up other numerals: For example, a Roman numeral that is not a time reference could be marked up as follows:
<num value="9" title="9">IX</num>
Similarly, the Ancient Greek number mentioned above:
<num value="666" title="666" aria-label="six-six-six"><span lang="grc">χξϛʹ</span></num>
(Note that the additional <span> is necessary to avoid the screen reader switching to a Greek voice for the label text.)
An important use case could also be to allow encoding a standardized, machine-readable numeral, thus avoiding ambiguities in the formatting:
<num value="1.234" lang="de">1,234</num>
<num value="1234" lang="es">1.234</num>
<num value="1234.56" lang="fa">1,234/56</num>
<num value="1234.56" lang="de-CH">1'234.56</num>
etc.
This also does away with the problem of having to interpret fractional parts or exponential figures:
<num value="3.5" unit="litre">3½ l</num>
<num value="4000000000000">4 × 10¹²</num>
And of course, the possibility of adding a standardized unit will enable user-side functions that take the value and convert it in a meaningful way.
Think, for example, of cooking recipes, where measurements are often given in regional or uncommon units (like "cups", "spoon" or even "pound") and they can be easily converted, on the client-side, to something the user is more familiar with, and/or scaled to the required size (e.g. convert a recipe for 2 persons to one for 6 people).
To conclude, I believe a specific tag for numeric values, similar to the one we already have for date and time representations, would be very useful.
More evidence of demand
This would be similar in purpose and functionality to the time(date) tag, but for numbers.
This would be a declarative HTML wrapper around Intl.NumberFormat.
What is displayed to the user would be number.toLocaleString() if none of the other attributes listed below are specified. Otherwise Intl.NumberFormat could be used.
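In script terms, the two cases might look roughly like this. The sample values and the en-US locale are arbitrary choices for the example, and how the element would pick its locale is left open here.

```javascript
// No extra attributes: fall back to plain locale-aware formatting.
const plainText = (1234.5).toLocaleString("en-US");
console.log(plainText); // "1,234.5"

// Extra attributes present: pass them on as Intl.NumberFormat options.
const unitText = new Intl.NumberFormat("en-US", {
  style: "unit",
  unit: "liter",
  unitDisplay: "long",
}).format(3.5);
console.log(unitText); // "3.5 liters"
```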
Suggested main attributes:
I would propose every additional property Intl.NumberFormat recognizes be supported as an attribute (particularly as they move out of experimental status): numberingsystem, signdisplay, unitdisplay, roundingmode, etc.
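Since HTML attribute names are case-insensitive while the Intl.NumberFormat options are camelCase, some mapping between the two would be needed. A minimal sketch of one way to do it; the mapping table and the function name are hypothetical, and only a few options are listed:

```javascript
// Hypothetical mapping from lowercase attribute names to Intl.NumberFormat option names.
const ATTRIBUTE_TO_OPTION = new Map([
  ["style", "style"],
  ["unit", "unit"],
  ["numberingsystem", "numberingSystem"],
  ["signdisplay", "signDisplay"],
  ["unitdisplay", "unitDisplay"],
  ["roundingmode", "roundingMode"],
]);

// Hypothetical helper: builds the options object from attribute name/value pairs
// and formats the machine-readable value for the given language.
function formatFromAttributes(value, lang, attributes) {
  const options = {};
  for (const [name, attributeValue] of Object.entries(attributes)) {
    const optionName = ATTRIBUTE_TO_OPTION.get(name.toLowerCase());
    // Attributes that are not formatting options (title, aria-label, ...) are skipped.
    if (optionName !== undefined) {
      options[optionName] = attributeValue;
    }
  }
  return new Intl.NumberFormat(lang, options).format(Number(value));
}

console.log(formatFromAttributes("12", "en-US", { signdisplay: "always", title: "12" })); // "+12"
```

Options that a given engine does not support yet are simply ignored by Intl.NumberFormat, which fits the idea of adding attributes as they move out of experimental status.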
This proposal also calls for including this tag in the microdata specification, where the value of the tag is derived from the value attribute/property.