w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-counter-styles-3] Support automatically localized counters #7959

Open pols12 opened 2 years ago

pols12 commented 2 years ago

Similarly to the inline-start keyword, which may mean “right” or “left” depending on the language, I think we should support

With these generic keywords, the user agent should use the counter symbols that are most appropriate given the content language and user preferences.

Affected: https://www.w3.org/TR/css-counter-styles-3/#predefined-counters

That would also extend and simplify the following css-list-3 recommendation:

UAs and host languages should ensure that the list-item counter values by default reflect the underlying numeric value dictated by host language semantics when setting up list item styling in their UA style sheet and presentational hint style mappings.

tabatkins commented 2 years ago

That bit in Lists is about the value, not how the value is represented in text. It just means that host languages with some notion of counters, like HTML's ol, should handle this by using the list-item counter. How that counter value is displayed is an entirely separate issue.

I'm not opposed to a counter-style that varies its display depending on the language of the element, but we'd need to do some work mapping the styles (and choosing a default for a given language, when there are multiple that could apply). Note that directional mapping is already well-defined by Unicode data, so we didn't have to do any extra work there.

Importantly, we also need a concrete use-case - is there a page that this would help you on, where you're currently working around it manually by varying the counter style?

pols12 commented 2 years ago

Note that directional mapping is already well-defined by Unicode data, so we didn't have to do any extra work there.

Do you have a link to that map, please? That would help me implement a workaround for this issue.

Importantly, we also need a concrete use-case - is there a page that this would help you on, where you're currently working around it manually by varying the counter style?

I work on page internationalization on Wikimedia Meta-Wiki. For a given source page with a custom stylesheet (“templateStyles”), all translation pages use the same stylesheet. Since inline-start is not well supported by browsers, MediaWiki uses CSSJanus to flip left into right. However, we currently don’t have any “magical” workaround for the list-style-type property. On that page, (partially) translated into Hindi, I would expect upper-alpha markers to be readable by Hindi speakers, even if that means using decimal numbers instead of letters (probably devanagari).

pols12 commented 2 years ago

A more generic use case, in MediaWiki core: https://phabricator.wikimedia.org/source/mediawiki/browse/master/resources/src/mediawiki.skinning/i18n-ordered-lists.less
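
A minimal sketch of the manual workaround discussed here, assuming a shared stylesheet that varies list-style-type per language with :lang(). The language-to-style pairs are illustrative assumptions, not copied from the MediaWiki file linked above; the counter style names themselves are predefined in css-counter-styles-3.

```css
/* Default for languages not listed below. */
ol { list-style-type: decimal; }

/* One rule per language. These pairs are assumptions for illustration;
   a real site would need to verify each choice with native readers. */
ol:lang(hi) { list-style-type: devanagari; }
ol:lang(bn) { list-style-type: bengali; }
ol:lang(ar) { list-style-type: arabic-indic; }
ol:lang(my) { list-style-type: myanmar; }
```

Because :lang(hi) also matches more specific tags such as hi-IN, one rule per primary language is usually enough, but the list still grows with every language a site supports.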

tabatkins commented 2 years ago

Do you have a link to that map, please? That would help me implement a workaround for this issue.

It's based on the Unicode bidi algorithm, not a direct script mapping. Individual characters can have a directionality, and you can then obtain a direction for text by analyzing the characters used. I think all the details are in Unicode TR9.

I work on page internationalization on Wikimedia Meta-Wiki. For a given source page with a custom stylesheet (“templateStyles”), all translation pages use the same stylesheet.

Oh, interesting! All right, sounds reasonable to me at least. ^_^

tabatkins commented 1 year ago

Agenda+ to discuss the possibility of adding a "localized numeric" value, that maps to one of the predefined numeric types based on the element language.

(How should I ping the i18n WG for feedback on this? @r12a ?)

dbaron commented 1 year ago

lower-alpha as an alias of lower-latin on English pages, hiragana on Japanese pages…

This specific suggestion doesn't work because lower-alpha is already a value that means the same thing as lower-latin.

pols12 commented 1 year ago

lower-alpha is already a value that means the same thing as lower-latin

This seems to me a bit Latin-script-centric. We could mark that alias as deprecated for a while, but we may indeed want another name (temporary or not) to avoid confusion in documents that refer to items by their respective alpha symbol.

tabatkins commented 1 year ago

Yeah, a lot of the web is latin-centric, unfortunately.

That alias has been around for literally 20 years, tho. It's not getting deprecated.

(That said, using "alpha" to refer to hiragana is incorrect too; it's not an alphabet, it's a syllabary. Having "alpha" refer specifically to the latin alphabet does exclude other alphabetic languages, but at least it's by a huge amount the most commonly used alphabet in the world.)

r12a commented 1 year ago

I just read this thread, and here are some thoughts off the top of my head...

It may indeed be useful to have some generic keywords, but i expect that the mapping from counter-styles to keywords is something the author should be able to do – rather than expecting a registry or depending on the browser implementers to create the mappings.

That would mean creating a syntax that allows authors to map keywords to particular styles, themselves. This would also give authors the ability to specify their own custom styles, which i think will be a major advantage - not only for allowing alternate styling (such as for affixes), but allowing them to use completely new styles (there are certainly more than we have documented so far - in fact i'm about to add a bunch to the ready-made cs doc).

Presumably the styles assigned to a keyword would need to be associated with BCP47 language tags to apply the right style to the content.

That syntax could also allow authors to define their own keywords, rather than requiring them to squeeze their view of the world into some standardised set, which is likely to always be not quite what's needed, or biased towards Latin, or this or that.

It may be helpful, however, to suggest some keywords, such as numeric, alphabetic, additive – these being types of enumeration. We may, though, need to allow a combination of keywords to define a set, such as "alphabetic uppercase" - which would produce the intersection of those two definitions – though that may be going a bit too far.

pols12 commented 1 year ago

@r12a You seem to be describing the existing @counter-style specification, and I’m not sure I understand how it relates to this issue.

andjc commented 1 year ago

A few thoughts:

If I am working with Hindi data and I require Devanagari digits when programming, I have to use the language identifier hi-IN-u-nu-deva.

Personally, I would prefer the developer specifying the counter (as is currently the practice), or having browsers switch counter systems if the language tag specifies a number system to use.

r12a commented 1 year ago

@r12a You seem to be describing the existing @counter-style specification, and I’m not sure I understand how it relates to this issue.

I may be mistaken, but i think you are asking for the ability to use a generic keyword which will trigger the application of a particular type of counter style, dependent on the context (usually language). For example, list-style-type:digits might apply the arabic-indic counter style for text in Arabic, the bengali style if the text is labelled as Bengali, the myanmar style for Burmese text, and so on. (And presumably default to the decimal style if there is no defined digit-based counter style for the language of the text.)

So then i was thinking around how we would define those generic keywords, and how we would map actual counter style definitions to them.

So we might end up with a declaration something like (off the top of my head):

generic-counter-style: "digits" { 
    'ar':'arabic-indic', 
    'bn':'bengali', 
    'my':'myanmar', 
    'az-arab':'arabic-indic', 
    'suz':'my-own-sunuwar-style', 
    .... }

Is that what you were thinking of?

pols12 commented 1 year ago

Is that what you were thinking of?

Yes. Thank you for your detailed explanations! I do indeed want to use the same style sheet for the same text translated into several languages.

i expect that the mapping from counter-styles to keywords is something the author should be able to do – rather than expecting a registry or depending on the browser implementers to create the mappings.

I understand from your various comments, including andjc’s, that the mapping would be hard to define precisely. However, it would help a lot if i18n standards experts like you provided a basic mapping usable by browsers, rather than leaving each web developer to build their own from community snippets gathered on the web. Admittedly, not many websites need to support multilingual text as Wikimedia does, but it would still be useful, in my opinion.

css-meeting-bot commented 1 year ago

The CSS Working Group just discussed [css-counter-styles-3] Support automatically localized counters.

The full IRC log of that discussion
<emilio> TabAtkins: r12a do you think that this is more of a topic to be discussed or something to be introduced?
<emilio> r12a: we could discuss the concept, my take was similar to the ready-made counter-styles
<emilio> ... I'd think it'd be something that authors would define rather than something built-in
<emilio> ... would you like to introduce this?
<r12a> https://github.com/w3c/csswg-drafts/issues/7959#issuecomment-1592610379
<emilio> TabAtkins: somebody filed it requesting that for a few different categories we have an automatically-internationalized version
<emilio> ... e.g., `digits` would be `decimal` for english, french, ... but map to something else for other languages
<emilio> ... same for letters which could map to hiragana in japanese
<emilio> ... before we accepted moving counter styles into the registry this was a lot harder to do
<emilio> ... r12a proposed a new at-rule mapping lang to digits
<florian> q+
<emilio> ... does this sound worth pursuing
<miriam> ack r12a
<Zakim> r12a, you wanted to react to r12a
<emilio> r12a: the registry doesn't really need to enter into it in the registry
<emilio> s/to it in the registry//
<emilio> ... needs to work with author styles
<emilio> ... just a clarification
<emilio> jensimmons: I really like this idea
<TabAtkins> My thought was just that, since we'd be supporting a ton of these, we'd also have a UA stylesheet registry with a bunch assigned. Authors would still be able to extend/override it.
<emilio> ... so many languages have translations
<TabAtkins> q+
<emilio> ... so even if they are not published in different languages
<emilio> ... I'd love for it to be in some ua default
<TabAtkins> (i'm just gonna say that this was noted as actually being helpful for Wikipedia; they currently manually set a bunch of counter styles for the different translations)
<emilio> ... but if it can't be it'd probably end up in some kind of reset/framework sheet
<fantasai> https://github.com/w3c/csswg-drafts/issues/7959#issuecomment-1592298287
<emilio> fantasai: there's some complications here
<miriam> ack fantasai
<emilio> ... see linked comment
<emilio> ... (a) you don't always want to translate the counter style, e.g. western digits might be used in other languages
<emilio> ... (b) some languages have multiple styles which might be designer-preference or so
<emilio> ... so I'm skeptical of adding something built-in
<emilio> ... it'd be relatively straight-forward to do this using `:lang()`
<emilio> ... what we don't have is a way to set a default counter-style for the counters function
<emilio> ... if we have that e.g. via a `counter-style` property then it'd be really easy to make this mapping
<miriam> ack florian
<TabAtkins> "relatively straightforward" is still hundreds of selectors, fwiw
<fantasai> s/mapping/mapping in the stylesheet/
<emilio> florian: I'm nervous, I really like the vision of the international way, but this makes me nervous because it heightens the worry of the previous issue
<TabAtkins> and there are two types at least - numeric and writing - so a default counter() style won't satisfy it
<emilio> ... if you copy and paste it you check they're right, if you turn them on you likely at least checked
<emilio> ... it increases the chances of a wrong counter style appearing in the page if you need to do neither
<emilio> ... other concern is getting the mapping wrong
<emilio> ... e.g., japanese might not want to automatically switch to hiragana, I don't think it should be default
<r12a> q+
<emilio> ... if we were doing a couple or ten languages then we can probably figure it out
<fantasai> s/default/default, that would be jarring/
<emilio> ... but if we want hundreds, how often would we get it wrong in a way that's worse
<miriam> ack TabAtkins
<emilio> ?
<emilio> TabAtkins: if this is just a matter of UA rule then it is fixable
<florian> q+
<emilio> ... as we discover that they're wrong we can just get them fixed
<emilio> ... as the way it'd be designed you could override any of the mappings
<miriam> ack r12a
<emilio> r12a: what florian and fantasai were worried about is if this is baked into the browser
<TabAtkins> i'm happy to separate the "define the at-rule" part from the "and then put a default version of it into the UA stylesheet"
<emilio> ... proposal so far is to make it an author controlled rule for now
<emilio> ... I think that should be less problematic
<miriam> ack florian
<emilio> florian: what I'm about to say depends on whether this is auto-turned-on or not
<emilio> ... do authors need to opt in?
<emilio> TabAtkins: proposal is to define some new at-rule to define mapping from "generic family" to concrete style
<emilio> ... then there's the question of "should we have a default in the UA sheet"
<emilio> florian: the later makes me very nervous
<fantasai> +1 florian
<emilio> ... e.g., english is a roman-alphabet language and someone could consider using roman numerals
<fantasai> s/later/latter/
<emilio> ... sure we could fix it
<jensimmons> q?
<emilio> ... and in english we'd notice fairly easily
<emilio> ... but maybe not for smaller languages
<emilio> TabAtkins: I don't think we'd auto-apply it to ol
<fantasai> s/and someone could consider using roman numerals/, so suppose someone thought that it would be appropriate to use roman numerals, and then shipped it out across the Web. That would be very disruptive/
<emilio> ... but if auto-applying the mapping is controversial let's discuss that separately
<emilio> fantasai: I think we need a more specific proposal
<emilio> TabAtkins: assuming there's no objections I think we should pursue this idea
<emilio> [more discussion about auto-applying vs not]
<fantasai> s/sure we could fix it/sure, we could fix it--and for English it would happen quickly--but for a less common language, it could take a long time to bubble up/
<emilio> jensimmons: you wouldn't be declaring the whole counter styles, but you'd define the mapping to language and you'd have to use it in your list-style
<emilio> florian: as long as we don't auto apply <ol> in numeric types
<emilio> ... I'm fine
<TabAtkins> `@generic-counter-style digits { en-US: decimal; ...}`rather than `ol:lang(en-US) { list-style-type: decimal; } ...`
<emilio> jensimmons: instead authors would need to opt-in into this generic-digits
<miriam> ack fantasai
<emilio> fantasai: two thoughts. How much of this could be done using our existing mechanism using lang selectors with a `counter-style` property
<emilio> ... other question is, even with a new opt-in keyword for this and deployed that what I'd expect to see is that a lot of people that are authoring pages would use it and then get wrongly translated things in other languages
<emilio> TabAtkins: let's push that topic out
<emilio> ... not discussing applying anything automatically
<emilio> fantasai: not talking about that, just about any thing in the UA sheet
<emilio> ... if only for authors, do we need a new mechanism, or can we use :lang()
<emilio> TabAtkins: let's stop discussing the second point. Can authors do this today? yes
<emilio> ... see example above
<emilio> ... this would be essentially just sugar over that
<emilio> ... possibly a bit more efficient for browsers (less selectors?)
<emilio> ... but yeah it'd essentially be just sugar over selectors
<emilio> r12a: I don't think that's quite right
<emilio> ... in the labeled-digits example you can just use in the stylesheet to use digits
<emilio> ... so I think it's a little extra
<emilio> TabAtkins: no you can always just apply the language selectors you want yourself
<emilio> ... nothing fundamentally new or magical about it
<r12a> generic-counter-style: "digits" {
<r12a> 'ar':'arabic-indic',
<r12a> 'bn':'bengali',
<r12a> 'my':'myanmar',
<r12a> 'az-arab':'arabic-indic',
<r12a> 'suz':'my-own-sunuwar-style',
<r12a> .... }
<emilio> miriam: seems like back to the issue for the full proposal

tabatkins commented 1 year ago

Proposed grammar:

/* generically */
@generic-counter-style <generic-family-name> [ <lang-tag> <counter-style-name> ]#;

/* specific examples */
@generic-counter-style numbers "en" decimal, "zh" cjk-decimal;
@generic-counter-style letters /* same stuff */;

The generic names are predefined, like generic font families. I think "numbers" is reasonable as a name but we'd probably want to come up with a better name for "letters".

These generic names would be added to the list of non-overrideable names. Multiple occurrences of the rule for a given generic family cascade together; you can override a given language's associated style. Language tags already have a notion of hierarchy, so we'd apply that here a la shorthands - a later "en" trinary pair would override a preceding "en" pair, but also all preceding "en-US", "en-GB", etc. pairs.

If there's no matching language tag we fall back to a basic maximally-compatible one: numbers defaults to decimal, letters defaults to lower-alpha.
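
To illustrate the cascade and fallback described above, here is a sketch in the proposed syntax. @generic-counter-style is not implemented anywhere; the language pairs are chosen only as examples, and the comments state what the proposal text implies rather than any specified behavior.

```css
/* Proposed syntax only; no browser supports @generic-counter-style. */
@generic-counter-style numbers "en" decimal, "zh" cjk-decimal, "zh-Hant" trad-chinese-informal;

/* A later rule for the same generic family cascades with the first.
   Per the proposal, this "zh" pair replaces the earlier "zh" pair and
   also the earlier, more specific "zh-Hant" pair. */
@generic-counter-style numbers "zh" simp-chinese-informal;

/* Opt in by using the generic name where a counter style is expected. */
ol { list-style-type: numbers; }

/* Expected results under the proposal:
   lang="en"      -> decimal
   lang="zh-Hant" -> simp-chinese-informal
   lang="fr"      -> decimal (no match, so the "numbers" fallback applies) */
```

An author who wants to keep a regional exception would have to restate the more specific pair after the general one, much as a longhand must follow the shorthand that resets it.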

r12a commented 1 year ago

I did at one point wonder whether the generic family names should be the same as our counter style types: i.e. numeric, additive, alphabetic, fixed... But then i thought that people may actually want to mix and match the types if one style is more common than another for certain languages.

The same might be true when using numbers and letters as the keywords. So then i started thinking that perhaps that keyword should just be user defined.

So the generic family name could be numbers if you wanted and expected only to use numeric styles, but it could be wikipedia-styles if you liked, so that you could use additive styles for some languages, alphabetic for others, and numeric for the rest.

tabatkins commented 1 year ago

The only issue with making the namespace user-defined is that we need to define what happens when a name is defined by both @counter-style and @generic-counter-style. Presumably generic would win, but that's not clear.
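
A sketch of the collision in question, assuming user-defined generic names were allowed. The @counter-style rule is valid CSS today; the @generic-counter-style rule uses the proposed, unimplemented syntax, and the name wikipedia-styles is borrowed from the earlier comment purely as an example.

```css
/* An ordinary author-defined counter style (valid today). */
@counter-style wikipedia-styles {
  system: cyclic;
  symbols: "\2023"; /* triangular bullet */
  suffix: " ";
}

/* Proposed syntax only: the same name used as a user-defined generic family. */
@generic-counter-style wikipedia-styles "hi" devanagari, "ar" arabic-indic;

/* Which definition this resolves to is the open question raised above. */
ol { list-style-type: wikipedia-styles; }
```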