whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Hinting at a translation language for outgoing links #2945

Open domenic opened 7 years ago

domenic commented 7 years ago

Currently, if a website is in one language, but wants to link to a site for important information that is only available in some other language, they have a few options:

  1. Link directly to the site in question, and hope the user is able to translate (either based on their knowledge of the language, or with the help of tools, or with the help of their browser, as is the case in Chrome and Edge).
  2. Link to a machine-translated version of the site (example)

Neither of these is great. In general, the translation produced by (2) is poor, because it acts on the server-rendered page before any JavaScript gets a chance to execute. In fact, this can break the JavaScript and UI. For (1), the issue is that the site may have more information about the preferred language than the browser does; Chrome has data showing that many users leave their device set to English, despite that not being their preferred language.

Currently we have teams within Google that are working around this using a Chrome- and Android-proprietary solution, by changing their links to specific Android intents (e.g. chrome-translate://to-hindi/https://en.wikipedia.org/wiki/Cat). Chrome can then interpret this intent and trigger the translate UI appropriately. But we thought it was a good idea to work on something that could benefit all browsers and clients, not just Chrome on Android.

As such, we'd like to propose a way of hinting to the client that it should translate an outgoing link into a specific language, which pages can use to reliably suggest the desired translation. In Chrome, we are interested in using this to trigger our built-in translation, but you could also imagine its uses for other clients (including e.g. browser extensions). A simple strawperson is <a href="..." translatehint="hi"> (using a standard language code). This version would then be feature-testable via "translateHint" in HTMLAnchorElement.prototype.
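A minimal sketch of the feature test and the strawperson markup described above. The helper names `supportsTranslateHint` and `buildLink` are hypothetical, and the `translatehint` attribute is only a proposal in this thread, not a shipped API. The sketch assumes the href and link text are trusted, so it does no HTML escaping.

```javascript
// Hypothetical helper: reports whether an object (normally
// HTMLAnchorElement.prototype) exposes the proposed translateHint property.
function supportsTranslateHint(proto) {
  return proto != null && "translateHint" in proto;
}

// Hypothetical helper: builds link markup with the strawperson attribute.
// A client that does not recognise the attribute simply ignores it.
// Assumes trusted inputs: no HTML escaping is performed here.
function buildLink(href, text, lang) {
  return `<a href="${href}" translatehint="${lang}">${text}</a>`;
}

// In a browser, pass the real prototype; outside one (e.g. Node) there is none.
const anchorProto =
  typeof HTMLAnchorElement !== "undefined" ? HTMLAnchorElement.prototype : null;

console.log(supportsTranslateHint(anchorProto));
console.log(buildLink("https://en.wikipedia.org/wiki/Cat", "Cat", "hi"));
```

Because the attribute is only a hint, a page could emit it unconditionally; the feature test matters mainly for pages that want to fall back to some other approach when the client does not support it.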

How does this sound? Chrome is certainly interested in implementing. We're hoping to get a sense whether others think adding a standardized way to do this is a good idea, or whether we should keep exploring proprietary solutions.

annevk commented 7 years ago

Seems reasonable, might be good to ping www-international@w3.org? (Bikeshed: hreftranslate given hreflang?)

domenic commented 7 years ago

Posted to www-international at https://lists.w3.org/Archives/Public/www-international/2017JulSep/0073.html .

@annevk, @foolip, @zcorpan, what criteria do you think we should use for spec inclusion here? I'm OK with using the usual one (two engines interested in implementing), but I think this feature might be a bit different. Here's my thinking:

Would welcome any thoughts here.

asmusf commented 7 years ago

I'd defy any website author to guess correctly what content I'd want to see in what language. And no, even my preference would depend on the nature/content of the page.

tabatkins commented 7 years ago

The website isn't guessing what language you want to see it in, it's hinting to the browser what language the page at the remote end of the href is. That way your browser, if it has the capability to do so, can offer to automatically translate it; this functionality is completely on the browser UI side of things, tho, and control over it is up to the original browser, including how to let the user customize this functionality.

domenic commented 7 years ago

@tabatkins no, the translate hint---as opposed to hreflang---is for what language to translate the destination content into. Imagine being on a Hindi website and searching for some term. A page for that term is available, but not in Hindi; instead it's in Japanese. Instead of showing "no results found" for that Hindi term, the Hindi search result page would use hreftranslatehint="" on a link to the Japanese result, to indicate to the browser that the user might be well served by being offered a translation from Japanese to Hindi, since their search term was in Hindi. Then, if they click on that link (or perhaps in some kind of right-click/long press UI for the link itself), the browser can help out.
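One rough way a search page might derive such a hint from the search term is to look at the script the term is written in. This is only a sketch under loud assumptions: the function name `guessHintFromQuery` and the script-to-language table are hypothetical, and a script does not identify a language (Devanagari is also used for Marathi and Nepali, for example).

```javascript
// Assumed mapping from Unicode script to a language code. Scripts are shared
// by many languages, so this is a rough guess at best.
const SCRIPT_TO_LANG = [
  [/\p{Script=Devanagari}/u, "hi"],
  [/\p{Script=Gurmukhi}/u, "pa"],
  [/[\p{Script=Hiragana}\p{Script=Katakana}]/u, "ja"],
];

// Hypothetical helper: returns a language code for the first script found in
// the query, or null when there is no basis for a guess.
function guessHintFromQuery(query) {
  for (const [pattern, lang] of SCRIPT_TO_LANG) {
    if (pattern.test(query)) return lang;
  }
  return null; // no confident guess: emit no hint at all
}

console.log(guessHintFromQuery("बिल्ली"));
console.log(guessHintFromQuery("flower"));
```

Returning `null` instead of a default keeps the page from hinting when it has no real signal, which is one of the concerns raised in this thread.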

asmusf commented 7 years ago

On 8/21/2017 1:45 PM, Tab Atkins Jr. wrote:

The website isn't guessing what language you want to see it in, it's hinting to the browser what language the page at the remote end of the href is. That way your browser, if it has the capability to do so, can offer to automatically translate it; this functionality is completely on the browser UI side of things, tho, and control over it is up to the original browser, including how to let the user customize this functionality.

If that's the intent, why not indicate that in the naming?

Then it would allow other actions, like popping up a flag image when you hover over the link.

Even so, many sites will allow you access to their contents (or similar contents) in other languages once you are there, or worse, will recognize you are not accessing the site from a "standard" location and serve you a page to choose. I just see too many ways that this can get in the way rather than be a help.

tabatkins commented 7 years ago

Oh! In that case, I kinda agree that it sounds weird. My preferred languages should be known by the UA already; I don't see how a webpage has any particular way of knowing what I'd prefer.

(I see your use-case of a search page noticing what language the search term is in, but I highly suspect that this attribute would quickly become highly polluted by people just spamming the language that their page is written in, or making the same mistake I did and thinking it's akin to hreflang.)

asmusf commented 7 years ago

On 8/21/2017 2:08 PM, Domenic Denicola wrote:

what language to translate the destination content into

That makes no sense. How does the page author know what the best language would be to translate that into? That depends not only on the user but on the content of the page (and on the quality of the translation).

For truly monolingual users, there's only ever one language into which translation makes sense. For bi/multi-lingual users, it's unpredictable.

domenic commented 7 years ago

As I noted in the OP, we have a lot of evidence that users don't set their UA language to their preferred language. Or, as @asmusf states, users may prefer different languages in different contexts (e.g. browser UI versus cooking recipe web pages). Thus information like "the user searched for 花 and not flower" would be a way in which pages have useful information they could use to hint back to the browser.

asmusf commented 7 years ago

On 8/21/2017 2:16 PM, Tab Atkins Jr. wrote:

Oh! In that case, I kinda agree that it sounds weird. My preferred languages should be known by the UA already; I don't see how a webpage has any particular way of knowing what I'd prefer.

And the preference isn't static.

I read some languages where I normally don't use translations, but might want one if the terminology is arcane. In a given case, the choice of target language for the translation may well depend on the subject matter. Certain things don't translate well in a given language pairing, or I might have a better command of the subject vocabulary in one of the possible target languages. Or, I may deliberately want a particular translation (actually, that's an unusually common use case for me).

In none of these cases is the opinion of some website author of any use to me. It will only get in the way, just like search engines guessing which language I want search results in.



asmusf commented 7 years ago

On 8/21/2017 2:26 PM, Domenic Denicola wrote:

As I noted in the OP, we have a lot of evidence that users don't set their UA language to their preferred language. Or, as @asmusf states, users may prefer different languages in different contexts (e.g. browser UI versus cooking recipe web pages). Thus information like "the user searched for 花 and not flower" would be a way in which pages have useful information they could use to hint back to the browser.

If you really want to take over for the user, just recognize the mix of languages in the content they are viewing and build a profile on the fly. Much more "big brother" and probably more reliable than asking authors to supply information out of context.

And your example is literally one of the ones where this would fail for me. I routinely search for terms where getting related information in another language (or from another cultural area) is what I am after. In the examples I wouldn't know how to search for "flower" because it's not a dictionary term, perhaps not even a 1:1 correspondence in expressions and getting a link to a page about "flower" is what I am actually after.

Once I get that page, I either read it as is, or view a translation into any number of languages - unpredictable from the content of the search or the content of the pages I traverse.

Predicting the target language really only works for true monolinguals. You should be able to use the UI language for them. Even bilingual users will want some stuff translated and others not; a very typical case is professional vocabulary in English. At least it makes sense to pre-load a context menu for them, but that's still not dependent on the source page (it might be dependent on the target page).

This whole scenario seems silly to me.

highdn commented 7 years ago

It can be extremely useful to be able to trigger automatic page translation for a hyperlink.

For example, let's say I'm a webmaster of a Hindi site about lung cancer.

I would like to direct my readers to authoritative sites like webmd.com or cancer.org for more and latest information.

Unfortunately, the vast majority of my users are: (1) not very fluent in English, so they can't consume those sites directly; (2) not tech-savvy enough to tell their browser to translate to Hindi for them.

It would be extremely beneficial to my readers if I as a webmaster could tell the browser to translate linked webmd.com and cancer.org pages to Hindi for my readers.

अधिक जानकारी <a href="http://www.webmd.com/lung-cancer/default.htm" translateHint="hi">फेफड़ों का कैंसर</a>

I hope the example above clarifies the use case for the translate hint attribute.

Thanks!

domenic commented 7 years ago

asmusf, thanks for making the point that this is probably not a feature targeted at you. We hear you loud and clear; I think your point has been made and is good feedback. I don't believe you need to belabor it further by replying to everyone on this thread explaining again that the feature won't work for you.

I imagine you will ignore this feature in your browsing, similar to (I assume) how you already ignore or have disabled any offers of translation from Chrome or Edge (if you use those). In the meantime, I hope you will allow those of us interested in this feature to discuss it, and how it can benefit those who aren't fluent in "any number of languages".

asmusf commented 7 years ago

On 8/21/2017 2:50 PM, Dmitriy Khramtsov wrote:

It can be extremely useful to be able to trigger automatic page translation for a hyperlink.

For example, let's say I'm a webmaster of a Hindi site about lung cancer.

I would like to direct my readers to authoritative sites like webmd.com or cancer.org for more and latest information.

Unfortunately, the vast majority of my users are: (1) not very fluent in English, so they can't consume those sites directly; (2) not tech-savvy enough to tell their browser to translate to Hindi for them.

I find that a very patronizing attitude.

If they are true monolinguals then it should be up to the browser to make it easy to either 1) discover that or 2) allow the user to declare that. After that, this would work, whether or not you decide to take such steps.

I can tell you that in my observation having someone else make that choice never works. FB offers automatic translations of posts. It never works for the languages that I truly don't understand and can't read. It often translates my native language into English (UI language) and it sometimes translates English into some other language.

Some of my friends (or family) post in more than one language. Sometimes I wonder why they suddenly sound slightly imbecile/deranged. Then it dawns on me that "autotranslation" has struck again.

It would be extremely beneficial to my readers if I as a webmaster could tell the browser to translate linked webmd.com and cancer.org pages to Hindi for my readers.

If the translations are generated on the fly, that would be a scary prospect - because with a topic like that, mistranslations might have real consequences. And I would not call such translations "authoritative" any longer.



asmusf commented 7 years ago

On 8/21/2017 2:54 PM, Domenic Denicola wrote:

asmusf, thanks for making the point that this is probably not a feature targeted at you. We hear you loud and clear; I think your point has been made and is good feedback. I don't believe you need to belabor it further by replying to everyone on this thread explaining again that the feature won't work for you.

I remain a) skeptical, b) unconvinced that it will be easy to sidestep "features" like this.

I imagine you will ignore this feature in your browsing, similar to (I assume) how you already ignore or have disabled any offers of translation from Chrome or Edge (if you use those). In the meantime, I hope you will allow those of us interested in this feature to discuss it.

I would be interested how many of you are personally heavy users of this "feature" and therefore have solid practical experience in how it would work.

sideshowbarker commented 7 years ago

I guess there’s not already anything in https://www.w3.org/TR/its/ or https://www.w3.org/TR/its20/ that would be appropriate for this case?

Regardless, it would be good to get feedback from @fsasaki on this.

highdn commented 7 years ago

asmusf, it seems to me you have never done user studies in India. If you allow me to simplify your point, it is "the browser and the user should do the right thing". Unfortunately, it is harder than you may think, and any help there is much needed. This proposal adds a webmaster to the picture, i.e. it is the joint effort of the user, the webmaster and the browser to do the right thing. Of course, this feature should be implemented in a non-confusing, non-intrusive and non-disruptive way. As domenic mentioned, you've been heard, and we thank you for your feedback. Thanks!

fsasaki commented 7 years ago

Thanks @sideshowbarker for the ping. ITS 2.0 has a locale filter; see https://www.w3.org/TR/its20/#LocaleFilter-definition. It provides a comma-separated list of language ranges, which is probably what you want. E.g., if somebody specifies "en" as the desired linked language, linked content with the language tag en-us would also be matched.

This example shows how to use the its-locale-filter attribute in HTML https://www.w3.org/TR/its20/#EX-locale-filter-locale-html5-1
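The range matching mentioned above (where "en" also matches en-us) can be sketched as follows. This assumes the basic filtering scheme from RFC 4647 as a simplification; ITS 2.0 may specify a different matching scheme, and the function names here are hypothetical.

```javascript
// Basic filtering in the style of RFC 4647: a range matches a tag when they
// are equal (case-insensitively) or the tag starts with the range followed
// by "-". The wildcard range "*" matches every tag.
function matchesRange(range, tag) {
  const r = range.trim().toLowerCase();
  const t = tag.trim().toLowerCase();
  if (r === "*") return true;
  return t === r || t.startsWith(r + "-");
}

// Checks a tag against a comma-separated list of language ranges.
function matchesAnyRange(rangeList, tag) {
  return rangeList.split(",").some((range) => matchesRange(range, tag));
}

console.log(matchesAnyRange("en", "en-us"));
console.log(matchesAnyRange("hi, pa", "ja"));
```

Requiring the "-" separator keeps a range such as "en" from accidentally matching an unrelated tag that merely begins with the same letters.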

For your use case, using its-locale-filter directly may cause some issues, since its-locale-filter is meant to specify the locale for content in the document and not linked content. So you may define an attribute as a counterpart to its-locale-filter and use that attribute for linked content.

Btw., various people interested in Web technology and translation gather in the ITS interest group. https://lists.w3.org/Archives/Public/public-i18n-its-ig/ I will forward this issue to that list and encourage the people to give further feedback.

annevk commented 7 years ago

@domenic I think encouraging enabling feature detection makes sense (so don't implement if it doesn't do anything). I think counting addons/extensions is not something we should do. We'd turn into XForms quite fast. And I think we should give enough freedom to user agents to do what their users want. I'd expect most user agents to indicate the content was translated and give users the ability to undo or change the language and such. Since that doesn't seem clear to everyone we should make that clear in the specification.

galund commented 7 years ago

At the risk of 'piling on', I can't see how the author of one site (the content of which may not be one of the user's preferred languages anyway) can make any useful suggestion about whether or not some other site they linked to should be translated for the user.

All the information required about language preferences should be available to the user agent already, and this mechanism could only get in the way of either proper content negotiation or a good choice of automatic translation by the UA.

foolip commented 7 years ago

what criteria do you think we should use for spec inclusion here?

@domenic, I suppose I'd use the usual criteria of implementer interest, and also to discourage implementing any reflecting if the feature doesn't actually do anything. If that means we don't get enough implementer interest to put this in HTML, it could still be maintained elsewhere.

On the design itself, I don't have much useful to say. I am one of these users who sets all my UI to English even though it's not my primary language, and I do read some other languages without translating. I'm not in the target audience, as I normally disable auto-translation at the earliest opportunity.

r12a commented 7 years ago

@domenic, there's something i'm not clear about here:

In general, the translation produced by (2) [Link to a machine-translated version of the site] is poor, because it acts on the server-rendered page before any JavaScript gets a chance to execute. This can break the JavaScript and UI, in fact.

If we implement this attribute, wouldn't it have the same problem?

domenic commented 7 years ago

@r12a, no, because unlike machine translation tools that work on HTML source text, the browser is able to perform translation on the rendered DOM, after JavaScript executes on the original content of the page and using the user's credentials.

For an example of the credentials issue, consider a news site that only displays articles if you are a subscriber. A link like the one I gave would just render a translation of "please become a subscriber". But if you used the browser's translation feature after loading the original, un-translated page using the user's cookies, the contents of the article would be translated.

r12a commented 7 years ago

Ok, thanks @domenic. I'm still mulling over the use cases and implications (though i initially had similar reservations as other folks here).

However, one thing that seems clear to me is what @annevk said, but i'd word it more strongly, ie. if you click on a link and are presented with a translated page (a) the browser must make it clear that it has translated the page, and (b) the browser or the page must provide a way to view the original language instead. I agree that that needs to be spec'd as a requirement.

I'll contribute some more ideas after i have thought through the possibilities a bit more.

aphillips commented 7 years ago

The I18N WG discussed this in teleconference today, with an action to read and discuss again further next week.

I have a number of concerns or questions about the proposal.

  1. If a goal is for the hint to be conveyed to the requested page itself (in case it can language negotiate, for example), what mechanism is used? Accept-Language?

  2. If the goal is only to hint to the browser, shouldn't some of the options be "match the page language" (or at least the lang scope of the link) and "match my browser settings"? While many browsers are "set wrong", many are not, and mobile browsers tend to follow the OS setting. The more I think about this, the more I agree with @annevk's comment above.

  3. If the problem (or part of the problem) is that Accept-Language is not set correctly, shouldn't we be working on making setting it properly a lot easier? I know the trend has been to bury these settings and not trouble users with them, and that managing the language priority list, with its potential for many tags and tag variations, might be a vexing UI problem, but A-L will never be useful if we don't take steps to fix it.

  4. If we're only talking about automatic translation at the browser level, presumably that has settings for the user to use. If I read a page in Hindi with auto-translation set to English and I click on a link in that page with a translation hint, I do not want to see the linked page in Hindi!

  5. Should there be scope for the hint? I don't want to laboriously type in a new attribute on every single link.

As with Richard, I need to think this through more. Likely the I18N WG will also have comments.

highdn commented 7 years ago

@aphillips, let me try to answer a few questions you raised.

Let's assume we have the following setup:

[pageA in languageA owned by webmasterA]
<a href="http://pageB" translateHint="languageHint">pageB</a>

[pageB in languageB owned by webmasterB]

The Accept-Language approach assumes pageB can render itself in multiple languages. The translateHint feature should primarily be used with a pageB that is not multilingual and/or doesn't have the content in a language that webmasterA prefers. In other words, translateHint is a hint for the browser to trigger in-browser translation.

I think the browser should ignore the translateHint attribute if the user has configured any of the following:

The translateHint attribute should probably also be ignored by the browser if pageA itself was translated in the browser.

It seems totally reasonable to me to have a special case when you want languageHint = languageA. For example:

अधिक जानकारी <a href="http://www.webmd.com/lung-cancer/default.htm" translateHint>फेफड़ों का कैंसर</a>
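A sketch of how a client might resolve the bare-attribute special case above. The function name `resolveHint` is hypothetical. It assumes the value is read the way `getAttribute` reports it in the DOM (null when the attribute is absent, an empty string when it is present without a value) and that the language of the link's context (languageA) is already known.

```javascript
// Hypothetical helper: turns the raw attribute value into an effective hint.
//   attribute absent            -> null (no hint at all)
//   attribute present but empty -> fall back to the link's own context language
//   attribute with a value      -> use that value
function resolveHint(attrValue, contextLang) {
  if (attrValue === null || attrValue === undefined) return null;
  const value = attrValue.trim();
  return value === "" ? contextLang : value;
}

console.log(resolveHint("", "hi"));   // bare translateHint on a Hindi page
console.log(resolveHint("pa", "hi")); // explicit hint wins over the context
console.log(resolveHint(null, "hi")); // no attribute, no hint
```

Using the language of the link's nearest context rather than of the whole document would also cover multilingual pages, where different parts of pageA are in different languages.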

Thanks!

aphillips commented 7 years ago

@highdn Thanks.

The Accept-Language approach doesn't assume that the page can render itself, since most pages don't. What it would do is allow a high-quality (non-machine translated) version to be used for sites that do support language negotiation.

So what is the use case for webmasterA to introduce translateHint=languageNotA? How common is it that I (as a content author) will want to hint that the browser translate the results into a language other than that of the local page context?

highdn commented 7 years ago

@aphillips, I see your point about Accept-Language. Maybe the browser should include the language from translateHint in the list of accepted languages when fetching pageB.

The tricky part here is what should take precedence: translateHint vs. the browser's configured language preference. The browser's language preference is always there and is supposed to reflect the user's preference. OTOH, we know that it is often misconfigured and doesn't reflect the user's real preferences.

If it were possible to distinguish between a language the user actually configured in the browser and one that is just the default (OS) language, where the user doesn't really know about this setting, then we could use this precedence:

truly user configured language > language hint from a webmaster > default browser setting
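The precedence above can be sketched as a small function. The name `pickTargetLanguage` and the shape of its argument are hypothetical, and the sketch assumes the browser can actually tell an explicitly configured language apart from an inherited default, which is exactly the open question here.

```javascript
// Applies the precedence:
//   truly user configured language > language hint from a webmaster
//   > default browser setting
// Each field may be a language code or a falsy value when unknown.
function pickTargetLanguage({ userConfigured, hint, browserDefault } = {}) {
  if (userConfigured) return { lang: userConfigured, source: "user" };
  if (hint) return { lang: hint, source: "hint" };
  if (browserDefault) return { lang: browserDefault, source: "default" };
  return null; // nothing known: do not offer a translation target
}

console.log(pickTargetLanguage({ hint: "hi", browserDefault: "en" }));
console.log(pickTargetLanguage({ userConfigured: "pa", hint: "hi", browserDefault: "en" }));
```

Returning the source alongside the language would let the UI tell the user why a particular target was suggested.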

Anyhow, the majority of pages don't really support the Accept-Language feature, and the end result will effectively be the same regardless of the Accept-Language header.


As for the translateHint != languageA case, it is possible when pageA is multilingual itself.

In India it is relatively common to have pages with content in multiple official languages, say in Hindi and Punjabi. In this case, for links from the Hindi part you may want translateHint="hi" and for links from the Punjabi part you may want translateHint="pa".

Also, it's pretty common for Hindi speakers to set the UI/interface language to English while the actual user content and user input are in Hindi. In this case, pageA will effectively be a mix of English and Hindi, and languageA can end up being English.

Thanks!

duerst commented 7 years ago

@highdn writes:

If you allow me to simplify your point, it is "the browser and the user should do the right thing". Unfortunately, it is harder than you may think, and any help there is much needed. This proposal adds a webmaster to the picture, i.e. it is the joint effort of the user, the webmaster and the browser to do the right thing.

Thinking from the point of view of a webmaster, I'd be pretty strongly annoyed to have to try to tell the user and the browser about the preferred translation language for a link. Even for multilingual pages, it shouldn't be too difficult for a browser to look at the context of a link, if it does not already know anyway that the reader prefers e.g. Hindi over Punjabi or so.

I tried to come up with actual examples of where translateHint could really be useful, to the extent that I as a webmaster/content creator would expend the effort to specify it. Here are the ones I came up with:

1) Our site received and confirmed reports that for the in-browser translation in browser X, some of the terminology on a site we link to is mistranslated into some offensive wording when translated into language C. We therefore better add translateHint to try to translate the linked page into some other language that we pray the user will be able to read.

2) Our site offers medical content, with many links to other medical content. We have a segment of our readership (let's say IT people) that prefers to read most of their content (e.g. about IT) in English, but prefer to read medical content in e.g. Hindi. We would like to offer these users a site-wide option "Prefer Hindi for translations of linked content", which we would then express with translateHint.

3) Most of our users share their browser with others (Internet café/rented mobile phones scenario). Because our target audience is linguistically mixed, the browser will never get a good idea of user preferred target languages. But we can help the browser with translateHint.

Example 1 is contrived and probably not worth it. Example 2 looks somewhat promising, but for a browser that can translate both IT and medical content, being able to distinguish user preferences for these two fields should be a piece of cake (remember that machine translation is an extremely hard problem). Also, in 3, the browser should be able to guess from context.

r12a commented 7 years ago

Ok, so here's what i'm thinking so far:

The function of the translationHint attribute, as i see it, is essentially to keep the reader viewing the page they link to in the same language as the content they are currently reading.

I’m worried about the content author deciding what language the page should be translated into. The choice depends on what the reader prefers, not what the content author decides, and it’s very possible that their preference is not the same as the language of the page they are currently reading. For example, they may land on a page in a language they don’t read very well, and be kept there because the content author assumed that they want to continue in the same language, whereas they would be better off if the choice of language suited the language(s) they can actually read.

Alternatively, the content author may not be able to guess the subtleties of a user’s preferences, nor will they necessarily know what pretranslated alternatives are available for the target page from day to day. For example, an older person from Hungary may speak better German or Russian than English, so if they are linking from a Hungarian page to another page which is only available in English, they may prefer to read it in German or Russian if such a content-negotiated translation is available, rather than getting a gisted page in Hungarian.

So really it’s a question of what the user’s preference is, linked with an awareness of what pretranslated resources are already available on the server. We need to factor in to this the fact that while native English speakers are highly likely to be monolingual, actually other people around the world are highly likely to be multilingual. And multilingual is very often more than bilingual.

Multilinguality may offer choices between local languages/dialects and standard languages, or between preferences for one language in a given context and another in other contexts. For example, people who prefer to read W3C specs in English may be motivated by a preference to read the original text because it is more authoritative or correct, or simply because they are used to working in a given language for discussions about a particular topic.

Also, i’m dubious that authors will be enthusiastic about having to specify the language explicitly for every link. If it does make sense to keep the reader in the same language when they link to the next page (see my comments above), then it would probably be better for the browser to automatically detect the language of the current content and send something with the request, or at least allow the content author to set a flag as high as possible in the document tree or use ITS to set the behaviour for a set of links.

The other problem, it seems to me, is that this only works seamlessly if a reader of page A links to the target document, page B, and that document also has translationHint attributes that keep the reader in the same language as they link onwards to page C – but wait! The content author who created page B did so in a different language, so they will have added a translationHint for that language, which is not what’s wanted if you want to keep the reader in the language of page A. Clang!

This doesn’t seem to work, because the language choice was not based on metadata about what the reader prefers, but rather tries to maintain continuity of one link to the next page based on the content.

There are other things that don’t bridge well in this scenario. Currently you can tell a browser with a translation service to never translate to a particular language, and that directive spans all the links you click on. The content author can’t foresee or handle that situation. On the other hand, in-browser translation service preferences do, because they ask the user what to do.

It seems to me that much of what is needed is already taken care of if people are able to set their language preferences in the browser, and that making it as easy to see and change those preferences as it is to set preferences for auto-translation in Chrome, for example, would be a better way to approach the problem.

It would also address the question of how you are to pass information around, and allow for users to specify individual fallback preferences.

Effectively, then, a browser that takes you to a new page could check for existing translations (per normal content negotiation), and if that fails, check the language of the page itself and translate it to the user’s preferred language.
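The flow just described can be sketched as a decision function. The name `decideAction` and its argument shape are hypothetical, and the sketch assumes exact language-code matches and that the browser already knows which languages the server offers and what language the page is in.

```javascript
// preferred:          the user's languages, most preferred first
// availableLanguages: languages the server offers via content negotiation
// pageLanguage:       detected language of the page as served
function decideAction({ availableLanguages, pageLanguage, preferred }) {
  // 1. Normal content negotiation: first preferred language the server offers.
  const negotiated = preferred.find((lang) => availableLanguages.includes(lang));
  if (negotiated) return { action: "negotiate", lang: negotiated };

  // 2. The page is already in a language the user reads: leave it alone.
  if (preferred.includes(pageLanguage)) return { action: "none" };

  // 3. Otherwise, translate into the user's top preference, if there is one.
  if (preferred.length > 0) {
    return { action: "translate", from: pageLanguage, to: preferred[0] };
  }
  return { action: "none" };
}

console.log(decideAction({
  availableLanguages: ["en", "de"],
  pageLanguage: "en",
  preferred: ["hu", "de"],
}));
```

In the Hungarian example from earlier in this comment, a reader who lists German after Hungarian would get the pretranslated German page rather than a machine-translated Hungarian one.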

There are some limitations to using user preferences, of course. For example, they don’t help if you want to read all W3C docs in English, but read Facebook in Marathi. I don’t know of any way around that other than providing ‘sticky’ language selections per page or site (like we do for the i18n articles). This approach also doesn’t allow for a user to say “please don’t go and translate this page, even if it’s not in my list of preferred languages”.

To be honest, the approach that Chrome, for example, uses, where it learns your preferences from you, detects when a page is not in a language you may know when you land on it, and then offers you options at the point of reading, seems to actually work pretty well for the use cases described.

A possible alternative to all this suggests itself, but this is a very different kettle of fish. What if, while the user is reading the current page, the browser checked the language of each target page linked to from the current page? Then when the user right-clicks on a link, or even hovers over a link (these behaviours could be determined by the content author), and where the page at the end of the link isn’t in the same language or is available in alternative languages, the browser could pop up a list asking the user whether they want to (a) translate it (b) select a different language version of the page (where available), or (c) just carry on.
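
To make the idea a little more concrete, here is a hedged sketch of how such a list could be assembled. The function `linkMenu`, the `target` object shape and the choice names are invented for illustration; offering translation into the language of the current page is also an assumption, since the target language is not specified above.

```javascript
// Hypothetical sketch: build the list of choices shown for a link.
// currentLang: language of the page the user is reading.
// target:      { lang, alternates } as pre-checked by the browser, where
//              `alternates` lists other language versions of the target.
function linkMenu(currentLang, target) {
  const base = (tag) => tag.toLowerCase().split('-')[0];

  // Other language versions of the target, excluding the default one.
  const alternates = (target.alternates || []).filter(
    (lang) => base(lang) !== base(target.lang)
  );
  const differs = base(target.lang) !== base(currentLang);

  // Same language and no alternatives: nothing to ask the user.
  if (!differs && alternates.length === 0) return [];

  const menu = [];
  if (differs) menu.push({ choice: 'translate', to: currentLang }); // (a)
  for (const lang of alternates) {
    menu.push({ choice: 'open-version', lang }); // (b)
  }
  menu.push({ choice: 'continue' }); // (c)
  return menu;
}
```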

fsasaki commented 7 years ago

2017-08-30 10:16 GMT+02:00 Martin Dürst:

@highdn writes:

If you allow me to simplify your point, it is "the browser and the user should do the right thing". Unfortunately it is harder than you may think and any help there is much needed. This proposal adds a webmaster to the picture. E.g. it is the joint effort of the user, the webmaster and the browser to do the right thing.

Thinking from the point of view of a webmaster, I'd be pretty strongly annoyed to have to try to tell the user and the browser about the preferred translation language for a link. Even for multilingual pages, it shouldn't be too difficult for a browser to look at the context of a link, if it does not already know anyway that the reader prefers e.g. Hindi over Punjabi or so.

I tried to come up with actual examples of where translateHint could really be useful, to the extent that I as a webmaster/content creator would expend the effort to specify it. Here are the ones I came up with:

1. Our site received and confirmed reports that for the in-browser translation in browser X, some of the terminology on a site we link to is mistranslated into some offensive wording when translated into language C. We therefore better add translateHint to try to translate the linked page into some other language that we pray the user will be able to read.

I very much like the terminology example, since it has the potential to relate cross lingual access and terminological resources. Imagine a company providing their multilingual glossary to their webmaster. The webmaster then can have the right usage of that glossary triggered by translateHint, e.g. to foster cross lingual product search.

This may sound abstract - to make this more concrete, here is a presentation I did with a co-presenter last year http://conferences.tekom.de/fileadmin/tx_doccon/slides/1486_Summit_Meeting_Search_Meets_Terminology.pdf and a demo how to generate such multilingual glossaries for web site embedding http://sasakiatcf.com/felix/tekom2016/

2. Our site offers medical content, with many links to other medical content. We have a segment of our readership (let's say IT people) that prefers to read most of their content (e.g. about IT) in English, but prefers to read medical content in e.g. Hindi. We would like to offer these users a site-wide option "Prefer Hindi for translations of linked content", which we would then express with translateHint.

3. Most of our users share their browser with others (Internet café/


asmusf commented 7 years ago

Martin and especially @r12a have articulated very nicely why I have been a skeptic of this proposal from the start.

aphillips commented 7 years ago

The I18N WG discussed this in our teleconference today and I drew the action to follow up. I regret that we kept horrible minutes for this part of the call.

Our comments are:

  1. We think that effort should be expended on fixing the problem of setting/managing a user's language preferences. Lack of support for this is consistently cited as a problem for language-related features on the Web, including this feature.

  2. It might be useful to specify or track when a language change occurs between the source page and the linked target ("Referrer-Language"??). This might be useful in language negotiation or automatic translation to have a sense of the user's context coming to a page as separate from the (unreliable, see above) Accept-Language.

  3. We do not understand the benefit of this feature to page authors. The use cases for this need to be better enunciated before we could support it. WG members have suggested multiple cases in which the attribute would not generate the correct results. Our current shared understanding is that browsers that offer built-in translation have the information they need to offer users a translation choice or translation preferences when browsing pages normally. Since page authors can't know if the user has the capability, whether it is turned on, or what the customer's preferences are, they can't use translateHint to create links such as "see this page in Hindi".

@domenic Can you better spell out the use case for page authors?

domenic commented 7 years ago

@aphillips yeah, @highdn and I plan to work on a more full-fledged explainer repository over the next week or so. Thanks so much to you and the i18n WG for your attention to this proposal.

r12a commented 7 years ago

It might be useful to specify or track when a language change occurs between the source page and the linked target ("Referrer-Language"??). This might be useful in language negotiation or automatic translation to have a sense of the user's context coming to a page as separate from the (unreliable, see above) Accept-Language

Just to expand on that slightly: this would entail passing forward something that indicates the language of the content surrounding the link that was followed. (Surrounding to avoid problems when the link text language is different from that of the context.) The browser could then detect the language of the linked-to page as it loads it, and if there's a mismatch offer the user options for proceeding. Those options could be invoked after any content negotiation takes place, which will reduce the work of presenting the reader with the right language where the appropriate language is available on the server. I imagine the browser could then offer to translate the page into one of a list of languages including the language of the previous page, and the languages specified in the Accept-Language header (ie. the browser preferences).
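
A minimal sketch of that last step, assuming the language of the referring content is somehow passed forward: the function names `parseAcceptLanguage` and `translationOffers` are hypothetical, and "Referrer-Language" is only the tentative name floated above, not an existing header.

```javascript
// Parse an Accept-Language header value into language tags ordered by
// descending q-value (q defaults to 1 when absent).
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1;
      return { tag: tag.trim(), q: Number.isNaN(q) ? 0 : q };
    })
    // Drop empty entries, the '*' wildcard and anything with q=0.
    .filter((entry) => entry.tag && entry.tag !== '*' && entry.q > 0)
    .sort((a, b) => b.q - a.q)
    .map((entry) => entry.tag);
}

// Hypothetical: languages to offer when the loaded page's language differs
// from that of the content surrounding the link that was followed.
function translationOffers(referrerLang, pageLang, acceptLanguage) {
  const base = (tag) => tag.toLowerCase().split('-')[0];

  // No mismatch, so there is nothing to offer.
  if (base(referrerLang) === base(pageLang)) return [];

  // Previous page's language first, then the browser preferences,
  // skipping duplicates and the language the page is already in.
  const seen = new Set([base(pageLang)]);
  const offers = [];
  for (const tag of [referrerLang, ...parseAcceptLanguage(acceptLanguage)]) {
    if (!seen.has(base(tag))) {
      seen.add(base(tag));
      offers.push(tag);
    }
  }
  return offers;
}
```

These options would only be computed after content negotiation has already taken place, as described above.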

dtapuska commented 6 years ago

I've written the more full-fledged explainer and put up a pull request for the proposal. https://github.com/dtapuska/html-translate

r12a commented 6 years ago

I don't think the explainer linked to above addresses the concerns raised in https://github.com/whatwg/html/issues/2945#issuecomment-326066250 or https://github.com/whatwg/html/issues/2945#issuecomment-326400336.

For example, the argument against use of the lang attribute ends with the sentence "This doesn’t necessarily mean that the user actually wants this URI translated." That's exactly the point we were trying to make about the hreftranslate proposal. It may well be that a Hindi reader is competent in English and actually prefers to read the original, rather than an (almost certainly) subpar translation into Hindi. It's not the content author's call as to which language gets presented to the user – that's the user's decision. (And such a decision can be facilitated, btw, by the UA if it spots that the hreflang of the destination is in a different language than either the current page or the user's browser settings, if the user is able to express their preferences.) There are other related problem scenarios described above which i'd like to see addressed in the explainer.

Also wrt the text of the PR:

  1. " user agents that support client side translation should display the page in the desired language" Could you point me to some UAs that apply client-side translation? I can't say i have knowingly come across one. (Or do you mean, that a user agent would package up the web page in a way that results in better translation when sent to an online service?) Given that the content author has provided these links, though, what's the likelihood that the reader who clicks on them has a client-side translation tool for the specific language pair the content author provided?

  2. The example used in the PR, where the content author has used explicit links to 'German version' and 'French version', is a scenario where this may make a little more sense, but only where there is no language negotiation available on the server, and it's a different scenario from the general one described elsewhere. (Although i think it's misleading to say "German version", when what is meant is "German gist translation".)

I'm sorry, but although i'm trying, i don't really see how this is supposed to be useful, or how it's supposed to work. I would very much like to see the previous concerns addressed point for point. The explainer is very short on details. And i can't get away from the idea that apart from very special cases (maybe this is missing from the explanation?) this autotranslation should be based on user preferences rather than mandated by the content developer.

dtapuska commented 6 years ago

I’m worried about the content author deciding what language the page should be translated into. The choice depends on what the reader prefers, not what the content author decides, and it’s very possible that their preference is not the same as the language of the page they are currently reading.

I believe you are missing an important point: the site serving up the page could just as well choose to point the link at a translation service instead. The expected usage of this is something like:

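// Build a link whose target should be shown to the user in French.
var anchor = document.createElement('a');
if (HTMLAnchorElement.prototype.hasOwnProperty('hrefTranslate')) {
  // The browser supports the proposed attribute: link to the original page
  // and hint that it should be translated into French.
  anchor.href = 'https://r12a.github.io/pickers/';
  anchor.hrefTranslate = 'fr';
} else {
  // Fallback: link to a server-side translation of the page instead.
  anchor.href = 'https://www.microsofttranslator.com/bv.aspx?from=en&to=fr&a=https%3A%2F%2Fr12a.github.io%2Fpickers%2F';
}
// textContent rather than innerHTML, since the label is plain text.
anchor.textContent = 'Cueilleurs de caractère Unicode';
document.body.appendChild(anchor);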

The page you are viewing influences what you will see in the next clicks because it presents them to you. This is just an alternate presentation surface. It should be able to choose however it wants to present those to you.
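
For pages that generate many such links, the same decision can be factored into a small DOM-free helper. This is only a sketch: `buildTranslatedLink` and its return shape are hypothetical, and the fallback simply reuses the translation-service URL pattern from the snippet above, building it with `encodeURIComponent` instead of hand-encoding the target.

```javascript
// Hypothetical helper: decide which attributes a translated link should get.
// supportsHrefTranslate would come from a feature test such as
// HTMLAnchorElement.prototype.hasOwnProperty('hrefTranslate').
function buildTranslatedLink(url, from, to, supportsHrefTranslate) {
  if (supportsHrefTranslate) {
    // Link to the original page and let the browser translate it.
    return { href: url, hrefTranslate: to };
  }
  // Fallback: route the link through the online translation service,
  // percent-encoding the target URL so it survives as a query parameter.
  const proxy =
    'https://www.microsofttranslator.com/bv.aspx' +
    '?from=' + encodeURIComponent(from) +
    '&to=' + encodeURIComponent(to) +
    '&a=' + encodeURIComponent(url);
  return { href: proxy };
}
```

The returned properties can then be copied onto an anchor element, for example with `Object.assign(anchor, buildTranslatedLink(...))`.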

Then when the user right-clicks on a link, or even hovers over a link (these behaviours could be determined by the content author)

Neither of these actions is available where the predominant number of users are using the internet (on mobile devices in emerging markets). Yes, there are alternate UX you could consider, but the fact is that users don't typically use additional context menu items.

To improve fidelity, the author chooses the browser's translation service when it is available, rather than providing a hard reference to an online translation service. Yes, the linked page will still have an opportunity to serve up a page that matches the language of the user. This is far better than hard-referencing a translation service, because that service may not set the Accept-Language header correctly, which would deny the end page a chance to provide the correct language.

With respect to Addison's comments

  1. As stated in the explainer this is a problem irrespective of the language being set correctly.
  2. The referrer's content might be in mixed languages.
  3. I thought I tried to address this with the explainer articulating exactly the use case we are attempting to solve.