theCrag / website

theCrag.com: Add your voice and help guide the development of the world's largest collaborative rock climbing & bouldering platform
https://www.thecrag.com/

Support multiple languages in content i18n localisation #1748

Open brendanheywood opened 9 years ago

brendanheywood commented 9 years ago

Just a brain dump for the long term. There are two parts to multi-language support: one is the content side, the other is the application side. This issue covers the content side; I'll make another issue for the application side. Forked from #1708. Mentioning @nicHoch for feedback.

Note this is a full robust solution which may be completely impractical, but worth talking about as an ideal case before we think about what we'll actually do.

Principles:
1) Allow readers to specify a list of preferred languages in order, either explicitly in their account, or implicitly through their operating system and / or browser preferences
2) Allow authors to specify multiple versions in different languages
3) The system clearly shows what is available, and what is being shown
4) To search engines each version should be separate: https://support.google.com/webmasters/answer/182192?hl=en#1
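As a rough sketch of principle 1, the reader's ordered language list could be built from the account settings first and the browser's `Accept-Language` header second. The function names and the choice to reduce tags such as `de-CH` to the base language are assumptions for illustration, not anything theCrag implements.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language tags of an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # assumption: treat a malformed weight as "not wanted"
        if tag and tag != "*" and quality > 0:
            # Sort key: descending quality, then original position in the header.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def preferred_languages(account_languages: list[str], accept_language: str) -> list[str]:
    """Explicit account preferences come first; browser preferences fill in after."""
    ordered = []
    for tag in account_languages + parse_accept_language(accept_language):
        base = tag.lower().split("-")[0]  # assumption: ignore region subtags
        if base not in ordered:
            ordered.append(base)
    return ordered
```

For example, a reader with `es` set in their account and a Swiss German browser would get Spanish first, then German, then English.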

Some reading: http://alistapart.com/column/stars-and-stripes-and-iso-codes

Data model

Thoughts on editing

Thoughts on presentation

So we'd have:
https://www.thecrag.com/climbing/spain/alto-mijares - English
https://www.thecrag.com/escalada/spain/alto-mijares - Spanish
https://www.thecrag.com/klettern/spanien/alto-mijares - German

This also has the nice side effect of better SEO for translated versions.
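One common way to tell search engines that separate URLs are language versions of the same page is a set of `hreflang` alternate links in the page head. A minimal sketch, assuming a hypothetical helper `hreflang_links` that receives a mapping from language code to URL:

```python
from html import escape


def hreflang_links(versions: dict[str, str]) -> list[str]:
    """Build one <link rel="alternate"> tag per language version of a page."""
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in versions.items()
    ]


# Hypothetical usage with the example URLs from this thread.
links = hreflang_links({
    "en": "https://www.thecrag.com/climbing/spain/alto-mijares",
    "de": "https://www.thecrag.com/klettern/spanien/alto-mijares",
})
```

Each language version would carry the full set of links, including one pointing at itself.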

brendanheywood commented 8 years ago

So on reflection I think the idea of using the localised name for 'climbing' at the URL base is probably going to create issues, in particular in the obvious case where the word is the same in two languages, but also later for other URLs we may want which wouldn't have 'climbing' as the base.

I think we just follow the rest of the world here and use the language codes (either 2 or 3 letter) to namespace each language version, i.e.

https://www.thecrag.com/en/climbing/spain/alto-mijares - English
https://www.thecrag.com/sp/escalada/spain/alto-mijares - Spanish
https://www.thecrag.com/de/klettern/spanien/alto-mijares - German

We can still localise the URL routes, and the URL stubs which form the full URLs too, but this is quite separate. This needs to be taken into account and will affect the algorithm for #1306.
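A minimal sketch of the language-code namespacing, assuming a hypothetical set of supported codes and a default language for paths without a prefix. Note the sketch uses the ISO 639-1 code `es` for Spanish, where the example URL above writes `sp`.

```python
SUPPORTED_LANGUAGES = {"en", "es", "de"}  # hypothetical configuration
DEFAULT_LANGUAGE = "en"                   # assumption: un-prefixed URLs are English


def split_language_prefix(path: str) -> tuple[str, str]:
    """Split '/de/klettern/...' into ('de', '/klettern/...')."""
    segments = path.lstrip("/").split("/", 1)
    first = segments[0].lower()
    if first in SUPPORTED_LANGUAGES:
        rest = segments[1] if len(segments) > 1 else ""
        return first, "/" + rest
    # No recognised language prefix: keep the path untouched.
    return DEFAULT_LANGUAGE, path
```

Because the language lives in its own path segment, the remaining route (`/klettern/...` or `/climbing/...`) can be localised or not without affecting language detection.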

georg-d commented 4 years ago

Luckily, I did not find really relevant occurrences of weight/distance/price/... units or of number, date and time formats in theCrag, because it mostly uses relative time information like "7 days ago" instead of dates like 7/2/2020 - so IMHO we can reduce the i18n discussion to texts :)

Data model

"language, or language + localisation codes"

I definitely vote for distinguishing only by language, not also by localisation code: I doubt we need the precision of language + localisation code. I doubt we have enough users (= potential editors) to fill different localisation codes of the same language and keep them in sync - hey, we're nowhere near having the major fields of all objects filled in any language, so it's already quite an ambitious goal to have them filled in all major languages, e.g. the 13 languages with >100 m speakers (wikipedia entry), and everything beyond that is utopian without automated edits.

Thoughts on editing

"if a country is configured with multiple primary languages, then we should show those in the bulk editing page so it's obvious what is missing etc"

+1 to showing boxes in all primary languages next to each other in edit mode, so users see at a glance what is "missing" or "out of sync" and thus have a low hurdle to help
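The "what is missing" check for the edit page could be as simple as the following sketch. The field layout (a mapping from language code to text) and the function name are assumptions for illustration.

```python
def missing_languages(field_texts: dict[str, str], primary_languages: list[str]) -> list[str]:
    """Return the primary languages for which a field has no (non-blank) text."""
    return [
        lang for lang in primary_languages
        if not field_texts.get(lang, "").strip()
    ]


# Hypothetical "approach" field of a crag in a country with three primary languages.
approach = {"de": "Vom Parkplatz dem Wanderweg folgen.", "fr": ""}
to_translate = missing_languages(approach, ["de", "fr", "it"])
```

The edit page could then render an empty box for each language in `to_translate` next to the filled ones.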

Thoughts on presentation

"If only half the content is in lang A, and half is in B, and I can read both, should we render both mixed on one page? ... I think we avoid mixed language situations, this is a dirty data situation so need to help authors get this right"

In read mode, I'd prefer to see as many filled fields in one view as possible - even if that means a wild language mix. IMHO this is much more usable than several language-specific variants of one page, because each page variant offers only a fragment of the existing information, and I will often not recall everything by heart and thus need to switch back & forth between page variants. That would also mean considerable loading times due to slow & shaky mobile coverage at the crag, or much jumping around in one huge repetitive PDF (first everything in EN with many empty descriptions, then everything in DE with many empty descriptions, ...) or between several language-specific PDFs. IMHO, warnings in all languages should be shown, because they are often transient (bird nest! loose rock!) and thus unlikely to be translated - yes, I consider Covid-19 a bold exception. IMHO, for each field and each language, we should look at what priority that language has in the user's profile.

We should show a language drop-down next to each field so that a) users know which language is currently shown (important if the field contains only a few words or abbreviations) and b) users can easily switch to any other language they speak that has content for that field.
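The per-field selection described above might look like this sketch. The `FieldView` type, the function names, and the fallback to any available language when none of the user's languages has content are assumptions, chosen to match the "as many filled fields as possible" preference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldView:
    shown_language: Optional[str]  # language label for the drop-down
    text: str                      # text actually rendered
    alternatives: list[str]        # other user languages that have content


def choose_field_text(translations: dict[str, str], preferred: list[str]) -> FieldView:
    """Pick the best available language for one field, by user priority."""
    available = [lang for lang, text in translations.items() if text.strip()]
    if not available:
        return FieldView(None, "", [])
    # First preferred language with content; otherwise show whatever exists.
    shown = next((lang for lang in preferred if lang in available), available[0])
    alternatives = [lang for lang in preferred if lang in available and lang != shown]
    return FieldView(shown, translations[shown], alternatives)


def warnings_to_show(warnings: dict[str, str]) -> list[tuple[str, str]]:
    """Warnings are shown in every language that has one, regardless of preference."""
    return [(lang, text) for lang, text in warnings.items() if text.strip()]
```

`shown_language` and `alternatives` together give the drop-down its current value and its other options.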

Thoughts on presentation

"if a person ticks a route X, and they are using lang A, and I am user B using lang B which do we show in feeds or facets?"

Seems pretty clear to me, but maybe I am missing something. IMHO: system-generated stuff (i.e. text around fields) in lang B; texts within fields in B if content is available in B, otherwise in A (e.g. an ascent comment written only in language A is unlikely to ever be translated).
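A small sketch of that rule for a feed entry: the surrounding system text follows the viewer's language, while the user-written comment falls back to the author's language. The templates, their wording, and the function name are hypothetical.

```python
# Hypothetical UI templates keyed by viewer language.
TEMPLATES = {
    "en": "{user} ticked {route}",
    "de": "{user} hat {route} begangen",
}


def render_tick(user: str, route: str, comments: dict[str, str],
                author_language: str, viewer_language: str) -> str:
    """System text in the viewer's language, comment in the best available one."""
    template = TEMPLATES.get(viewer_language, TEMPLATES["en"])
    headline = template.format(user=user, route=route)
    # Prefer a comment in the viewer's language, else the author's original.
    comment = comments.get(viewer_language) or comments.get(author_language, "")
    return f"{headline}: {comment}" if comment else headline
```

So a German-speaking viewer would see the German headline followed by the untranslated English comment.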

georg-d commented 3 years ago

During one year of using theCrag more intensely, I gained the impression that i18n is of higher importance & benefit in regions where climbers speak many different mother tongues without one universal common language, e.g. in Europe, than it is e.g. in India or Australia, where one common language is spoken by virtually all citizens and most guests.

To make it more tangible for e.g. our Australian co-climbers: If you're living around beautiful Lake Lucerne, you'll speak German. Within roughly a 1h30min drive to the west you're in a French-speaking region, and the same distance to the south in an Italian-speaking region - but you may not know French and Italian well enough to understand crag/cliff/route descriptions (they do not exactly use the words you learned 20 years back at school and always planned to learn soon 😉), let alone dream of writing in these languages. Similarly, further east in Vienna, you'll speak German and may drive for a weekend to Czech, Polish, Slovak, Hungarian or Slovenian speaking regions, but there is no one lingua franca - you probably learned English and Italian or French, they probably learned Russian and one of the other neighbouring languages, maybe even German, but they usually write in their mother tongue.

I assume the "current state of languages" is one factor that slows down the addition of information to theCrag in my wider surroundings: In the help I did not find any clear statement about how theCrag wants to deal with languages besides crag/route names - I'd expect it in etiquette and mission & vision, with one page just linking to the full description in the other. Almost no one likes to read a text that is a wild mix of 4 languages, because you might not understand all the languages well, it's difficult to switch back & forth between languages, it might not be clear which language a word is written in and thus its meaning is ambiguous, etc. Hence, if you cannot write in the language already used there, you're likely hesitant to edit/add - also because the information you wanted to edit/add might already exist, but you simply don't get it due to lacking language skills. Moreover, if you're e.g. a French speaker and see theCrag in the English UI, your brain is in a French+English context, so you're less likely to "of course" add German text than if you saw that the FR textbox is filled but the DE textbox is empty. Data contribution is also not encouraged by the fact that other people write to you telling you not to add content unless it is in the same language as the existing text, or asking you not to write in any language other than that of the area, or the opposite, that you should not use the language of the area but some other language better known by visitors (which varies greatly depending on where visitors come from). Nope, none of this is made up, all of it is based on personal experience from only one year. In the meantime I have developed the opinion that contributions in any language should be welcomed, and if someone is missing a language, he/she should simply translate existing text to that language or start text in that language - and each language will have its own paragraph with a language prefix, like in the warning at Vieux Gueberschwihr > Carrière