ilv closed 9 years ago
thanks ilv, your analysis is good; it seems to me that it has only one failure case, which is the following:
suppose the user locale is "es", but right now TBB offers only es-ES; this way we will end up providing en-US.
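To make that failure case concrete, here is a minimal sketch. It assumes a matcher that only accepts exact tag matches; the function name `exact_match_only` and the shortened locale list are hypothetical and are not code from this thread.

```python
# Hypothetical exact-match-only lookup, used only to illustrate the failure case.
SUPPORTED = ["de", "en-US", "es-ES", "fr"]  # shortened list, assumption for the example


def exact_match_only(requested, supported=SUPPORTED, default="en-US"):
    """Return the supported tag equal to `requested` (case-insensitive), else the default."""
    for tag in supported:
        if tag.lower() == requested.lower():
            return tag
    return default


print(exact_match_only("es-ES"))  # es-ES
print(exact_match_only("es"))     # en-US, although es-ES would be the better answer
```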
i've written the following algorithm that works like a charm on the current Tor locales, which are:
ar
de
en-US
es-ES
fa
fr
it
ko
nl
pl
pt-PT
ru
tr
vi
zh-CN
analyze it with the following inputs and tell me what you think:
ar -> will end up providing ar
es -> will end up providing es-ES
es-PT -> will end up providing es-ES
es-ES -> will end up providing es-ES
xx -> will end up providing en-US
algorithm:
def getBestLangMatch(accept_language, supported_lcs):
    def parse_accept_language(accept_language):
        return [l.split(';')[0] for l in accept_language.replace(" ", "").split(',')]

    def language_only(lc):
        if '-' in lc:
            lc = lc.split('-')[0]
        return lc

    for lc in parse_accept_language(accept_language):
        # returns es-PT if es-PT is available (perfect match)
        for l in supported_lcs:
            if lc.lower() == l.lower():
                return l

        lc = language_only(lc)

        # returns es if asking for es-PT with
        # es-PT not available but es available
        for l in supported_lcs:
            if lc.lower() == l.lower():
                return l

        # returns es-ES if asking for es-PT with
        # es-PT and es not available but es-ES available
        for l in supported_lcs:
            if lc.lower() == language_only(l).lower():
                return l

    return 'en-US'  # last resort
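The listed inputs can be checked with a small standalone script. The sketch below restates the same three passes compactly so that it runs on its own; the names `best_lang_match`, `SUPPORTED` and `cases` are illustrative, and it is not the code from the gist.

```python
SUPPORTED = ["ar", "de", "en-US", "es-ES", "fa", "fr", "it", "ko",
             "nl", "pl", "pt-PT", "ru", "tr", "vi", "zh-CN"]


def language_only(tag):
    """Strip the region subtag: 'es-ES' -> 'es'."""
    return tag.split("-")[0]


def best_lang_match(accept_language, supported=SUPPORTED, default="en-US"):
    """Three passes per requested tag: exact tag, bare language, same language with any region."""
    # Drop whitespace and any ';q=...' parameters; the header order is kept as is.
    requested = [part.split(";")[0] for part in accept_language.replace(" ", "").split(",")]
    for tag in requested:
        wanted = tag.lower()
        base = language_only(wanted)
        passes = (
            lambda s: s.lower() == wanted,               # exact tag
            lambda s: s.lower() == base,                 # bare language
            lambda s: language_only(s).lower() == base,  # same language, any region
        )
        for matches in passes:
            for s in supported:
                if matches(s):
                    return s
    return default  # last resort


cases = {"ar": "ar", "es": "es-ES", "es-PT": "es-ES", "es-ES": "es-ES", "xx": "en-US"}
for header, expected in cases.items():
    assert best_lang_match(header) == expected, (header, best_lang_match(header))
print("all cases pass")
```

Note that, like the algorithm above, this sketch ignores q-values and simply trusts the order in which the tags appear in the header.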
what do you think?
@ilv i've written a demo of the algorithm above to test it against the cases i'm expecting: https://gist.github.com/evilaliv3/5a9cd11eaa0cf60da425
any comment?
Great @evilaliv3, it seems that you have covered all the cases :) The algorithm works quite well, I've tested it with some extra inputs and all is good. Given the few locales supported by Tor Browser I think this will work perfectly fine.
Regarding #168, I think we can follow this and this, especially this idea:
"The basic rule here is that if your language preference list contains a language tag containing a hyphen, such as fr-CH (French as spoken in Switzerland), you should consider adding an additional language tag without the hyphen, ie. fr (French) in this case, immediately after."
So, a basic algorithm for this could be as follows:
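One possible reading of the quoted rule, as a minimal sketch: walk the preference list and add the bare language right after every regional tag. The function name `expand_preferences` is hypothetical, and this is not necessarily the algorithm originally posted here.

```python
def expand_preferences(tags):
    """Insert the bare language right after each regional tag, skipping duplicates."""
    expanded = []
    for tag in tags:
        if tag not in expanded:
            expanded.append(tag)
        if "-" in tag:
            base = tag.split("-")[0]
            if base not in expanded:  # comparison is case-sensitive in this sketch
                expanded.append(base)
    return expanded


print(expand_preferences(["fr-CH", "en"]))           # ['fr-CH', 'fr', 'en']
print(expand_preferences(["pt-BR", "pt", "en-US"]))  # ['pt-BR', 'pt', 'en-US', 'en']
```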
For this purpose we should make some sort of mapping for en, es, pt, and zh to en-US, es-ES, pt-PT and zh-CN respectively. For instance:
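A minimal sketch of such a mapping, assuming the locale list given earlier in this thread; `DEFAULT_REGION`, `resolve` and `pick` are hypothetical names chosen for illustration.

```python
# Hypothetical mapping from bare language codes to the regional build offered for them.
DEFAULT_REGION = {"en": "en-US", "es": "es-ES", "pt": "pt-PT", "zh": "zh-CN"}

SUPPORTED = ["ar", "de", "en-US", "es-ES", "fa", "fr", "it", "ko",
             "nl", "pl", "pt-PT", "ru", "tr", "vi", "zh-CN"]
BY_LOWER = {s.lower(): s for s in SUPPORTED}  # case-insensitive lookup table


def resolve(tag):
    """Return a supported locale for one tag, or None if nothing fits."""
    tag = tag.lower()
    if tag in BY_LOWER:             # exact match, e.g. 'fr' or 'pt-pt'
        return BY_LOWER[tag]
    return DEFAULT_REGION.get(tag)  # bare language with a default region, e.g. 'pt'


def pick(preferences, fallback="en-US"):
    """Return the first preference that resolves, else the fallback."""
    for tag in preferences:
        found = resolve(tag)
        if found:
            return found
    return fallback


print(pick(["pt-BR", "pt"]))  # pt-PT
print(pick(["es"]))           # es-ES
print(pick(["xx"]))           # en-US
```

In this sketch the mapping is applied only to bare language tags, so a regional tag such as pt-BR resolves only if the bare pt also appears in the preference list.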
This covers the case of a browser that is not configured properly, e.g. one that has pt-BR and pt as preferred languages.
Thoughts? @fpietrosanti @evilaliv3