pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0
3.54k stars, 952 forks

Various pages are localized into Chinese if you navigate via the release history. #12445

Closed glyph closed 1 year ago

glyph commented 1 year ago

Describe the bug

(Three screenshots attached.)

Expected behavior

I expect these pages to be in English.

To Reproduce

  1. Navigate to a project, e.g. https://pypi.org/project/requests/
  2. Click "release history"
  3. Click on a few different releases until the page turns Chinese. Not all versions seem to do this. I have had some success with https://pypi.org/project/requests/2.28.0/ and https://pypi.org/project/Twisted/22.10.0rc1/, but I cannot get it to happen with any version of https://pypi.org/project/textual/.
  4. Now the page will stay in Chinese until you navigate from somewhere else, e.g. go to the search page, search for Twisted or Requests, and click through from there.

My Platform

On macOS, tested on recent versions of Chrome and Safari, tested both logged in and logged out

Additional context

uranusjr commented 1 year ago

Also, it can switch between any language except English. The bug is probably in the localisation code that redirects you to the “first” localised language (Simplified Chinese) when it should show the unlocalised content (English) instead.

ewdurbin commented 1 year ago

This is almost certainly a regression due to #6864.

ewdurbin commented 1 year ago

A question for those able to reproduce: if you select a language explicitly in the switcher, does the correct translation come up?

wouterel commented 1 year ago

I found that this can happen even when not going through the release history. Just going straight to this recently updated package will give an automatic switch to Chinese. The ">" indicator in the language selector also points at Chinese, even after clicking "English" there explicitly. Tried on several devices and browsers.

jenca-adam commented 1 year ago

Does this happen only on recently updated packages?

ewdurbin commented 1 year ago

As far as I've been able to find, the issue is only happening when the en identifier is selected either by default or via cookie, and the translation being served is zh_Hans. This may indicate that something in our handling of Accept-Language headers is going haywire and setting the caching header PyPI-Locale to en when an Accept-Language header is negotiated to zh-Hans.

wouterel commented 1 year ago

By the way, the link I posted to the plain package page without a version number has the language problem, but going to the latest version by explicitly specifying it in the URL works just fine: https://pypi.org/project/perconet/0.2.0/

ewdurbin commented 1 year ago

Current theory is that the way we handle Accept-Language in VCL:

https://github.com/pypi/infra/blob/main/terraform/warehouse/vcl/main.vcl#L151

differs from how our backends negotiate Accept-Language:

https://github.com/pypi/warehouse/blob/090575fd719fbce7e048656de27a9fb6d3bf3c34/warehouse/i18n/__init__.py#L92-L97

It seems to me that there may be a case where a common Accept-Language value is determined by our VCL to be en, but our backends decide it should be zh-Hans. Thus the response is cached with PyPI-Locale: en even though its content is in another language.

I need to investigate the differences in how accept.language_filter_basic (VCL), accept.language_lookup (VCL), and request.accept_language.best_match handle a given string.
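
To make the suspected mismatch concrete, here is a minimal Python sketch of two negotiation strategies disagreeing on the same header. Everything in it is an assumption made for illustration: the function names, the list of available locales, and the sample header are hypothetical, and neither function reproduces the actual behaviour of accept.language_lookup (VCL) or request.accept_language.best_match. The first truncates each requested tag from the right until it equals an available tag (a simplified RFC 4647 "lookup"), while the second also accepts any available tag that shares the primary subtag.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs sorted by descending q; ties keep header order."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        ranges.append((tag.strip().lower(), q))
    # sorted() is stable, so equal q-values stay in header order.
    return sorted(ranges, key=lambda r: -r[1])


def lookup_negotiate(header, available, default):
    """Simplified lookup: drop trailing subtags until an exact match is found."""
    by_lower = {t.lower(): t for t in available}
    for tag, q in parse_accept_language(header):
        if q <= 0 or tag == "*":
            continue
        subtags = tag.split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
    return default


def prefix_negotiate(header, available, default):
    """Looser matching: an available tag with the same primary subtag also wins."""
    for tag, q in parse_accept_language(header):
        if q <= 0 or tag == "*":
            continue
        primary = tag.split("-")[0]
        for offer in available:
            offer_lower = offer.lower()
            if offer_lower == tag or offer_lower.split("-")[0] == primary:
                return offer
    return default


# Hypothetical inputs, chosen only to show a disagreement.
AVAILABLE = ["en", "es", "zh-Hans", "zh-Hant"]
HEADER = "zh-CN,zh;q=0.9,en;q=0.8"

print(lookup_negotiate(HEADER, AVAILABLE, "en"))  # en
print(prefix_negotiate(HEADER, AVAILABLE, "en"))  # zh-Hans
```

If the edge behaved like the first function and the backend like the second, a response rendered in zh-Hans would be stored under the cache key for en, which matches the symptom reported above. Whether the real implementations diverge in this particular way is not established here.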

One seemingly decent option is just to normalize them in VCL by setting the Accept-Language on the request to whatever accept.language_lookup (VCL) chooses.
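
The normalization idea can be sketched as a small simulation in Python. This is an illustration under stated assumptions, not the change that was deployed: edge_pick and backend_pick are hypothetical stand-ins for the VCL and backend negotiators, the locale list and header are made up, and handle only models the order of operations (the edge chooses a locale, optionally rewrites Accept-Language to that choice, then the backend negotiates).

```python
def edge_pick(header, available, default="en"):
    """Hypothetical edge negotiator: exact tag match only, in header order."""
    offered = {t.lower(): t for t in available}
    for part in header.split(","):
        tag = part.split(";")[0].strip().lower()
        if tag in offered:
            return offered[tag]
    return default


def backend_pick(header, available, default="en"):
    """Hypothetical backend negotiator: also matches on the primary subtag."""
    for part in header.split(","):
        tag = part.split(";")[0].strip().lower()
        primary = tag.split("-")[0]
        for offer in available:
            offer_lower = offer.lower()
            if offer_lower == tag or offer_lower.split("-")[0] == primary:
                return offer
    return default


def handle(headers, available, normalize):
    """Simulate one request; return (cache_locale, rendered_locale)."""
    headers = dict(headers)  # do not mutate the caller's dict
    cache_locale = edge_pick(headers.get("Accept-Language", ""), available)
    if normalize:
        # Overwrite the header so the backend can only agree with the edge.
        headers["Accept-Language"] = cache_locale
    rendered = backend_pick(headers.get("Accept-Language", ""), available)
    return cache_locale, rendered


# Hypothetical inputs for illustration.
AVAILABLE = ["en", "es", "zh-Hans", "zh-Hant"]
REQUEST = {"Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8"}

print(handle(REQUEST, AVAILABLE, normalize=False))  # ('en', 'zh-Hans')
print(handle(REQUEST, AVAILABLE, normalize=True))   # ('en', 'en')
```

The point of the design is that, once the header is rewritten, the two negotiators no longer need to implement identical matching rules: the backend only ever sees a single tag that the edge has already chosen from the available set.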

ewdurbin commented 1 year ago

OK, fix in https://github.com/pypi/infra/pull/110 is live and a purge has been completed for all-html.

I think this should resolve the issue, please comment if more instances are detected.

glyph commented 1 year ago

Thanks for the rapid response. I can confirm that I'm seeing PyPI's UI entirely in English now, as expected.