mozilla / addons

☂ Umbrella repository for Mozilla Addons ✨
Other
125 stars 41 forks source link

include localized urls in the sitemap #8287

Closed eviljeff closed 3 years ago

eviljeff commented 3 years ago

followup for mozilla/addons#8201

We currently generate all pages for the sitemap as /en-US/ but we should have localized urls too.

There is some level of support in this already in the django sitemaps app but in django2.2 it's a bit too naive - it doesn't support the alternatives syntax to group the urls together, and mis-paginates them meaning we'd end up with more than 50,000 urls per file (the google search limit) unless we manually calculate how much lower it should be (50000/<however many languages we support>). We need the django3.2 implementation. If we really need this before we're ready to upgrade we can copy over the code from django3.2 to support django2.2 (it's not that much)

Other comments:

eviljeff commented 3 years ago

ref for canonical/alternate url discussion https://github.com/mozilla/addons/issues/12540

AlexandraMoga commented 3 years ago

@eviljeff I've noticed two things since this issue was closed:

eviljeff commented 3 years ago
* the sitemap https://addons-dev.allizom.org/sitemap.xml?debug has become super slow - it took for me ~ 2 minutes to load

I've logged mozilla/addons#8341 to address this - the way django iterates through the locales is inefficient

* sitemap sections are no longer organized as an `.xml`? For example, when I'm loading https://addons-dev.allizom.org/sitemap.xml?section=amo&debug, I'm seeing only a sequence of URLs.

it's still valid sitemap xml (at least from the validators I could find on the web), but Firefox doesn't seem to like the extra xhtml tags (or gets confused) and tries to render it as html.

AlexandraMoga commented 3 years ago

it's still valid sitemap xml (at least from the validators I could find on the web), but Firefox doesn't seem to like the extra xhtml tags (or gets confused) and tries to render it as html.

so does Chrome too, which makes sections even harder to analyze now.

eviljeff commented 3 years ago

yep. But that's how it's supposed to be implemented, apparently https://developers.google.com/search/docs/advanced/crawling/localized-versions#expandable-3

AlexandraMoga commented 3 years ago

So, I was able to see the 10 localizations served by the sitemap for each page in this alphabetical order: de, en-GB, en-US, es, fr, ja, pl, pt-BR, ru, zh-CN.

@eviljeff has the following comment ended up being implemented and, if so, how can this be verified?

we should prioritize/default to the addon's default locale, rather than en-US

eviljeff commented 3 years ago

@eviljeff has the following comment ended up being implemented and, if so, how can this be verified?

we should prioritize/default to the addon's default locale, rather than en-US

No, there wasn't a way to define it per add-on in the django implementation so I left off the default url entirely. It can be implemented in a follow-up if it's deemed worth the extra hassle.