google / site-kit-wp

Site Kit is a one-stop solution for WordPress users to use everything Google has to offer to make them successful on the web.
https://sitekit.withgoogle.com
Apache License 2.0
1.23k stars 279 forks source link

International Domain Names (IDNs) are not presented/handled consistently across the plugin #6435

Open tofumatt opened 1 year ago

tofumatt commented 1 year ago

Bug Description

Site Kit should allow users to set up Site Kit when they have an IDN, eg. a domain with Unicode/non-ASCII characters.

We support this in theory, but it seems like our implementation (along—potentially—with some of WordPress's handling of IDNs) is inconsistent, leading to minor bugs like failing to redirect properly (#5868) and possibly failing to create an Analytics account via Site Kit.

IDNs can be presented in two different ways: either in their "original"/"raw" format, eg. türkish.com, or in Punycode eg. xn--trkish-3ya.com. While WordPress, the Site Kit service/proxy, and parts of our plugin can handle either, it seems we aren't entirely consistent about how we present them to users or how we submit the domain name/site URL to Google Services like Analytics.

For instance, here's a site with the siteurl türkish.com, connecting an Analytics account. Note the domain detected/presented is Punycode and that this fails to set up an Analytics account:

https://user-images.githubusercontent.com/90871/213430033-5c8adabb-b044-471d-9e4b-9c05f4694ea9.mp4

Here's the same site after encountering that failed set up, when the domain is manually substituted for the Unicode one:

https://user-images.githubusercontent.com/90871/213430312-24acd253-3f7e-403a-b8a1-f33a1a86df19.mp4

If we set the siteurl to a Punycode string instead of a Unicode one, note that the Punycode Analytics account creation works just fine:

https://user-images.githubusercontent.com/90871/213432130-dbec3705-9f29-4cd4-b421-ee649679a286.mp4


It's worth noting that because we treat the Unicode and Punycode domains as different, there's also this odd error users can encounter when changing between the Unicode and non-Unicode siteurl:

CleanShot 2023-01-19 at 11 20 10

Arguably that isn't really true, and we should probably consistently treat all IDNs as either Unicode or Punycode in our codebase and always present them as Unicode in the UI.


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

CleanShot 2023-02-23 at 20 49 57

This change of "always using punycode internally" should not pose issues for users who "registered Site Kit" when the internal URL was Unicode and not punycode, because the Site ID is the relevant link between the plugin and Site Kit service.

However, comparing the "internal", punycode domain with existing Analytics properties (or other service's properties) may cause issues where an existing Analytics property exists but for a Unicode version of our "internal" punycode domain.

Implementation Brief

Test Coverage

QA Brief

Changelog entry

tofumatt commented 1 year ago

@eugene-manuilov and @aaemnnosttv: I wrote up some ACs here that I think cover the approach we talked about on last week's call. Can either/both of you review it and let me know if that covers everything?

aaemnnosttv commented 1 year ago

Thanks @tofumatt – I think this is a great start but there may be a few more things to consider

A few thoughts:

ivonac4 commented 8 months ago

@eugene-manuilov Are you still planning to review this soon? If not, can you please unassign yourself so someone else can pick up? Thank you!

eugene-manuilov commented 8 months ago

I have nothing to add to Evan's feedback. Assigning back to @tofumatt to address it.