Open tofumatt opened 1 year ago
@eugene-manuilov and @aaemnnosttv: I wrote up some ACs here that I think cover the approach we talked about on last week's call. Can either/both of you review it and let me know if that covers everything?
Thanks @tofumatt – I think this is a great start but there may be a few more things to consider
A few thoughts:
URL->equals()
method which could take a string or another URL instance to perform the equivalence check internally@eugene-manuilov Are you still planning to review this soon? If not, can you please unassign yourself so someone else can pick up? Thank you!
I have nothing to add to Evan's feedback. Assigning back to @tofumatt to address it.
Bug Description
Site Kit should allow users to set up Site Kit when they have an IDN, eg. a domain with Unicode/non-ASCII characters.
We support this in theory, but it seems like our implementation (along—potentially—with some of WordPress's handling of IDNs) is inconsistent, leading to minor bugs like failing to redirect properly (#5868) and possibly failing to create an Analytics account via Site Kit.
IDNs can be presented in two different ways: either in their "original"/"raw" format, eg.
türkish.com
, or in Punycode eg.xn--trkish-3ya.com
. While WordPress, the Site Kit service/proxy, and parts of our plugin can handle either, it seems we aren't entirely consistent about how we present them to users or how we submit the domain name/site URL to Google Services like Analytics.For instance, here's a site with the
siteurl
türkish.com
, connecting an Analytics account. Note the domain detected/presented is Punycode and that this fails to set up an Analytics account:https://user-images.githubusercontent.com/90871/213430033-5c8adabb-b044-471d-9e4b-9c05f4694ea9.mp4
Here's the same site after encountering that failed set up, when the domain is manually substituted for the Unicode one:
https://user-images.githubusercontent.com/90871/213430312-24acd253-3f7e-403a-b8a1-f33a1a86df19.mp4
If we set the
siteurl
to a Punycode string instead of a Unicode one, note that the Punycode Analytics account creation works just fine:https://user-images.githubusercontent.com/90871/213432130-dbec3705-9f29-4cd4-b421-ee649679a286.mp4
It's worth noting that because we treat the Unicode and Punycode domains as different, there's also this odd error users can encounter when changing between the Unicode and non-Unicode
siteurl
:Arguably that isn't really true, and we should probably consistently treat all IDNs as either Unicode or Punycode in our codebase and always present them as Unicode in the UI.
Do not alter or remove anything below. The following sections will be managed by moderators only.
Acceptance criteria
URL
class should be extended to allow creation ofURL
instances which hold a WordPress/site URL (or any other URL) created from a string of either ASCII, punycode, or Unicode characters.google.com
, the string would be stored as-is. If the string is Unicode (türkish.com
), it should be converted to a punycode string (xn--trkish-3ya.com
).URL
instance and using the correct output method based on the requirements. In PHP code, most usage is internal and not part of user-facing UI, so this will almost always be the$url->internal()
format.normalizeURL( url )
type utility that normalises the URL into a punycode version.This change of "always using punycode internally" should not pose issues for users who "registered Site Kit" when the internal URL was Unicode and not punycode, because the Site ID is the relevant link between the plugin and Site Kit service.
However, comparing the "internal", punycode domain with existing Analytics properties (or other service's properties) may cause issues where an existing Analytics property exists but for a Unicode version of our "internal" punycode domain.
Implementation Brief
Test Coverage
QA Brief
Changelog entry