Closed kelunik closed 6 years ago
@kelunik
idn_to_ascii
decodes per label because it's a legacy from URI v4 which used to work with an idn polyfill. Maybe we can remove this and decode the full host with it instead.
@nyamsprod What I just noticed: Name validation isn't done in that method, what should happen with an invalid host name there? Just ignore and accept?
name validation is done using the is_host function from the parser component. But I do agree with you if we simplify the formatHost method by delegating the validation to idn_to_ascii then formatHost will format and validate the host accordingly. I've got a POC on my local dev machine. I'll make a PR hopefully tomorrow you will be able to review 👍
@kelunik look at #7 and tell me what you think ? If it's ok, I'll update the uri-parser too accordingly.
Introduction
ASCII-only domains are way more common than IDNs. With a similar change in the parser check, this gives a 35% CPU time reduction in the included benchmark according to Blackfire.
Backward Incompatible Changes
None.
Targeted release version
Next patch release.
PR Impact
None.
Open issues
I'm not sure why
idn_to_ascii
and the decoding are applied per label instead of directly to the host name, could anyone shed some light here? Also, themb_strtolower
is probably unnecessary and can be replaced withstrtolower
.The
(string)
cast foridn_to_ascii
seems strange, that will just empty all invalid labels and then contain two consecutive dots in the host?