thephpleague / uri-schemes

Collection of URI Immutable Value Objects
https://uri.thephpleague.com/schemes/
MIT License

Optimize formatHost for ASCII-only domains #6

Closed by kelunik 6 years ago

kelunik commented 6 years ago

Introduction

ASCII-only domains are way more common than IDNs, so formatHost can skip the IDN handling for them. Combined with a similar change in the parser check, this gives a 35% CPU time reduction in the included benchmark, according to Blackfire.
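To illustrate the idea, here is a minimal sketch of an ASCII fast path. It is not the library's actual formatHost: the function name `format_host_sketch` and the choice to throw on failure are assumptions made for the example, and the slow path requires ext-intl.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not the library's actual formatHost().
 * Lowercases ASCII-only hosts directly and only falls back to
 * idn_to_ascii() (ext-intl) when non-ASCII bytes are present.
 */
function format_host_sketch(string $host): string
{
    // Fast path: no byte outside the 7-bit ASCII range.
    if (preg_match('/[^\x00-\x7F]/', $host) !== 1) {
        return strtolower($host);
    }

    // Slow path: IDN conversion of the whole host (requires ext-intl).
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($ascii === false) {
        // Assumption for this sketch: reject hosts that cannot be converted.
        throw new InvalidArgumentException("Invalid host: {$host}");
    }

    return $ascii;
}
```

With this shape, a host such as `EXAMPLE.com` never reaches the intl extension at all, which is where the saving would come from.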

Backward Incompatible Changes

None.

Targeted release version

Next patch release.

PR Impact

None.

Open issues

I'm not sure why idn_to_ascii and the decoding are applied per label instead of directly to the whole host name. Could anyone shed some light here? Also, the mb_strtolower call is probably unnecessary and can be replaced with strtolower.

The (string) cast on the result of idn_to_ascii seems strange. idn_to_ascii returns false on failure, so won't the cast just turn every invalid label into an empty string and leave two consecutive dots in the host?
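The concern can be reproduced without ext-intl. In this sketch, `$convert` is a hypothetical stand-in that, like idn_to_ascii, returns false for a label it cannot convert; its validation rule is made up for the example.

```php
<?php

// Hypothetical stand-in for a per-label converter: like idn_to_ascii(),
// it returns false when a label cannot be converted.
$convert = static function (string $label) {
    return preg_match('/^[a-z0-9-]+$/', $label) === 1 ? $label : false;
};

$labels = explode('.', 'foo.b_r.example');

// Casting each result to string turns false into '' ...
$converted = array_map(
    static function (string $label) use ($convert): string {
        return (string) $convert($label);
    },
    $labels
);

// ... so the invalid label vanishes and two dots end up adjacent.
$host = implode('.', $converted);

var_dump($host); // string(12) "foo..example"
```

The failure is silent: nothing signals that `b_r` was rejected, and the resulting host is itself invalid.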

nyamsprod commented 6 years ago

@kelunik

idn_to_ascii is applied per label as a legacy from URI v4, which used to work with an IDN polyfill. Maybe we can remove this and convert the full host in one call instead.
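For comparison, the two approaches could look roughly like this. Both helpers are hypothetical names written for this example, they assume ext-intl is loaded, and the UTS46 variant is an assumption rather than something stated in this thread.

```php
<?php

declare(strict_types=1);

// Both helpers are hypothetical and assume ext-intl is loaded.

/** Converts each label separately, as the legacy code path is described. */
function to_ascii_per_label(string $host): string
{
    $labels = array_map(
        static function (string $label): string {
            // The cast maps a failed conversion (false) to ''.
            return (string) idn_to_ascii($label, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        },
        explode('.', $host)
    );

    return implode('.', $labels);
}

/** Converts the whole host in a single call. */
function to_ascii_whole_host(string $host): string
{
    return (string) idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
}
```

For a well-formed host the two should agree. The whole-host variant avoids the explode/implode round trip, and an invalid label makes the entire conversion fail instead of leaving an empty label behind.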

kelunik commented 6 years ago

@nyamsprod What I just noticed: name validation isn't done in that method. What should happen with an invalid host name there? Just ignore it and accept?

nyamsprod commented 6 years ago

Name validation is done using the is_host function from the parser component. But I do agree with you: if we simplify the formatHost method by delegating the validation to idn_to_ascii, then formatHost will both format and validate the host. I've got a POC on my local dev machine. I'll make a PR, hopefully tomorrow, so that you can review it 👍

nyamsprod commented 6 years ago

@kelunik Look at #7 and tell me what you think. If it's OK, I'll update the uri-parser accordingly too.