m6w6 / ext-http

Extended HTTP Support
BSD 2-Clause "Simplified" License

Underscores in host names: libidn Failed to parse IDN; http\Exception\RuntimeException: http\Client::enqueue() #52

Closed canavan closed 8 years ago

canavan commented 8 years ago

If ext-http is built with libidn, URLs with "_" (underscore) in the hostname cannot be loaded due to a libidn parse failure. The code below produces the error messages shown underneath it:

$client = new http\Client();
$client->enqueue(new http\Client\Request("GET", "http://local_host:80/"));
$client->send();
Warning: http\Client\Request::__construct(): Failed to parse IDN; Non-digit/letter/hyphen in input in pecl_http-3.1.0beta1/tests/client030.phpt on line 12

Fatal error: Uncaught http\Exception\RuntimeException: http\Client::enqueue(): Cannot request empty URL in pecl_http-3.1.0beta1/tests/client030.phpt:12
Stack trace:
#0 pecl_http-3.1.0beta1/tests/client030.phpt(12): http\Client->enqueue(Object(http\Client\Request))
#1 {main}
  thrown in pecl_http-3.1.0beta1/tests/client030.phpt on line 12

URLs containing _ can be loaded if ext-http is not built with libidn support. The curl command line client or browsers (such as Firefox or Chrome) also handle such URLs.

Jan-E commented 8 years ago

https://www.quora.com/Why-are-underscores-not-allowed-in-DNS-host-names

m6w6 commented 8 years ago

Yes. The "workaround" is to use an explicit http\Url instance.

canavan commented 8 years ago

Aside from the fact that one should not use _ or any other fancy characters in hostnames, https://www.ietf.org/rfc/rfc2181.txt contradicts the quora posting in no uncertain terms:

The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. The length of any one label is limited to between 1 and 63 octets. A full domain name is limited to 255 octets (including the separators). The zero length full name is defined as representing the root of the DNS tree, and is typically written and displayed as ".". Those restrictions aside, any binary string whatever can be used as the label of any resource record. Similarly, any binary string can serve as the value of any record that includes a domain name as some or all of its value (SOA, NS, MX, PTR, CNAME, and any others that may be added). Implementations of the DNS protocols must not place any restrictions on the labels that can be used.
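To make the quoted limits concrete, here is a minimal sketch in plain PHP (no ext-http needed). The function name dns_name_fits_rfc2181 is hypothetical, and the sketch assumes the 255-octet limit is counted in wire format, i.e. one length octet per label plus the terminating root label. It checks length only and deliberately places no restriction on the characters in a label.

```php
<?php
// Hypothetical helper: checks only the RFC 2181 length limits quoted above.
// Assumption: the 255-octet limit applies to the wire format, where every
// label is preceded by one length octet and the name ends with the root label.
function dns_name_fits_rfc2181(string $name): bool
{
    // The zero-length name (written ".") is the root of the DNS tree.
    if ($name === '' || $name === '.') {
        return true;
    }
    // Ignore a single trailing dot of a fully qualified name.
    if (substr($name, -1) === '.') {
        $name = substr($name, 0, -1);
    }
    $wireLength = 1; // terminating root label
    foreach (explode('.', $name) as $label) {
        $len = strlen($label); // octets, not characters
        if ($len < 1 || $len > 63) {
            return false;
        }
        $wireLength += $len + 1; // label octets plus its length octet
    }
    return $wireLength <= 255;
}

var_dump(dns_name_fits_rfc2181('local_host'));        // underscore is not a length problem
var_dump(dns_name_fits_rfc2181(str_repeat('a', 64))); // label longer than 63 octets
```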

However, I'd also consider this issue fixed if ext-http behaved the same with and without libidn and rejected hostnames with underscores in both cases.

m6w6 commented 8 years ago

Sure, but if you enjoy reading specs, have a look at these:

We could drop STD3_ASCII_RULES usage with libidn1, but that would seem overly committed to a traditional "just don't do that".

BTW, your quote is just generically referring to labels in a domain name; the rules for host names are historically more peculiar.

rcanavan commented 8 years ago

I don't really enjoy reading specs, but among the ones you referred to, https://tools.ietf.org/html/rfc3986#section-3.2.2 is the one that most likely applies here, and it has

host        = IP-literal / IPv4address / reg-name
reg-name    = *( unreserved / pct-encoded / sub-delims )
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
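The difference between the two rule sets can be sketched in plain PHP. This is an illustration only, not the code libidn or ext-http actually runs: both function names are hypothetical, and is_ldh_hostname is merely an approximation of the letter/digit/hyphen rule behind the "Non-digit/letter/hyphen in input" warning.

```php
<?php
// Hypothetical helper: RFC 3986 reg-name = *( unreserved / pct-encoded / sub-delims )
function is_rfc3986_reg_name(string $host): bool
{
    // unreserved: A-Z a-z 0-9 - . _ ~   sub-delims: ! $ & ' ( ) * + , ; =
    $pattern = '/^(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})*$/D';
    return preg_match($pattern, $host) === 1;
}

// Hypothetical helper: approximates the traditional host name rule, where each
// label holds only letters, digits and hyphens and does not start or end with
// a hyphen.
function is_ldh_hostname(string $host): bool
{
    foreach (explode('.', $host) as $label) {
        if (preg_match('/^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/D', $label) !== 1) {
            return false;
        }
    }
    return true;
}

// "local_host" is a valid reg-name but not a valid LDH host name.
var_dump(is_rfc3986_reg_name('local_host'));
var_dump(is_ldh_hostname('local_host'));
```

Under these assumptions the underscore passes the URI grammar and fails only the stricter host name rule, which matches the behaviour difference reported above.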

BTW, what exactly are the benefits of building ext-http with libidn if curl is already built with libidn support? My problem would also go away if I could configure / build ext-http without libidn, even if libidn-dev (with headers and libidn.so) is installed.

m6w6 commented 8 years ago

I'm already working on a possible fix, sorry for the delay.

rcanavan commented 8 years ago

I've tested the patch and can confirm that it's working. Thanks.