jeremykendall / php-domain-parser

Public Suffix List based domain parsing implemented in PHP
MIT License

Add ability to control IDNA options for the idn_to_(utf8|ascii) functions #234

Closed Insolita closed 5 years ago

Insolita commented 5 years ago

Issue summary

The problem occurs when I try to convert domains containing certain characters called "deviations"; in my case it is ß (more information at http://unicode.org/reports/tr46/#Transition_Considerations).

System information

| Information | Description |
|-------------|-------------|
| Pdp version | 5.4         |
| PHP version | 7.3         |
| OS Platform | Archlinux   |

Standalone code, or other way to reproduce the problem


$domainName = 'faß.de';
$domain = new \Pdp\Domain($domainName);
echo $domain->getContent();
//fass.de 
echo $domain->toUnicode()->getContent();
//fass.de 
echo $domain->toAscii()->getContent();
//fass.de 

Expected result

In my case I expect to get "faß.de" as Unicode and xn--fa-hia.de as ASCII. The current result occurs because the functions idn_to_utf8 https://github.com/jeremykendall/php-domain-parser/blob/develop/src/IDNAConverterTrait.php#L149 and idn_to_ascii https://github.com/jeremykendall/php-domain-parser/blob/develop/src/IDNAConverterTrait.php#L118 are called with option = 0.

To get the expected result, however, the option IDNA_NONTRANSITIONAL_TO_UNICODE (32) is required for idn_to_utf8 and IDNA_NONTRANSITIONAL_TO_ASCII for idn_to_ascii; see https://www.php.net/manual/en/intl.constants.php

$domainName = 'faß.de';
$info = [];
$utf1 = idn_to_utf8($domainName, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46, $info);
$utf2 = idn_to_utf8($domainName, IDNA_NONTRANSITIONAL_TO_UNICODE, INTL_IDNA_VARIANT_UTS46);
expect($info['isTransitionalDifferent'])->true();
expect($utf1)->equals('fass.de');
expect($utf2)->equals('faß.de');

$ascii1 = idn_to_ascii($domainName, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46, $info);
$ascii2 = idn_to_ascii($domainName, IDNA_NONTRANSITIONAL_TO_ASCII, INTL_IDNA_VARIANT_UTS46);

expect($ascii1)->equals('fass.de');
expect($ascii2)->equals('xn--fa-hia.de');

So I would like to be able to set a custom option for IDN conversion. I can make a PR if we agree on how to pass this option. Alternatively, the non-transitional options could be used as the default: with the IDNA_DEFAULT option, idn_to_ascii returns the same result for fass.de and faß.de, which can lead to unexpected behavior. It would also be good to have a method for checking whether a domain name contains deviation characters.
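For illustration, a deviation check could be sketched roughly as follows. This is only a minimal sketch, not part of php-domain-parser: the function name containsDeviationCharacters is hypothetical, and it assumes the input is the UTF-8 (Unicode) form of the domain. It tests for the four deviation characters listed in UTS #46 (ß, ς, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER) and does not need the intl extension.

```php
<?php

/**
 * Hypothetical helper (not an existing php-domain-parser method).
 *
 * Returns true when the UTF-8 domain name contains one of the four
 * UTS #46 deviation characters:
 *   U+00DF (ß), U+03C2 (ς), U+200C (ZWNJ), U+200D (ZWJ).
 */
function containsDeviationCharacters(string $domainName): bool
{
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8.
    // preg_match() returns false on invalid UTF-8, which is reported
    // here as "no deviation characters found".
    return preg_match('/[\x{00DF}\x{03C2}\x{200C}\x{200D}]/u', $domainName) === 1;
}

var_dump(containsDeviationCharacters("fa\u{00DF}.de")); // faß.de
var_dump(containsDeviationCharacters('fass.de'));
```

Note that a name already converted to its ASCII form (for example xn--fa-hia.de) would not be detected by this check, since the deviation character is then hidden inside the Punycode label.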

nyamsprod commented 5 years ago

@Insolita from reading your issue I have to wonder whether adding a format method to the Domain VO which accepts IDNA_* constants would not be the correct way to solve it 🤔

Insolita commented 5 years ago

@nyamsprod It will not be enough to add it only to the Domain VO. It should also be accepted in every class that can create a Domain instance or that uses the idnToAscii/idnToUnicode methods from IDNAConverterTrait.

nyamsprod commented 5 years ago

I was thinking about the DomainInterface 😉 so yes it will be implemented on the PublicSuffix VO as well. If you can provide a PR for it I'll gladly review it 🎉
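One possible shape for the idea discussed in this thread, shown purely as a sketch: an interface whose conversion methods accept an IDNA_* option, implemented by a small value object. The names IdnaFormattable and SketchDomain are hypothetical and are not the library's API; the code assumes the intl extension is loaded.

```php
<?php

/**
 * Hypothetical interface (illustration only, not Pdp\DomainInterface).
 * Each conversion accepts a bitmask of IDNA_* options.
 */
interface IdnaFormattable
{
    public function toAscii(int $option = IDNA_DEFAULT): string;

    public function toUnicode(int $option = IDNA_DEFAULT): string;
}

/** Hypothetical value object that forwards the option to ext-intl. */
final class SketchDomain implements IdnaFormattable
{
    /** @var string */
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function toAscii(int $option = IDNA_DEFAULT): string
    {
        $result = idn_to_ascii($this->name, $option, INTL_IDNA_VARIANT_UTS46);
        if ($result === false) {
            throw new InvalidArgumentException("Cannot convert '{$this->name}' to ASCII.");
        }

        return $result;
    }

    public function toUnicode(int $option = IDNA_DEFAULT): string
    {
        $result = idn_to_utf8($this->name, $option, INTL_IDNA_VARIANT_UTS46);
        if ($result === false) {
            throw new InvalidArgumentException("Cannot convert '{$this->name}' to Unicode.");
        }

        return $result;
    }
}

$domain = new SketchDomain("fa\u{00DF}.de"); // faß.de
echo $domain->toAscii(IDNA_NONTRANSITIONAL_TO_ASCII), PHP_EOL;
echo $domain->toUnicode(IDNA_NONTRANSITIONAL_TO_UNICODE), PHP_EOL;
```

With the non-transitional options, the issue above reports xn--fa-hia.de and faß.de as the results of these two calls. Passing the option per call keeps the object immutable; storing it on the object instead would be an equally plausible design.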