kasei / perl-iri

Perl implementation of Internationalized Resource Identifiers (IRIs)

Parsing of some URL fails #11

Closed phochste closed 7 years ago

phochste commented 7 years ago

I'm processing the wikidata.dbpedia.org collections and have come across IRIs like http://hak.dbpedia.org/resource/Hàn_(𨧀) which can't be parsed. Is there a reason why not?

use IRI;
use utf8;

my $str = 'http://hak.dbpedia.org/resource/Hàn_(𨧀)';
my $iri = IRI->new(value => $str);

$ perl test.pl
Not a valid IRI? $VAR1 = "http://hak.dbpedia.org/resource/H\x{e0}n_(\x{289c0})";

kasei commented 7 years ago

I believe this is a typo in the code that breaks parsing of Unicode characters in the path that lie beyond the Basic Multilingual Plane (such as U+289C0 in your example). I'll try to get a fix committed and released quickly. Thanks for the report!
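
For context, RFC 3987 lists the non-ASCII characters allowed in an IRI path (the ucschar rule) as a set of code point ranges, most of which lie above U+FFFF. The following is only a minimal sketch of how those ranges could be checked in Perl; it is not the module's actual code, and the names $ucschar and is_ucschar are hypothetical.

```perl
use strict;
use warnings;

# Code point ranges of the ucschar rule from RFC 3987, section 2.2.
my @ranges = (
    [ 0xA0,   0xD7FF ],
    [ 0xF900, 0xFDCF ],
    [ 0xFDF0, 0xFFEF ],
    # Planes 1 through 13: U+10000-U+1FFFD up to U+D0000-U+DFFFD.
    ( map { my $base = $_ * 0x10000; [ $base, $base + 0xFFFD ] } 1 .. 13 ),
    [ 0xE1000, 0xEFFFD ],
);

# Build the body of a character class such as \x{A0}-\x{D7FF}\x{F900}-...
my $class = join '', map { sprintf '\x{%X}-\x{%X}', @$_ } @ranges;

# Hypothetical name: a compiled pattern matching one ucschar character.
my $ucschar = qr/[$class]/;

# Hypothetical helper: returns 1 if the single character is a ucschar, else 0.
sub is_ucschar {
    my ($char) = @_;
    return $char =~ /\A$ucschar\z/ ? 1 : 0;
}

# The two non-ASCII characters from the failing IRI, plus an ASCII letter.
for my $char ( "\x{E0}", "\x{289C0}", "a" ) {
    printf "U+%04X: %s\n", ord($char),
        is_ucschar($char) ? 'ucschar' : 'not ucschar';
}
```

Under these ranges both U+00E0 and U+289C0 count as ucschar, so a mistyped bound in one of the higher-plane ranges would be enough to reject an IRI like the one reported here.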

phochste commented 7 years ago

Thanks, works for me!