ingydotnet / yaml-libyaml-pm

Perl Binding to libyaml
http://search.cpan.org/dist/YAML-LibYAML/
33 stars 37 forks source link

round trip grows regexp #69

Closed boldra closed 6 years ago

boldra commented 6 years ago

Test program

perl -MYAML::XS -E 'my $y=Load($ARGV[0]); for(1..10) { $y=Load(Dump($y)) }; say Dump($y)' 'match: !!perl/regexp (?i-xsm:OK)'

input

match: !!perl/regexp (?i-xsm:OK)

output

match: !!perl/regexp (?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^u:(?^ui:OK)))))))))))

(expected output to be same as input). Perl 5.24.3/Perl 5.22.4 YAML::XS 0.66

perlpunk commented 6 years ago

Unfortunately you can see this effect also simply with this code:

my $regex = qr{OK};
say $regex;
my $str = "$regex";
$regex = qr{$str};
say $regex;
__END__
(?^u:OK)
(?^u:(?^u:OK))

I remember Data::Dumper had the same problem years ago, so I sent a bugreport. We could look how it was solved.

I see no trivial way to solve this. One could look at the output of qr{} and try to remove that everytime when stringyfying a regex.

perlpunk commented 6 years ago

Like this:

my $string = "(?i-xsm:OK)";
my ($pre, $post) = split m/PLACEHOLDER/, qr{PLACEHOLDER};
my $re;
for (1..5) {
    if ($string =~ s/^\Q$pre//) {  
        $string =~ s/\Q$post\E$//;
    }
    $re = qr{$string};
    say $re;
}
say $re;
__END__
(?^u:(?i-xsm:OK))

Only in XS instead...

perlpunk commented 6 years ago

This is how YAML.pm does it: https://github.com/ingydotnet/yaml-pm/blob/master/lib/YAML/Types.pm#L196

perlpunk commented 6 years ago

actually, YAML::XS also has code for it: https://github.com/ingydotnet/yaml-libyaml-pm/blob/master/lib/YAML/XS.pm#L99

boldra commented 6 years ago

Anton Petrusevich just suggested this diff:


--- lib/YAML/XS.pm        2017-11-15 18:59:43.000000000 +0100

+++ /home/anton/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux/YAML/XS.pm     2017-12-05 15:25:21.300019521 +0100

@@ -116,6 +116,7 @@

};

 sub __qr_loader {

+    1 while $_[0] =~  s/\A(\(\?\^u?:)(.*)\)\z/$2/;

     if ($_[0] =~ /\A  \(\?  ([ixsm]*)  (?:-  (?:[ixsm]*))?  : (.*) \)  \z/x) {

         my $sub = _QR_MAP->{$1} || _QR_MAP->{''};

         &$sub($2);

It certainly does the job for us, but I'm not sure whether all the superfluous quoting is going to start with the ^u?

perlpunk commented 6 years ago

I will have a look soon. In YAML.pm it works, so maybe we can get it working for YAML::XS too.

perlpunk commented 6 years ago

I made a quick fix for this in PR #70

perlpunk commented 6 years ago

btw, YAML.pm has the same problem because it also does not check the u:

use YAML;
use Encode;
my $yaml = decode_utf8 q{re : !!perl/regexp OK}; 
for (1..5) {
  $data = Load $yaml;
  $yaml = Dump $data;
}          
say $yaml;
__END__
---                    
re: !!perl/regexp (?^u:(?^u:(?^u:(?^u:(?^u:OK)))))
perlpunk commented 6 years ago

Thanks, released YAML::LibYAML 0.67_001

perlpunk commented 6 years ago

Closing, created issue #72 for improving regex handling