ingydotnet / yaml-libyaml-pm

Perl Binding to libyaml
http://search.cpan.org/dist/YAML-LibYAML/
33 stars 37 forks source link

YAML::XS causes core dump on binary data #91

Closed pmorch closed 5 years ago

pmorch commented 5 years ago

SNMP.pm returns binary data which I try to storeToFile(Dump($binary)).

I can expand that binary data to a more full SNMP example but this is a minimal test case that causes perl to dump with a core file (if enabled) because of YAML::XS:

use YAML::XS;
my $hex = '7e00048800';
my $dyingValue = pack('H*', $hex);
print Dump({ key => $dyingValue});

When run:

# cpanm YAML::XS
<snip>
Successfully installed YAML-LibYAML-0.77 (upgraded from 0.63)
1 distribution installed

# perl -MYAML::XS -E 'say $YAML::XS::VERSION'
0.77

# perl ./small.pl 
*** Error in `perl': double free or corruption (fasttop): 0x00005603ce735fe0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f512daf6bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f512dafcfc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7f512dafd80e]
/usr/local/lib/x86_64-linux-gnu/perl/5.24.1/auto/YAML/XS/LibYAML/LibYAML.so(yaml_event_delete+0x82)[0x7f512d630c32]
/usr/local/lib/x86_64-linux-gnu/perl/5.24.1/auto/YAML/XS/LibYAML/LibYAML.so(yaml_emitter_emit+0x24b)[0x7f512d63714b]
/usr/local/lib/x86_64-linux-gnu/perl/5.24.1/auto/YAML/XS/LibYAML/LibYAML.so(dump_hash+0x1ea)[0x7f512d63ffba]
/usr/local/lib/x86_64-linux-gnu/perl/5.24.1/auto/YAML/XS/LibYAML/LibYAML.so(dump_node+0x1e3)[0x7f512d63f903]
/usr/local/lib/x86_64-linux-gnu/perl/5.24.1/auto/YAML/XS/LibYAML/LibYAML.so(dump_document+0x3d)[0x7f512d63fabd]
/usr/local/lib/x86_64-linux-gnu/perl/5.24.1/auto/YAML/XS/LibYAML/LibYAML.so(Dump+0x200)[0x7f512d63fcf0]
perl(Perl_pp_entersub+0x450)[0x5603cd12d560]
perl(Perl_runops_standard+0x16)[0x5603cd125aa6]
perl(perl_run+0x319)[0x5603cd0ab8a9]
perl(main+0x13d)[0x5603cd084a5d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f512daa62e1]
perl(_start+0x2a)[0x5603cd084a9a]
======= Memory map: ========
<snip>

Whatever data is given to a perl module, it should not cause an un-eval-able core dump.

perlpunk commented 5 years ago

Thanks! Reproduced and fixed.

Here are 4 minimal test cases that show the problem:

    "~\0\200",
    "null\0\200",
    "true\0\200",
    "false\0\200",

In dump_scalar(), there are some strEQ() calls:

           strEQ(string, "~") ||
           strEQ(string, "true") ||
           strEQ(string, "false") ||
           strEQ(string, "null") ||

and strEQ stops at null bytes. I'm still not sure what exactly is going wrong on libyaml side so that the error can't be catched, but I fixed the above code:

           (string_len == 1 && strEQ(string, "~")) ||
           (string_len == 4 && strEQ(string, "true")) ||
           (string_len == 5 && strEQ(string, "false")) ||
           (string_len == 4 && strEQ(string, "null")) ||

I will release a developer version soon.

perlpunk commented 5 years ago

I changed the title, because the problem here is binary data.

YAML does not support binary data. All data must be unicode. So even if this bug is fixed, don't expect to get the correct data again if you reload it after a dump.

perlpunk commented 5 years ago

I released YAML-LibYAML-0.77_001.tar.gz

perlpunk commented 5 years ago

If you want to correctly handle binary data, you might want to look at YAML::PP, especially https://metacpan.org/pod/YAML::PP::Schema::Binary

Of course that's slower than YAML::XS because it's pureperl.

If you want to get part of the speed back, try replacing YAML::PP with YAML::PP::LibYAML (which uses YAML::LibYAML::API as a backend). Note that you need to have the yaml-dev library installed currently for that to work. I will include that in a future release of YAML::LibYAML::API.

perlpunk commented 5 years ago

For the record: The bug was introduced in version 0.59 872d0182b4142ccba0e7f09be04db4d9697cc24f

edit: or actually, it made the bug active. The wrong handling with strEQ existed before already.

pmorch commented 5 years ago

Thank you so much for your quick patch, @perlpunk!

You write:

YAML does not support binary data. All data must be unicode. So even if this bug is fixed, don't expect to get the correct data again if you reload it after a dump.

However my testing shows that with with LibYAML-0.77_001 at least, there are no more problems with binary data (see below for test case).

Can you imagine a test case where ! is_deeply(Load(Dump($value)), $value) using YAML::XS?

We really like YAML::XS for our general-purpose serialization format, and we'd loathe to have to have to use e.g. base64 encoding of Storable's freeze/thaw, as it makes debugging more difficult, but need to ensure we get the original (non-XS) data after a Dump/Load (or freeze/thaw or encode_json/decode_json or whatever) cycle.

You recommend YAML::PP, and I just briefly checked it out, but didn't like the warning:

WARNING: This is not yet stable.

I know this is off-topic but can you recommend a better serialization format that:

Binary data YAML::XS test case

This test passes:

#!/usr/bin/perl
use warnings;
use strict;
use YAML::XS;
use Test::More;

sub generateRandomBinaryString {
    return join('', map { chr(int(rand(256))) } 1..1000);
}

for my $iteration (1..1e6) {
    my $value = { key => generateRandomBinaryString()};
    is_deeply(
        Load(Dump($value)), $value,
        'Load(Dump) is transparent: ' . $iteration
    );
}

done_testing();
perlpunk commented 5 years ago

@pmorch thanks! Actually you might be right. Apparently YAML::XS upgrades binary strings to utf8, and then when loading you will get back a utf8-decoded string, and if you compare this to your original binary string, it will downgrade automatically for the comparison. You should probably note that you get back a utf8-decoded string, but if you know when to handle this, then everything should be alright. I was unsure if there is any possibility for this approach to go wrong.

use Devel::Peek;
use YAML::XS ();
my $binary = "\342\202\254";
Dump $binary;
my $dump = YAML::XS::Dump($binary);
Dump $dump;
my $reload = YAML::XS::Load($dump);
Dump $reload;
say "equal" if $reload eq $binary;
__END__

SV = PV(0x55555c282fc0) at 0x55555c2a2618
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x55555c37fba0 "\342\202\254"\0
  CUR = 3
  LEN = 10
  COW_REFCNT = 1
SV = PV(0x55555c2830a0) at 0x55555c37ade0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x55555c395c20 "--- \"\303\242\\x82\302\254\"\n"\0
  CUR = 15
  LEN = 24
SV = PV(0x55555c283080) at 0x55555c2f7270
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x55555c2a2000 "\303\242\302\202\302\254"\0 [UTF8 "\x{e2}\x{82}\x{ac}"]
  CUR = 6
  LEN = 10
equal
perlpunk commented 5 years ago

I just released YAML-LibYAML-0.78.tar.gz

perlpunk commented 5 years ago

About YAML::PP: Sorry for suggesting you a library which is not yet stable. Not stable means that the API is not stable (meaning it should work well as long as ou can pin it to a specific version). For developing a good library I need usecases and users that try it out and are able to react to API changes. That's why I suggested it. If that's not the case for you, then you shouldn't use it.