pmorch closed this 5 years ago.
Thanks! Reproduced and fixed.
Here are 4 minimal test cases that show the problem:
"~\0\200",
"null\0\200",
"true\0\200",
"false\0\200",
In dump_scalar(), there are some strEQ() calls:
```c
strEQ(string, "~") ||
strEQ(string, "true") ||
strEQ(string, "false") ||
strEQ(string, "null") ||
```
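The problem is that strEQ() compares with strcmp(), so it stops at the first null byte: "~\0\200" compares equal to "~". I'm still not sure what exactly goes wrong on the libyaml side that prevents the error from being caught, but I fixed the code above by also checking the string length: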
```c
(string_len == 1 && strEQ(string, "~")) ||
(string_len == 4 && strEQ(string, "true")) ||
(string_len == 5 && strEQ(string, "false")) ||
(string_len == 4 && strEQ(string, "null")) ||
```
I will release a developer version soon.
I changed the title, because the problem here is binary data.
YAML does not support binary data. All data must be Unicode. So even if this bug is fixed, don't expect to get the correct data back if you reload it after a dump.
I released YAML-LibYAML-0.77_001.tar.gz
If you want to correctly handle binary data, you might want to look at YAML::PP, especially https://metacpan.org/pod/YAML::PP::Schema::Binary
Of course, that's slower than YAML::XS because it's pure Perl.
If you want to get some of the speed back, try replacing YAML::PP with YAML::PP::LibYAML (which uses YAML::LibYAML::API as a backend). Note that for this to work you currently need the yaml-dev library installed. I will include that in a future release of YAML::LibYAML::API.
For the record: the bug was introduced in version 0.59 (commit 872d0182b4142ccba0e7f09be04db4d9697cc24f).
Edit: or rather, that commit made the bug active. The incorrect handling with strEQ already existed before.
Thank you so much for your quick patch, @perlpunk!
You wrote:
YAML does not support binary data. All data must be unicode. So even if this bug is fixed, don't expect to get the correct data again if you reload it after a dump.
However, my testing shows that, with LibYAML-0.77_001 at least, there are no more problems with binary data (see the test case below).
Can you imagine a test case where !is_deeply(Load(Dump($value)), $value) using YAML::XS?
We really like YAML::XS as our general-purpose serialization format, and we'd loathe having to use, e.g., base64 encoding of Storable's freeze/thaw, as it makes debugging more difficult. But we need to ensure we get the original (non-XS) data back after a Dump/Load (or freeze/thaw, or encode_json/decode_json, or whatever) cycle.
You recommended YAML::PP; I briefly checked it out but didn't like the warning:
WARNING: This is not yet stable.
I know this is off-topic, but can you recommend a better serialization format that:
Binary data YAML::XS test case
This test passes:
```perl
#!/usr/bin/perl
use warnings;
use strict;
use YAML::XS;
use Test::More;

sub generateRandomBinaryString {
    return join('', map { chr(int(rand(256))) } 1..1000);
}

for my $iteration (1..1e6) {
    my $value = { key => generateRandomBinaryString() };
    is_deeply(
        Load(Dump($value)), $value,
        'Load(Dump) is transparent: ' . $iteration
    );
}
done_testing();
```
@pmorch Thanks! Actually, you might be right. Apparently YAML::XS upgrades binary strings to UTF-8, so when loading you get back a UTF-8-decoded string. If you compare that to your original binary string, Perl automatically downgrades it for the comparison. You should be aware that you get back a decoded (UTF8-flagged) string, but as long as you know how to handle that, everything should be fine. I just wasn't sure whether there was any way this approach could go wrong.
```perl
use strict;
use warnings;
use feature 'say';   # say() needs this (or "use v5.10")
use Devel::Peek;     # exports Dump(), which prints SV internals to STDERR
use YAML::XS ();     # import nothing, so Dump() below stays Devel::Peek's

my $binary = "\342\202\254";
Dump $binary;
my $dump = YAML::XS::Dump($binary);
Dump $dump;
my $reload = YAML::XS::Load($dump);
Dump $reload;
say "equal" if $reload eq $binary;
__END__
SV = PV(0x55555c282fc0) at 0x55555c2a2618
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x55555c37fba0 "\342\202\254"\0
  CUR = 3
  LEN = 10
  COW_REFCNT = 1
SV = PV(0x55555c2830a0) at 0x55555c37ade0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x55555c395c20 "--- \"\303\242\\x82\302\254\"\n"\0
  CUR = 15
  LEN = 24
SV = PV(0x55555c283080) at 0x55555c2f7270
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x55555c2a2000 "\303\242\302\202\302\254"\0 [UTF8 "\x{e2}\x{82}\x{ac}"]
  CUR = 6
  LEN = 10
equal
```
I just released YAML-LibYAML-0.78.tar.gz
About YAML::PP: sorry for suggesting a library that is not yet stable. "Not stable" means that the API is not stable (so it should work well as long as you pin it to a specific version). To develop a good library, I need use cases and users who try it out and can react to API changes. That's why I suggested it. If that's not the case for you, then you shouldn't use it.
SNMP.pm returns binary data, which I try to store with storeToFile(Dump($binary)). I can expand that binary data into a fuller SNMP example, but this is a minimal test case that makes perl crash with a core dump (if enabled) because of YAML::XS. When run:
Whatever data is given to a Perl module, it should not cause a core dump that cannot be caught with eval.