perlpunk / YAML-PP-p5

A YAML 1.2 processor in perl
https://metacpan.org/pod/YAML::PP
24 stars 8 forks source link

Change of behavior for undef keys #21

Closed pplu closed 4 years ago

pplu commented 5 years ago

Hi,

I've found a change in behavior in the handling of serialization and deserialization of some YAML structures between YAML::PP versions.

In current YAML::PP, a document like this:

key:

is getting deserialized to

{ key => '' }

When we serialize it again, it's going to what you would expect from the serialization of that Perl datastructure:

---
key: ''

The resulting YAML, IMHO, is not "equivalent" to the original one. We've detected this because one of our test suites started failing when building with up-to-date dependencies. We've tracked the change down to YAML::PP 0.012, that doesn't display this behavior, instead deserializing the document to:

{ key => undef }

which I think is more expected, since then roundtripping the YAML brings you to:

---
key: null

Which seems like a better behavior for unspecified values for keys, since they will return to the same Perl datastructure.

Do you think it's worth returning YAML::PP to the 0.012 behavior? Or was it a design decision? Is there some type of way to control the serialization that I'm not aware of?

Thanks in advance, and always open to help solve this "bug"

BTW: here's a small test script that passes on 0.012, but fails on later YAML::PPs

#!/usr/bin/env perl

use strict;
use warnings;

use Test::More;
use YAML::PP qw/Load Dump/;

{
  my $yaml = "key:";

  my $perl = Load($yaml);
  ok(exists($perl->{ key }), 'key exists');
  is($perl->{ key }, undef, 'and is an undef');

  like(Dump($perl), qr/key: null/);

  my $roundtrip = Load(Dump($perl));
  ok(exists($roundtrip->{ key }), 'key exists');
  is($roundtrip->{ key }, undef, 'and is an undef');
}

done_testing;
perlpunk commented 5 years ago

This was a deliberate change, however, I also think it's not the best behaviour and I thought already about changing it. The background is a bit more complicated. YAML::PP uses the YAML 1.2 JSON Schema as default (instead of the Core Schema). https://yaml.org/spec/1.2/spec.html#id2803231

The YAML 1.2 JSON Schema, however, is actually a strict schema which does not allow any unquoted scalars expect the special types. (For completeness, I want to implement a JSONStrict Schema at some point). I am using a modified variant of that schema that detects values like true|false|null|123|3.14 as special types, but everything else as a string, even if unquoted. The problem with that is, that there is no clear definition about what to do with the empty scalar - should it be the empty string (as per my definition above that everything else is a string) or null. In the strict schema it would be disallowed.

I realized that using null here has advantages in several ways, so I'll change it back.

edit: If you want to be compatible with official YAML 1.2 (other processors supporting 1.2 often only support the Core Schema) then you should enable the Core Schema. But note that this has a lot more special values: https://yaml.org/spec/1.2/spec.html#id2804923 None of the schemas is actually compatible to what YAML::XS is doing, for example.

edit2: The actual change in 0.013 was:

Change default schema from Core to JSON

https://metacpan.org/changes/distribution/YAML-PP

pplu commented 5 years ago

I'm OK with changing our application to use the Core schema (my initial test script is already working with the schema change). I wasn't conscious that the schema change had implications in the treatment of null / undef values while I was tracking down the reason for our test failures.

As to what YAML::PP should do by default: you have a better overview of everything. Our use case is covered by configuring the schema, so we're happy with that.

Attached is new test script working with current 0.018

#!/usr/bin/env perl

use strict;
use warnings;

use Test::More;
use YAML::PP;

my $pp = YAML::PP->new(schema => ['Core']);

{
  my $yaml = "key:";

  my $perl = $pp->load_string($yaml);
  ok(exists($perl->{ key }), 'key exists');
  is($perl->{ key }, undef, 'and is an undef');

  like($pp->dump_string($perl), qr/key: null/);

  my $roundtrip = $pp->load_string($pp->dump_string($perl));
  ok(exists($roundtrip->{ key }), 'key exists');
  is($roundtrip->{ key }, undef, 'and is an undef');
}

done_testing;
perlpunk commented 5 years ago

As to what YAML::PP should do by default: you have a better overview of everything

At some point I would like to create a table that shows an overview of which schema treats which values special, so others will also have a better understanding =)

perlpunk commented 4 years ago

Finally I got around to creating the table: https://perlpunk.github.io/YAML-PP-p5/schema-examples.html And here are the definitions: https://perlpunk.github.io/YAML-PP-p5/schemas.html

I also changed back to Core as default.