shlomif / perl-XML-SemanticDiff

The XML-SemanticDiff CPAN distribution for semantic (= non-ordered and probably not what you want) comparison of two XML docs
https://metacpan.org/release/XML-SemanticDiff
Other
2 stars 5 forks source link

Fix for bug https://rt.cpan.org/Public/Bug/Display.html?id=84546 #4

Closed apinco closed 6 years ago

apinco commented 6 years ago

Hi Shlomi, There is an old outstanding bug where this code treats the following as the same:

<element>0</element>
and
<element></element>

I believe this behavior is incorrect and the code should report those two as different. I have updated the code so that it will identify this difference.

In version 1.005 (before the changes) I see the following in the debugger:

DB<16> use XML::SemanticDiff
DB<17> $fos_xml = '<root><element>0</element></root>'      
DB<18> $rest_xml = '<root><element></element></root>'
DB<19> $diff = XML::SemanticDiff->new(keepdata => 1, keeplinenums => 1); 
DB<20> x @changes = $diff->compare($rest_xml, $fos_xml)
  empty array

In version 1.006 (with changes in this pull request) I see the following in the debugger:

DB<21> use XML::SemanticDiff
DB<22> $fos_xml = '<root><element>0</element></root>'
DB<23> $rest_xml = '<root><element></element></root>'
DB<24> $diff = XML::SemanticDiff->new(keepdata => 1, keeplinenums => 1);
DB<25> x @changes = $diff->compare($rest_xml, $fos_xml)
0  HASH(0x41eb8f0)
'context' => '/root[1]/element[1]'
'endline' => 1
'message' => 'Character differences in element \'element\'.'
'new_value' => 0
'old_value' => undef
'startline' => 1

Let me know if you have any questions on the changes I am proposing.

Thanks

shlomif commented 6 years ago

Hi @apinco , your fix looks mostly fine (but I would make use of the ternary operator to avoid duplicate code) but it lacks regression tests (under t/*.t). Please add some.

apinco commented 6 years ago

Hi @shlomif I have changed the code to use a ternary operator. Here are the results of my unit test BEFORE my changes:

perl 16zero_to_empty_str_cmp.t
1..18
ok 1 - Difference found between 0 and empty string in attr
ok 2 - check old value 0
ok 3 - check new value empty_string
ok 4 - Difference found between 0 and missing attr
ok 5 - check old value 0
ok 6 - check new value undef
not ok 7 - Difference found between 0 and empty string in elem(<el></el>)
#   Failed test 'Difference found between 0 and empty string in elem(<el></el>)'
#   at 16zero_to_empty_str_cmp.t line 80.
#          got: '0'
#     expected: '1'
not ok 8 - check old value 0
#   Failed test 'check old value 0'
#   at 16zero_to_empty_str_cmp.t line 85.
#          got: undef
#     expected: '0'
ok 9 - check new value undef
not ok 10 - Difference found between 0 and empty string in elem(<el />)
#   Failed test 'Difference found between 0 and empty string in elem(<el />)'
#   at 16zero_to_empty_str_cmp.t line 92.
#          got: '0'
#     expected: '1'
not ok 11 - check old value 0
#   Failed test 'check old value 0'
#   at 16zero_to_empty_str_cmp.t line 97.
#          got: undef
#     expected: '0'
ok 12 - check new value undef
ok 13 - Identical XMLs with attrs 0 generate identical results
ok 14 - Identical XMLs with attrs empty_string generate identical results
ok 15 - Identical XMLs with attrs missing generate identical results
ok 16 - Identical XMLs with elem 0 generate identical results
ok 17 - Identical XMLs with elem empty_string generate identical results
ok 18 - Identical XMLs with elem undef generate identical results
# Looks like you failed 4 tests of 18.

Here are the results AFTER my changes:

perl 16zero_to_empty_str_cmp.t
1..18
ok 1 - Difference found between 0 and empty string in attr
ok 2 - check old value 0
ok 3 - check new value empty_string
ok 4 - Difference found between 0 and missing attr
ok 5 - check old value 0
ok 6 - check new value undef
ok 7 - Difference found between 0 and empty string in elem(<el></el>)
ok 8 - check old value 0
ok 9 - check new value undef
ok 10 - Difference found between 0 and empty string in elem(<el />)
ok 11 - check old value 0
ok 12 - check new value undef
ok 13 - Identical XMLs with attrs 0 generate identical results
ok 14 - Identical XMLs with attrs empty_string generate identical results
ok 15 - Identical XMLs with attrs missing generate identical results
ok 16 - Identical XMLs with elem 0 generate identical results
ok 17 - Identical XMLs with elem empty_string generate identical results
ok 18 - Identical XMLs with elem undef generate identical results

Let me know if there are any other changes necessary.

Thanks

shlomif commented 6 years ago

Thanks @apinco ! Looks mostly fine. I'll take a look soon and try to merge your changes.

apinco commented 6 years ago

Thanks so much! I am new to the process of updating a module on cpan and so I don't know the time frame for this code to be updated upstream. Do you have an estimate on how long the update takes to happen? I am asking as I have to make a decision to either wait for cpan to pick up the change, or roll out a local copy of the package to all of my servers that overrides the default cpan module.

Thanks again for your time.

shlomif commented 6 years ago

Thanks @apinco ! I merged your PR , did some fixups (trailing space, etc.) and uploaded it to CPAN: https://metacpan.org/recent . It may take some time to propogate across all mirrors.

shlomif commented 6 years ago

Thanks @apinco ! I merged your PR , did some fixups (trailing space, etc.) and uploaded it to CPAN: https://metacpan.org/recent . It may take some time to propogate across all mirrors.