postmodern / nokogiri-diff

Calculate the differences between two XML/HTML documents.
MIT License
131 stars 12 forks source link

Ignore differences in attribute order? #5

Closed bhollis closed 11 years ago

bhollis commented 11 years ago

The order of attributes on an element shouldn't result in a diff, should they?

bhollis commented 11 years ago

Never mind, something else was going on.

bhollis commented 11 years ago

Oh, actually, this actually is a problem.

Repro:

require 'nokogiri'
require 'nokogiri/diff'

x = Nokogiri::XML("<foo a='a' b='b'></foo>")
y = Nokogiri::XML("<foo b='b' a='a'></foo>")

x.diff(y) do |change, node|
  puts "#{change} #{node.inspect}\n"
end

Output:

  #<Nokogiri::XML::Element:0x3ffbdd4df5a8 name="foo" attributes=[#<Nokogiri::XML::Attr:0x3ffbdd4defcc name="a" value="a">, #<Nokogiri::XML::Attr:0x3ffbdd4defb8 name="b" value="b">]>
- #<Nokogiri::XML::Attr:0x3ffbdd4defcc name="a" value="a">
  #<Nokogiri::XML::Attr:0x3ffbdd4defb8 name="b" value="b">
+ #<Nokogiri::XML::Attr:0x3ffbdd4de810 name="a" value="a">
  #<Nokogiri::XML::Text:0x3ffbdcc89908 "b">

I can fix it by patching tdiff_each_child, but I'm not sure if it should be an option or something:

def tdiff_each_child(node,&block)
  if node.kind_of?(Nokogiri::XML::Element)
    node.attribute_nodes.sort {|a,b| a.name <=> b.name }.each(&block)
  end

  node.children.each(&block)
end
postmodern commented 11 years ago

OTOH, difference in order implies something changed in the rendering of the HTML.

bhollis commented 11 years ago

Definitely. But I think in the XML model, the order of attributes doesn't matter - just the presence and value of the attributes. I know when you canonicalize a document attributes end up sorted. That's why I don't know if this should be a general fix or an option.

leehambley commented 11 years ago

FWIW I'd be on the side of not caring about attribute order.

postmodern commented 11 years ago

OK, there is a case for ignoring the order. Now how do we accomplish this with the current tsort algorithm?

bhollis commented 11 years ago

I think the modified tdiff_each_child I posted works.

postmodern commented 11 years ago

Alright I will add your code and write some specs for this new behaviour.

bhollis commented 11 years ago

Thanks!

postmodern commented 11 years ago

Released in 0.2.0.