postmodern / nokogiri-diff

Calculate the differences between two XML/HTML documents.
MIT License

How can I compare the children of a fragment? #11

Open archonic opened 6 years ago

archonic commented 6 years ago

So far I haven't been able to get sensible diff output for two simple fragments. I have this simple comparison service:

require 'nokogiri'
require 'nokogiri/diff'

class ComparisonService
  def initialize(seq1, seq2)
    @doc1 = Nokogiri::HTML.fragment(seq1)
    @doc2 = Nokogiri::HTML.fragment(seq2)
  end

  def raw
    {
      old: @doc1,
      new: @doc2
    }
  end

  def changes
    output = []
    @doc1.diff(@doc2) do |change, node|
      output << {
        change: change,
        node: node.to_html
      }
    end
    output
  end
end

This is the output of comparison.raw:

{:old=>
  #(DocumentFragment:0x2aef160d6238 {
    name = "#document-fragment",
    children = [
      #(Element:0x2aef160c5de8 { name = "p", children = [ #(Text "This paragraph remains the same.")] }),
      #(Element:0x2aef160c5d84 { name = "p", children = [ #(Text "This paragraph gets removed.")] })]
    }),
 :new=>
  #(DocumentFragment:0x2aef160c5b68 {
    name = "#document-fragment",
    children = [
      #(Element:0x2aef160c5438 { name = "p", children = [ #(Text "This paragraph remains the same.")] }),
      #(Element:0x2aef160c5348 { name = "p", children = [ #(Text "This paragraph is new.")] })]
    })}

I should see one removal and one addition for the change in the second paragraph. However, the changes method lumps everything together, reporting each whole fragment as a single removal or addition:

[{:change=>"-", :node=>"<p>This paragraph remains the same.</p><p>This paragraph gets removed.</p>"}, {:change=>"+", :node=>"<p>This paragraph remains the same.</p><p>This paragraph is new.</p>"}]

I've also tried @doc1 = Nokogiri::HTML(seq1), but this wraps the content in <html> and <body> elements (unwanted) and seems to run the comparison against the children recursively, like a Russian doll:

[1] pry(#<DocumentsController>)> comp.raw
=> {:old=>
  #(Document:0x2aef1679ac90 {
    name = "document",
    children = [
      #(DTD:0x2aef1669663c { name = "html" }),
      #(Element:0x2aef16692604 {
        name = "html",
        children = [
          #(Element:0x2aef1668c31c {
            name = "body",
            children = [
              #(Element:0x2aef1667f5cc { name = "p", children = [ #(Text "This paragraph remains the same.")] }),
              #(Element:0x2aef166764f4 { name = "p", children = [ #(Text "This paragraph gets removed.")] })]
            })]
        })]
    }),
 :new=>
  #(Document:0x2aef1679abdc {
    name = "document",
    children = [
      #(DTD:0x2aef1663fcec { name = "html" }),
      #(Element:0x2aef1663e068 {
        name = "html",
        children = [
          #(Element:0x2aef16635ef4 {
            name = "body",
            children = [
              #(Element:0x2aef1662d1f0 { name = "p", children = [ #(Text "This paragraph remains the same.")] }),
              #(Element:0x2aef1661ebf0 { name = "p", children = [ #(Text "This paragraph is new.")] })]
            })]
        })]
    })}
[2] pry(#<DocumentsController>)> comp.changes
=> [{:change=>" ", :node=>""},
 {:change=>" ", :node=>"<html><body>\n<p>This paragraph remains the same.</p>\n<p>This paragraph gets removed.</p>\n</body></html>"},
 {:change=>" ", :node=>"<body>\n<p>This paragraph remains the same.</p>\n<p>This paragraph gets removed.</p>\n</body>"},
 {:change=>" ", :node=>"<p>This paragraph remains the same.</p>\n"},
 {:change=>" ", :node=>"<p>This paragraph gets removed.</p>"},
 {:change=>" ", :node=>"This paragraph remains the same."},
 {:change=>"-", :node=>"This paragraph gets removed."},
 {:change=>"+", :node=>"This paragraph is new."}]
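
Most entries in the list above are unchanged nodes (marked " "), one for each level of nesting. As a minimal sketch, assuming the array-of-hashes shape that the changes method builds, the unchanged entries could be dropped with plain Ruby. The helper name changed_only is hypothetical and is not part of nokogiri-diff:

```ruby
# Hypothetical helper: keep only entries whose marker is "+" or "-".
# Assumes each entry is a Hash with :change and :node keys, as built by
# ComparisonService#changes above.
def changed_only(changes)
  changes.select { |entry| ['+', '-'].include?(entry[:change]) }
end

# Sample data copied from the tail of the pry output above.
sample = [
  { change: ' ', node: '<p>This paragraph gets removed.</p>' },
  { change: ' ', node: 'This paragraph remains the same.' },
  { change: '-', node: 'This paragraph gets removed.' },
  { change: '+', node: 'This paragraph is new.' }
]

filtered = changed_only(sample)
```

This only trims the list; it does not change how the diff itself walks the tree.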

I'm not sure whether others find that output favourable, but I'd like the output to make sense when HTML changes are rendered side by side, the way commits can be viewed on GitHub. Any suggestions?
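
To make the side-by-side goal concrete, here is a rough sketch of pairing a flat change list into [old, new] rows for a two-column view. It assumes the entry shape produced by the changes method above and treats a "-" immediately followed by a "+" as a replacement, which is a heuristic rather than anything nokogiri-diff guarantees. The name side_by_side is hypothetical:

```ruby
# Hypothetical helper: turn a flat change list into [old, new] rows.
# Assumes entries are Hashes with :change (" ", "-", "+") and :node (a
# String), as produced by ComparisonService#changes.
def side_by_side(changes)
  rows = []
  pending_removal = nil # a "-" entry waiting for a matching "+"

  changes.each do |entry|
    case entry[:change]
    when '-'
      # Flush an earlier removal that never found a matching addition.
      rows << [pending_removal, nil] if pending_removal
      pending_removal = entry[:node]
    when '+'
      # Pair with the waiting removal if there is one, else leave left side empty.
      rows << [pending_removal, entry[:node]]
      pending_removal = nil
    else
      rows << [pending_removal, nil] if pending_removal
      pending_removal = nil
      # Unchanged nodes appear in both columns.
      rows << [entry[:node], entry[:node]]
    end
  end

  rows << [pending_removal, nil] if pending_removal
  rows
end

rows = side_by_side([
  { change: ' ', node: 'This paragraph remains the same.' },
  { change: '-', node: 'This paragraph gets removed.' },
  { change: '+', node: 'This paragraph is new.' }
])
```

Each row can then be rendered as one line of a two-column table, with a nil cell left blank. Feeding it the full nested output would repeat parent nodes, so it fits best on a list already reduced to the level of detail wanted.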