rubys / nokogumbo

A Nokogiri interface to the Gumbo HTML5 parser.
Apache License 2.0
186 stars 114 forks source link

Fix regression when nil passed to fragment #158

Closed lis2 closed 3 years ago

lis2 commented 3 years ago

There is a small regression in v2.0.3

# nokogumbo 2.0.2
Nokogiri::HTML5::fragment(nil)
# => #<Nokogiri::HTML5::DocumentFragment:0x3fda6e849f80 name="#document-fragment">

# nokogumbo 2.0.3
Nokogiri::HTML5::fragment(nil)
# => NoMethodError (undefined method `to_str' for nil:NilClass)

I think that document and fragment should behave consistently and both should accept nil as a param.

This line is causing error https://github.com/rubys/nokogumbo/blob/master/lib/nokogumbo/html5.rb#L99

to_s is a safer choice. In addition, I run a small benchmark to ensure it will not cause a significant performance problem

require 'benchmark'

"test performance".to_s
"test performance".to_str

Benchmark.bm do |benchmark|
  benchmark.report("to_s") do
    5_000_000.times do
      "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean accumsan purus sed maximus fermentum. Nunc mattis auctor libero varius facilisis. Maecenas ac ornare ipsum. Maecenas nisi dolor, congue eu elementum at, eleifend a enim. Duis sit amet iaculis lacus, a dapibus lorem. Suspendisse libero purus, placerat et mi eget, luctus consequat libero. In rutrum rhoncus massa at semper. Praesent vitae mauris hendrerit, tempus massa sit amet, pellentesque tortor. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Integer sem urna, pretium sed felis at, facilisis volutpat nisl. Integer interdum consectetur rhoncus. Aenean nunc nulla, mattis et felis ut, egestas auctor lectus.".to_s
    end
  end

  benchmark.report("to_str") do
    5_000_000.times do
      "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean accumsan purus sed maximus fermentum. Nunc mattis auctor libero varius facilisis. Maecenas ac ornare ipsum. Maecenas nisi dolor, congue eu elementum at, eleifend a enim. Duis sit amet iaculis lacus, a dapibus lorem. Suspendisse libero purus, placerat et mi eget, luctus consequat libero. In rutrum rhoncus massa at semper. Praesent vitae mauris hendrerit, tempus massa sit amet, pellentesque tortor. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Integer sem urna, pretium sed felis at, facilisis volutpat nisl. Integer interdum consectetur rhoncus. Aenean nunc nulla, mattis et felis ut, egestas auctor lectus.".to_str
    end
  end
end
       user     system      total        real
to_s  0.243748   0.010163   0.253911 (  0.253960)
to_str  0.254399   0.000004   0.254403 (  0.254407)
stevecheckoway commented 3 years ago

You're right. to_s is the correct choice here. Thanks for the fix and the test case!