Closed rafbm closed 3 years ago
Well that's annoying! I'll take a look!
The fix turned out to be quite simple, but tracking it down was harder than it should have been. Everything looked correct in the debugger before I realized that lldb wasn't printing the BOM when printing the strings.
Thanks for providing the test case! I adapted it for testing the underlying gumbo parser as well as testing in Ruby.
Thanks so much for acting on this so quickly! Are you planning on releasing a gem update? These C-level crashes are so bad… I’m nervous for anyone who would update to 2.0.3 at the moment. 😓
2.0.2 misbehaved in the presence of a UTF-8 BOM as well (although it did it silently, without any assertion failures).
The work around for both versions is quite simple.
irb(main):001:0> require 'nokogumbo'
=> true
irb(main):002:0> Nokogumbo::VERSION
=> "2.0.3"
irb(main):003:0> html = "\uFEFF<b>BOM</b>"
=> "<b>BOM</b>"
irb(main):004:0> Nokogiri::HTML5.fragment(html.delete_prefix("\uFEFF"))
=> #<Nokogiri::HTML5::DocumentFragment:0x438 name="#document-fragment" children=[#<Nokogiri::XML::Element:0x424 name="b" children=[#<Nokogiri::XML::Text:0x410 "BOM">]>]>
But yes, I'll release a new update soon. I want to take a look at another issue that arose first.
Once the CI tests pass for https://github.com/rubys/nokogumbo/pull/161, I'll release a new gem.
Given the following test file:
The last call to
.fragment
crashes: