sparklemotion / nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
https://nokogiri.org/
MIT License
6.14k stars 897 forks

JRuby XSLT failure #1751

Open rcrews opened 6 years ago

rcrews commented 6 years ago

Using XSLT to write JSON, my program either hangs or gives an inscrutable error message. The same XML/XSLT processes fine with xsltproc and with MRI.

I created this short program to demonstrate the problem:

#!/usr/bin/env ruby -w
require 'nokogiri'

doc = Nokogiri::XML('<g>Greetings</g>')
xslt = Nokogiri::XSLT(<<-STYLESHEET
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" omit-xml-declaration="yes"/>
  </xsl:stylesheet>
  STYLESHEET
  )
puts xslt.transform(doc)

The result is

RuntimeError: org.w3c.dom.DOMException: HIERARCHY_REQUEST_ERR: An attempt was made to insert a node where it is not permitted. 
  transform at nokogiri/XsltStylesheet.java:231
     <main> at exe/test.rb:11

When I run a similar transformation using a larger XML and XSLT, the program hangs indefinitely. For that case, the XML and XSLT are attached and the program is

#!/usr/bin/env ruby -w
require 'nokogiri'

doc = Nokogiri::HTML('toc2json.html')
xslt = Nokogiri::XSLT('toc2json.xsl')
File.open('toc2json.json', 'w') do |file|
  file.write xslt.transform(doc)
end

toc2json.zip

flavorjones commented 6 years ago

Can you please tell us the output from nokogiri -v?

rcrews commented 6 years ago

Thank you for reviewing the report.
Here's the output of nokogiri -v:

$ nokogiri -v
# Nokogiri (1.8.2)
    ---
    warnings: []
    nokogiri: 1.8.2
    ruby:
      version: 2.3.3
      platform: java
      description: jruby 9.1.17.0 (2.3.3) 2018-04-20 d8b1ff9 Java HotSpot(TM) 64-Bit Server
        VM 25.172-b11 on 1.8.0_172-b11 +jit [darwin-x86_64]
      engine: jruby
      jruby: 9.1.17.0
    xerces: Xerces-J 2.11.0
    nekohtml: NekoHTML 1.9.21

rcrews commented 6 years ago

Thanks to @ar7max for pointing out my last example had a syntax error. Here is the version that actually shows the error:

#!/usr/bin/env ruby -w
require 'nokogiri'

doc = Nokogiri::HTML( File.open('toc2json.html') )
xslt = Nokogiri::XSLT( File.open('toc2json.xsl') )
puts xslt.transform(doc)

Here is the output from JRuby 9.1.17.0 with Nokogiri 1.8.2 as -v'd above:

$ ./toc2json.rb 
Unhandled Java exception: java.lang.NullPointerException
java.lang.NullPointerException: null
  retryXsltTransformation at nokogiri/XsltStylesheet.java:263
                transform at nokogiri/XsltStylesheet.java:218
                     call at nokogiri/XsltStylesheet$INVOKER$i$0$2$transform.gen:-1
                     call at org/jruby/internal/runtime/methods/JavaMethod.java:796
                     call at org/jruby/internal/runtime/methods/DynamicMethod.java:202
             cacheAndCall at org/jruby/runtime/callsite/CachingCallSite.java:318
                     call at org/jruby/runtime/callsite/CachingCallSite.java:155
  invokeOther10:transform at $_dot_/./toc2json.rb:16
                   <main> at ./toc2json.rb:16
      invokeWithArguments at java/lang/invoke/MethodHandle.java:627
                     load at org/jruby/ir/Compiler.java:94
                runScript at org/jruby/Ruby.java:830
              runNormally at org/jruby/Ruby.java:749
              runNormally at org/jruby/Ruby.java:767
              runFromMain at org/jruby/Ruby.java:580
            doRunFromMain at org/jruby/Main.java:417
              internalRun at org/jruby/Main.java:305
                      run at org/jruby/Main.java:232
                     main at org/jruby/Main.java:204

Running the same script with MRI (Ruby 2.5.1, nokogiri 1.8.2) returns JSON, but wrong JSON with an XML declaration as described in Issue #1750.

The files are attached in i1751.zip
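
For the MRI case described above, where the output is JSON preceded by an XML declaration, a caller-side stopgap might look like the sketch below. It assumes the only defect is a single leading `<?xml ...?>` declaration; `strip_xml_declaration` is a hypothetical helper, not part of Nokogiri, and the sample string stands in for the real transform output.

```ruby
require 'json'

# Hypothetical helper (not a Nokogiri API): remove one leading XML
# declaration, plus surrounding whitespace, from text-mode XSLT output.
# Assumption: the declaration is the only unwanted content.
def strip_xml_declaration(output)
  output.sub(/\A\s*<\?xml[^>]*\?>\s*/, '')
end

# Stand-in for the string that xslt.transform(doc) is reported to return.
raw = %(<?xml version="1.0"?>\n{"greeting": "Greetings"})

# JSON.parse raises JSON::ParserError if anything other than JSON remains.
json = JSON.parse(strip_xml_declaration(raw))
puts json['greeting']
```

This only masks the symptom on MRI; it does nothing for the JRuby exceptions reported in this issue.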

rcrews commented 6 years ago

I know Java Nokogiri uses NekoHTML and Xerces. I'm currently working around this reported Nokogiri issue by interacting directly with those same Java libraries. I like and prefer the Nokogiri API, but for now, this is my work-around. For anyone interested, a stripped-down version of the work-around is attached at i1751_workaround.zip

Would love to resolve the issue and contribute a pull request, but currently don't know enough about wrapping Java libraries with JRuby to do so.

flavorjones commented 6 years ago

I commented on this in #1750, and I can't imagine the two issues aren't related.

flavorjones commented 3 months ago

Running latest Nokogiri (v1.16.6), the error I'm seeing is:

#! /usr/bin/env ruby

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", "~> 1.16.0"
end

pp Nokogiri::VERSION_INFO

doc = Nokogiri::HTML(File.read(File.join(__dir__, 'toc2json', 'toc2json.html')))
xslt = Nokogiri::XSLT(File.read(File.join(__dir__, 'toc2json', 'toc2json.xsl')))
puts xslt.serialize(xslt.transform(doc))

RuntimeError: org.w3c.dom.DOMException: HIERARCHY_REQUEST_ERR: An attempt was made to insert a node where it is not permitted.
  transform at nokogiri/XsltStylesheet.java:218
     <main> at ./issues/1751-jruby-xslt/1751-jruby-xslt.rb:14