Open rcrews opened 6 years ago
Can you please tell us the output from nokogiri -v?
Thank you for reviewing the report.
Here's the output of nokogiri -v:
$ nokogiri -v
# Nokogiri (1.8.2)
---
warnings: []
nokogiri: 1.8.2
ruby:
  version: 2.3.3
  platform: java
  description: jruby 9.1.17.0 (2.3.3) 2018-04-20 d8b1ff9 Java HotSpot(TM) 64-Bit Server
    VM 25.172-b11 on 1.8.0_172-b11 +jit [darwin-x86_64]
  engine: jruby
  jruby: 9.1.17.0
xerces: Xerces-J 2.11.0
nekohtml: NekoHTML 1.9.21
Thanks to @ar7max for pointing out my last example had a syntax error. Here is the version that actually shows the error:
#!/usr/bin/env ruby -w
require 'nokogiri'
doc = Nokogiri::HTML( File.open('toc2json.html') )
xslt = Nokogiri::XSLT( File.open('toc2json.xsl') )
puts xslt.transform(doc)
Here is the output from JRuby 9.1.17.0 with Nokogiri 1.8.2 as -v'd above:
$ ./toc2json.rb
Unhandled Java exception: java.lang.NullPointerException
java.lang.NullPointerException: null
retryXsltTransformation at nokogiri/XsltStylesheet.java:263
transform at nokogiri/XsltStylesheet.java:218
call at nokogiri/XsltStylesheet$INVOKER$i$0$2$transform.gen:-1
call at org/jruby/internal/runtime/methods/JavaMethod.java:796
call at org/jruby/internal/runtime/methods/DynamicMethod.java:202
cacheAndCall at org/jruby/runtime/callsite/CachingCallSite.java:318
call at org/jruby/runtime/callsite/CachingCallSite.java:155
invokeOther10:transform at $_dot_/./toc2json.rb:16
<main> at ./toc2json.rb:16
invokeWithArguments at java/lang/invoke/MethodHandle.java:627
load at org/jruby/ir/Compiler.java:94
runScript at org/jruby/Ruby.java:830
runNormally at org/jruby/Ruby.java:749
runNormally at org/jruby/Ruby.java:767
runFromMain at org/jruby/Ruby.java:580
doRunFromMain at org/jruby/Main.java:417
internalRun at org/jruby/Main.java:305
run at org/jruby/Main.java:232
main at org/jruby/Main.java:204
Running the same script with MRI (Ruby 2.5.1, Nokogiri 1.8.2) returns JSON, but the output is wrong: it includes an XML declaration, as described in issue #1750.
The files are attached in i1751.zip.
I know Java Nokogiri uses NekoHTML and Xerces. I'm currently working around this reported Nokogiri issue by interacting directly with those same Java libraries. I like and prefer the Nokogiri API, but for now, this is my work-around. For anyone interested, a stripped-down version of the work-around is attached at i1751_workaround.zip
Would love to resolve the issue and contribute a pull request, but currently don't know enough about wrapping Java libraries with JRuby to do so.
I commented on this in #1750, and I can't imagine the two issues aren't related.
Running latest Nokogiri (v1.16.6), the error I'm seeing is:
#! /usr/bin/env ruby
require "bundler/inline"
gemfile do
  source "https://rubygems.org"
  gem "nokogiri", "~> 1.16.0"
end
pp Nokogiri::VERSION_INFO
doc = Nokogiri::HTML(File.read(File.join(__dir__, 'toc2json', 'toc2json.html')))
xslt = Nokogiri::XSLT(File.read(File.join(__dir__, 'toc2json', 'toc2json.xsl')))
puts xslt.serialize(xslt.transform(doc))
RuntimeError: org.w3c.dom.DOMException: HIERARCHY_REQUEST_ERR: An attempt was made to insert a node where it is not permitted.
transform at nokogiri/XsltStylesheet.java:218
<main> at ./issues/1751-jruby-xslt/1751-jruby-xslt.rb:14
Using XSLT to write JSON, my program either hangs or gives an inscrutable error message. The same XML/XSLT processes fine with xsltproc and with MRI.
I created this short program to demonstrate the problem:
The result is
When I run a similar transformation using a larger XML and XSLT, the program hangs indefinitely. For that, the XML and XSLT are attached and the program is
toc2json.zip