Hi there,
Sorry again, but you'll need to fully explain what the problem you're trying to solve is. Not being a full-time Java developer, I have no idea what jar-dependencies
is, what problem it is supposed to help solve, what the tradeoffs of supporting it are, nor what alternative solutions were considered.
Can we please start with a clear description of the problem? We can then proceed to a discussion of solutions. Thanks for your patience.
In JRuby you can use gems and jars in almost the same manner, since both provide libraries your code can use.
Gems are managed by RubyGems/Bundler, and they usually declare their gem dependencies inside their gemspec; vendoring gems is not that common (any more).
In the Java world, jars can have dependencies, and those can be declared as well.
BUT gems like Nokogiri vendor their jars, i.e. those jars get loaded at some point into the JRuby classloader. There is no way to find out which jars with which versions were loaded, since in Java the jars do not carry such metadata (dependency tools like Maven or Gradle keep this metadata on the side).
Let's say I have a Java library depending on xerces or xalan, and my JRuby project uses it alongside Nokogiri. Tools like jbundler, Maven or Gradle can handle the Java libraries for your project, but they can NOT take Nokogiri's vendored jars into account. So you end up loading the same jar TWICE into the JRuby classloader, maybe even with different versions!
So jar-dependencies first gives you a way to "declare" the jar inside the gemspec, in a similar manner to how you declare gem dependencies: https://github.com/jruby/jruby-openssl/blob/master/jruby-openssl.gemspec#L23
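A minimal sketch of such a declaration, assuming the `jar group:artifact, version` requirement-string form that jar-dependencies reads from `spec.requirements`; the gem name `my_xml_gem` is hypothetical and the xerces/xml-apis versions are only illustrative:

```ruby
require 'rubygems'

# Hypothetical gem that ships Java code depending on xerces.
spec = Gem::Specification.new do |s|
  s.name     = 'my_xml_gem'   # hypothetical name
  s.version  = '0.1.0'
  s.summary  = 'example gem declaring its jar dependencies'
  s.authors  = ['example']
  s.files    = []
  s.platform = 'java'

  # jar-dependencies itself is an ordinary gem dependency.
  s.add_runtime_dependency 'jar-dependencies', '~> 0.4.0'

  # Jars are declared as requirement strings, next to the gem dependencies.
  s.requirements << 'jar xerces:xercesImpl, 2.12.0'
  s.requirements << 'jar xml-apis:xml-apis, 1.4.01'
end

puts spec.requirements
```

The declaration is plain gemspec metadata, so any tool that reads the gemspec (Bundler, jbundler, the Maven/Gradle gem plugins) can see it.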
Now you can still vendor those jars, BUT load them through jar-dependencies, which will warn you about a version conflict on those jars, or simply not load the jar a second time.
Declaring the jar inside the gemspec also gives Maven-like tools a way to manage those jars: http://repo1.maven.org/maven2/rubygems/jruby-openssl/0.9.5/jruby-openssl-0.9.5.pom (see the dependencies tag), which is derived from the respective gemspec file.
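To illustrate how a gemspec requirement can become a POM entry, here is a hypothetical helper that turns one jar requirement string into a Maven `<dependency>` element. It accepts both the `group:artifact, version` and the `group, artifact, version` spellings; the real gem-to-POM conversion may emit more (scope, type, exclusions), so treat this as a sketch only:

```ruby
# Matches "jar group:artifact, version" and "jar group, artifact, version".
JAR_REQUIREMENT = /\Ajar\s+([^:,\s]+)\s*[:,]\s*([^,\s]+)\s*,\s*(\S+)\z/

# Returns a hash of Maven coordinates, or nil for non-jar requirements.
def parse_jar_requirement(req)
  m = JAR_REQUIREMENT.match(req.strip)
  return nil unless m
  { group_id: m[1], artifact_id: m[2], version: m[3] }
end

# Hypothetical: render one requirement as a POM <dependency> element.
def to_maven_dependency(req)
  coords = parse_jar_requirement(req) or raise ArgumentError, "not a jar requirement: #{req}"
  <<~XML
    <dependency>
      <groupId>#{coords[:group_id]}</groupId>
      <artifactId>#{coords[:artifact_id]}</artifactId>
      <version>#{coords[:version]}</version>
    </dependency>
  XML
end

puts to_maven_dependency('jar xerces:xercesImpl, 2.12.0')
```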
On top of that, jbundler uses the Maven metadata of a gem to find its jar dependencies.
This is all about putting gems and jars on the same level for JRuby, and helping projects which live as much inside the Java world as inside the Ruby world, with JRuby being the bridge between them.
One last thing: with https://github.com/lookout/leafy we do not vendor the jars anymore (jar-dependencies hooks into RubyGems and installs them on gem install), but this needs JRuby 1.7.13+.
For jruby-openssl we do support older JRubies, so we vendor those jars for them.
@jvshahid I'll continue my comment from https://github.com/sparklemotion/nokogiri/issues/1395#issuecomment-164052036 here.
As usual, there is a Ruby side of things and a Java side of things.
Having those jar dependencies declared inside the Nokogiri gemspec allows tools like jbundler or jar-dependencies to manage jars in a similar manner to gems. Both tools can use a Jarfile to set up jars for a JRuby project; see https://gist.github.com/mkristian/a46851705427f68ce310. They obey the jars declared inside a gemspec when used with Bundler's Gemfile, and they can lock down your jar versions for a gem project that has Nokogiri as a transitive dependency.
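A minimal sketch of what a Jarfile declares, assuming the `jar 'group:artifact', 'version'` form used by jbundler's Jarfile. `JarfileSketch` is a hypothetical evaluator that only records declarations; the real tools also handle repositories, scopes, lockfiles and resolution:

```ruby
# Hypothetical, minimal Jarfile-style DSL evaluator: it just collects
# `jar 'group:artifact', 'version'` declarations.
class JarfileSketch
  Declaration = Struct.new(:group_id, :artifact_id, :version)

  attr_reader :jars

  def initialize
    @jars = []
  end

  # DSL method called from the Jarfile source.
  def jar(coordinate, version = nil)
    group_id, artifact_id = coordinate.split(':', 2)
    raise ArgumentError, "expected group:artifact, got #{coordinate}" unless artifact_id
    @jars << Declaration.new(group_id, artifact_id, version)
  end

  def self.evaluate(source)
    dsl = new
    dsl.instance_eval(source)
    dsl.jars
  end
end

jarfile = <<~JARFILE
  # pin the XML stack the project shares with nokogiri (versions illustrative)
  jar 'xerces:xercesImpl', '2.12.0'
  jar 'xalan:xalan', '2.7.2'
JARFILE

JarfileSketch.evaluate(jarfile).each do |d|
  puts "#{d.group_id}:#{d.artifact_id}:#{d.version}"
end
```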
Loading jars via the jar-dependencies gem also ensures that each jar is loaded with only a single version; any attempt to load the jar with another version produces only a warning, without loading the jar. Just an extra safety net.
Both Maven and Gradle have plugins which handle gems and jars in the same manner: gems and jars are just different kinds of dependencies with a different packaging format, and are otherwise treated alike. But these plugins depend on having jars declared inside the gemspec to be able to integrate them into their dependency resolution. For embedded JRuby this is even more important than for a Ruby project. A Java application which uses embedded JRuby comes with its own set of dependencies, and those dependencies are already part of the JRuby runtime, as per the design of the JRuby classloaders. Here it would be important to tell jar-dependencies which jars are already loaded by the underlying container.
Of course, forcing a different version of xerces, or using xercesMinimal instead of xerces, can cause bugs in Nokogiri, since it then runs against a different version than the one Nokogiri itself ships. On the other hand, this is a common situation with Java projects.
`require_jar(group_id, artifact_id, version)`
This is the manual way of doing it. Look at https://github.com/mkristian/jar-dependencies/tree/master/examples/gem-with-java-extension-and-jar-dependencies/using-rake-compiler for another way of doing it.
The whole thing comes with another gem dependency, but JRuby 1.7.x and JRuby 9k already ship this extra gem as a default gem.
The only impact for the average Nokogiri user is that the jar-dependencies gem gets activated and used as well. Once the jars are loaded, there is no further impact on Nokogiri.
@headius what do you think about this proposal? I personally like it, but I'm worried about wide adoption and edge cases that could arise from using jbundler and jar-dependencies. Your thoughts will be very helpful.
/cc @yokolet
@jvshahid FYI, all jars which are embedded in JRuby and available on Maven Central are loaded via jar-dependencies: jline-2.11.jar, the two Bouncy Castle jars from jruby-openssl, and snakeyaml-1.14 for the psych gem.
Any decision?
I picked this up with @jvshahid this morning. It won't make the v1.9.0 release this week, but we're hoping to adopt this approach for the following release.
I started looking into this and I think the main challenge is figuring out which maven coordinates to use for all the libraries.
I started trying to sort it out but some of the version information is lost to time. I am using https://search.maven.org to look for artifacts.
Without at least getting @jvshahid's forked artifacts pushed to Maven Central, I'm not sure this can move forward (unless we only jar-deps the ones we can figure out?)
The following patch appears to produce a gem that installs all the jar dependencies. Note that these versions may or may not be right based on my previous comment (and the @jvshahid versions are not in here at all):
```diff
diff --git a/Rakefile b/Rakefile
index b81d6fde..e23374ef 100644
--- a/Rakefile
+++ b/Rakefile
@@ -133,7 +133,23 @@ HOE = Hoe.spec 'nokogiri' do
]
self.clean_globs += Dir.glob("ports/*").reject { |d| d =~ %r{/archives$} }
- unless java?
+ if java?
+ self.extra_deps += [
+ ['jar-dependencies', "~> 0.4.0"],
+ ]
+ def self.add_dependencies
+ super
+
+ spec.requirements << 'jar com.sun.xml.bind.jaxb, isorelax, 20090621' # unknown where to find original 20041111
+ spec.requirements << 'jar com.thaiopensource, jing, 20091111'
+ spec.requirements << 'jar nekohtml, nekodtd, 0.1.11' # FIXME, not using jvshahid's fork!
+ spec.requirements << 'jar nekohtml, nekohtml, 1.9.6.2' # FIXME, not using jvshahid's fork!
+ spec.requirements << 'jar xalan, serializer, 2.7.2'
+ spec.requirements << 'jar xalan, xalan, 2.7.2'
+ spec.requirements << 'jar xerces, xercesImpl, 2.12.0'
+ spec.requirements << 'jar xml-apis, xml-apis, 1.4.01'
+ end
+ else
self.extra_deps += [
["mini_portile2", "~> 2.4.0"], # keep version in sync with extconf.rb
]
```
I also had to make the following change for compatibility with JRuby 9.2.9, which added `ThreadContext` to the `RubyBasicObject` implementation of `to_a`. Without that, this will not compile (because it tries to specify a more general return type):
```diff
diff --git a/ext/java/nokogiri/XmlNodeSet.java b/ext/java/nokogiri/XmlNodeSet.java
index ea8b031a..931ab3d7 100644
--- a/ext/java/nokogiri/XmlNodeSet.java
+++ b/ext/java/nokogiri/XmlNodeSet.java
@@ -39,6 +39,7 @@ import static nokogiri.internals.NokogiriHelpers.nodeListToRubyArray;
import java.util.Arrays;
import org.jruby.Ruby;
+import org.jruby.RubyArray;
import org.jruby.RubyClass;
import org.jruby.RubyFixnum;
import org.jruby.RubyObject;
@@ -392,7 +393,7 @@ outer:
}
@JRubyMethod(name = {"to_a", "to_ary"})
- public IRubyObject to_a(ThreadContext context) {
+ public RubyArray to_a(ThreadContext context) {
return context.runtime.newArrayNoCopy(nodes);
}
```
I will discuss with @enebo whether there's any compatibility issue here. I'm unsure if this affects runtime; the JVM tends not to care about such typing issues. The original method should probably return `IRubyObject`, though, since it's possible to get a `nil` from some classes.
These changes at least get the dependencies into the jar, and subsequently into the installed gem:
```
$ ls ../jruby/lib/ruby/gems/shared/gems/nokogiri-1.10.4-java/lib/
com isorelax.jar nekodtd.jar nekohtml.jar nokogiri nokogiri_jars.rb xalan xerces xml-apis xsd
isorelax jing.jar nekohtml net nokogiri.rb serializer.jar xalan.jar xercesImpl.jar xml-apis.jar
```
The non-jar elements here are directories based on the Maven coordinates of each library. The original jars are still in place because I'm not sure how to use jar-dependencies at dev time. Help, @mkristian?
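The top-level directories in that listing (`xerces`, `xalan`, `com`, ...) are consistent with the standard Maven repository layout, assuming jar-dependencies vendors jars that way. A minimal sketch of the coordinate-to-path mapping: the groupId's dots become directories, followed by `artifactId/version/artifactId-version.jar`:

```ruby
# Standard Maven repository layout for a jar artifact.
def maven_jar_path(group_id, artifact_id, version)
  File.join(*group_id.split('.'), artifact_id, version, "#{artifact_id}-#{version}.jar")
end

p maven_jar_path('xerces', 'xercesImpl', '2.12.0')
# => "xerces/xercesImpl/2.12.0/xercesImpl-2.12.0.jar"
p maven_jar_path('com.thaiopensource', 'jing', '20091111')
# => "com/thaiopensource/jing/20091111/jing-20091111.jar"
```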
The patch to correct this return type problem on JRuby is here: https://gist.github.com/headius/511632d8f3b6aeb2348462b42556b410
However I would prefer to keep the more specific type and modify Nokogiri to return the same type.
Note that this was not really an incompatible API change in JRuby; Nokogiri just happened to already override a method we added in 9.2.9, with a different return type.
@headius great to revive this PR. I have to see what it needs, but using the jars we can get from Maven Central is already a start. The modified ones we can require manually.
@mkristian I could push the jar-deps stuff to a branch to get this going. We can leave out the jars that we don't have proper versions for. Next step would be getting dev-time working with the jar-deps dependencies.
@headius a branch would be great
@mkristian PR is at #1967.
- nekodtd.jar and nekohtml.jar: No version information in the jar, but there are pushed artifacts under the nekohtml organization. However both jars have been updated with forked versions from @jvshahid that do not appear to have ever been pushed to Maven Central.
Any ideas about what I should use for the groupId? If we decide to publish those under a new groupId, e.g., org.nokogiri, I believe we will have to maintain some Maven credentials and a signing key. @flavorjones Are you ok with maintaining an extra set of creds? I am guessing you already use some credentials to publish to rubygems.
@jvshahid If you haven't seen it already, here's the guide I use for all my Maven projects: https://central.sonatype.org/pages/ossrh-guide.html
Once you've got it bootstrapped it's not too bad to maintain. Your local PGP/GPG key should be sufficient for signing...it's really just to make sure that you're the one who pushed that resource.
As for groupId, the Sonatype "rules" really want you to use a domain name. Since these are your changes to those libraries, I'd recommend you use your own domain name, e.g. com.jvshahid:nekohtml.
I have not looked into the neko projects to see if they're still maintained; if they are, it would be best to get your changes upstreamed.
Cool, I will look into it this weekend.
This gives you back control over which jars are loaded and when. You can use jbundler, Maven or Gradle to manage the jars; in fact you manage gems and jars in almost the same manner.
It also gives you a chance to work around certain classloader problems when you really need to avoid delegating class loading to the parent classloader.
Currently I am working on psych to do the same (https://github.com/mkristian/psych/blob/jruby-build/Rakefile#L41), since we want to have psych as a default gem in JRuby 9k.
jruby-openssl is an example of using jar-dependencies while still packaging the jars: it uses jar-dependencies when available, so older JRubies work as well. https://github.com/jruby/jruby-openssl/blob/master/jruby-openssl.gemspec#L23 https://github.com/jruby/jruby-openssl/blob/master/lib/jopenssl/load.rb#L8
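The "vendor the jars, but prefer jar-dependencies when present" pattern can be sketched as a small generator for a `*_jars.rb`-style loader. This is hypothetical: the shape (try `require 'jar_dependencies'`, fall back to requiring the vendored jar paths, and call `require_jar` only if the `Jars` module is defined) is an assumption about such loader files, and the exact generated contents may differ. In JRuby, requiring a `.jar` path adds it to the classpath:

```ruby
# Hypothetical generator for a "<gem>_jars.rb"-style loader file.
Coordinate = Struct.new(:group_id, :artifact_id, :version)

# Vendored jars are assumed to live under lib/ in Maven repository layout.
def vendored_path(c)
  File.join(*c.group_id.split('.'), c.artifact_id, c.version,
            "#{c.artifact_id}-#{c.version}.jar")
end

def jars_loader_source(coords)
  fallback = coords.map { |c| "  require '#{vendored_path(c)}'" }
  tracked  = coords.map { |c| "  require_jar '#{c.group_id}', '#{c.artifact_id}', '#{c.version}'" }
  [
    'begin',
    "  require 'jar_dependencies'",
    'rescue LoadError',
    *fallback,                 # old JRuby without jar-dependencies: load vendored jars
    'end',
    '',
    'if defined? Jars',
    *tracked,                  # jar-dependencies tracks versions and conflicts
    'end',
    ''
  ].join("\n")
end

puts jars_loader_source([Coordinate.new('xerces', 'xercesImpl', '2.12.0')])
```

Only one branch ever loads a given jar: if `jar_dependencies` is missing, `Jars` stays undefined and the `require_jar` calls are skipped.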