pantsbuild / jarjar

An export of https://code.google.com/p/jarjar/ @ svn:r142 for pants tool use and further development.
Apache License 2.0
37 stars 27 forks source link

Make JarJar handle jars with misplaced classes more reasonably. #8

Closed gmalmquist closed 9 years ago

gmalmquist commented 9 years ago

A "misplaced class" here refers to a class whose jar entry path (eg, "org/pantsbuild/Foobar.class") does not match the path implied by its fully-qualified classname.

Previously, if you had a (corrupt) jar with a directory containing misplaced classes, jarjar would move all the classes in the so that their directory properly matched their package names. This was simply an undocumented (and probably unconsidered) side-effect of how jar entries are transformed by the JarTransformer.

This would cause problems when there already were classes at those locations, and JarJar would explode due to a duplicate jar entries problem that it had constructed itself.

With this change, JarJar allows the specification of a "misnamedClassStrategy" variable, which can be "fatal" (for fail-fast exit behavior), "move" (for the previously existing behavior), or "skip" (exclude the classes from shading).

This defaults to "skip". A warning is also printed to stderr if the verbose flag is turned on.

jsirois commented 9 years ago

I don't have details on the "corrupt" jars - but my guess is these jars embed classfiles as resources at a path different from their package name? Can the <keep/> rule be used to cover the cases you care about?: https://code.google.com/p/jarjar/wiki/CommandLineDocs

gmalmquist commented 9 years ago

No existing rules are sufficient, because they match against the classname according to bytecode, not the jar entry name.

Here's an example of a bad jar: http://mvnrepository.com/artifact/com.sun.xml.bind/jaxb-xjc/2.2

gmalmquist commented 9 years ago

Actually that's a lie the rules do match against the entry name for the keep rule, but that has the unwanted side-effect of throwing out everything that isn't reachable from keep, which isn't what we want.

jsirois commented 9 years ago

Yeah - I did not read <keep/> well! Last try before actually reviewing - does a noop <rule/> that identity maps keep the badness from happening? I assume 1.0.com.sun.**.* are the ones to leave alone, and I'm guessing you tried an exclude_package(package_name='1.0.com.sun', recursive=True) and that did not work.

gmalmquist commented 9 years ago

Yeah that doesn't work because the package rename rules do just look at the bytecode.

The problem is that regardless of whether the class is shaded or not, JarTransformer sets the structs entry to its computed value after transform() is invoked, regardless of what transform() actually does.

I'm about to update with a significant refactor pertaining to Eric's comments, so you might want to wait a few minutes.

gmalmquist commented 9 years ago

Updated to use a ProcessingContext to keep track of jar entry skips and exclusions, and a factory to instantiate the various kinds of MisnamedClassProcessors.

jsirois commented 9 years ago

OK - this all looks good to me - my comments are all nits really, let me know offline when you've got things like you want them.

gmalmquist commented 9 years ago

Updated again; Eric persuaded me that it would be better to kill ProcessingContext, and deal with verbose in another PR to refactor the logging system (by actually using one).

jsirois commented 9 years ago

I'm happy with this once CI goes green. Eric - you good?

ericzundel commented 9 years ago

LGTM2

jsirois commented 9 years ago

Patched in @ 646f7f8d4b4fef7466e1a4f1246c60d90e013250

gmalmquist commented 9 years ago

Thanks john!