newpolaris / jarjar

Automatically exported from code.google.com/p/jarjar
0 stars 0 forks source link

PackageRemapper uses Java 1.5 only character category #13

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run under Java 1.4 and cause PackageRemapper to be loaded through use of
the jarjar Ant task.

What is the expected output? What do you see instead?
Expected: rewritten classes
Seen:
ava.util.regex.PatternSyntaxException: Unknown character category
{javaJavaIdentifierPart} near index 29
\[L[\p{javaJavaIdentifierPart}\.]+?;
                             ^
    at java.util.regex.Pattern.error(Pattern.java:1541)
    at java.util.regex.Pattern.familyError(Pattern.java:2222)
    at java.util.regex.Pattern.retrieveCategoryNode(Pattern.java:2213)
    at java.util.regex.Pattern.family(Pattern.java:2185)
    at java.util.regex.Pattern.range(Pattern.java:2106)
    at java.util.regex.Pattern.clazz(Pattern.java:2069)
    at java.util.regex.Pattern.sequence(Pattern.java:1598)
    at java.util.regex.Pattern.expr(Pattern.java:1558)
    at java.util.regex.Pattern.compile(Pattern.java:1291)
    at java.util.regex.Pattern.<init>(Pattern.java:1047)
    at java.util.regex.Pattern.compile(Pattern.java:785)
    at com.tonicsystems.jarjar.PackageRemapper.<clinit>(PackageRemapper.java:30)

What version of the product are you using? On what operating system?
1.0RC7, OSX 10.5, JDK 1.4

Please provide any additional information below.
the javaJavaIdentifierPart character category is available from Java 5
onwards, it seems.

Original issue reported on code.google.com by mstud...@gmail.com on 3 Apr 2008 at 12:38

GoogleCodeExporter commented 9 years ago
The output that I see is different. Instead, I see 
"ExceptionInInitializerError".
Upon hacking the source and implanting appropriate debugging, though, I see 
that the
underlying cause is indeed the {javaJavaIdentifierPart} category.

It should be possible to recreate the category manually without having to rely 
on the
JDK 5+ specific category.

Original comment by robert.n...@gmail.com on 28 Apr 2008 at 9:17

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I see from SVN that line 30 of com.tonicsystems.jarjar.PackageRemapper is

    private static final Pattern ARRAY_FOR_NAME_PATTERN
        = Pattern.compile("\\[L[\\p{javaJavaIdentifierPart}\\.]+?;");

wouldn't the following accomplish the same?

    private static final Pattern ARRAY_FOR_NAME_PATTERN
        = Pattern.compile("\\[L[[A-Z,a-z,0-9,\\$,\\_]\\.]+?;");

This might be overly simplistic.
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Character.html#isJavaIdentifie
rPart(char)
states:

 A character may be part of a Java identifier if any of the following are true:

    * it is a letter
    * it is a currency symbol (such as '$')
    * it is a connecting punctuation character (such as '_')
    * it is a digit
    * it is a numeric letter (such as a Roman numeral character)
    * it is a combining mark
    * it is a non-spacing mark
    * isIdentifierIgnorable returns true for the character 

Additionally,
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Character.html#isIdentifierIgn
orable(char)
states:

 The following Unicode characters are ignorable in a Java identifier or a Unicode
identifier:

    * ISO control characters that are not whitespace
          o '\u0000' through '\u0008'
          o '\u000E' through '\u001B'
          o '\u007F' through '\u009F' 
    * all characters that have the FORMAT general category value 

Original comment by robert.n...@gmail.com on 28 Apr 2008 at 9:47

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I can't speak for the developers but I suspect this approach won't fly for 
legal reasons.

Would it not be better to have a more permissive regex for 
ARRAY_FOR_NAME_PATTERN
with a capture group for the javaJavaIdentifierPart section and then just 
manually
check that for each char in the captured group Character.isJavaIdentifierPart(c)
returns true?

Original comment by mstud...@gmail.com on 28 Apr 2008 at 11:32

GoogleCodeExporter commented 9 years ago
You are probably right with regards to your legal assessment. It seems that the 
java
source code may not be re-distributed without the express written permission of 
Sun.
So I am deleting my comment that includes Sun's source.

Original comment by robert.n...@gmail.com on 29 Apr 2008 at 12:50

GoogleCodeExporter commented 9 years ago
What about using Jakarta Regexp (http://jakarta.apache.org/regexp) instead of 
the
java.util.regex classes? Jakarta Regexp even includes an analog of the
javaJavaIdentifierPart character category (called javapart in Jakarta Regexp). 
The
license is compatible, too. And since the java.util.regex package is only 
referenced
in two classes in the jarjar project, initially integrating Jakarta Regexp into
jarjar should not be difficult.

Regression testing might be a chore, but at least we would end up with a 
product that
is truly compatible with java 1.4.

Original comment by robert.n...@gmail.com on 29 Apr 2008 at 1:01

GoogleCodeExporter commented 9 years ago
I retrofitted my jarjar workspace to use jakarta regexp instead of 
java.util.regex.
And my results are a success. Mostly.

There is a problem. The unit tests will only complete successfully under java 
1.5
because the classes in enumtest.jar were compiled with java 1.5. Subsequently, 
if I
try to run the test-enum target under java 1.4, I get an
UnsupportedClassVersionError. The other unit tests do complete successfully.

Again, this solution works in practice for my java 1.4 projects that use 
jarjar. So
here is how you can implement it yourself:

- Download jakarta-regexp-1.5.jar to the jarjar lib directory
(http://mirrors.sirium.net/pub/apache/jakarta/regexp/source/jakarta-regexp-1.5.z
ip).

- Apply the attached diffs to build.xml,
com.tonicsystems.jarjar.PackageRemapper.java, and 
com.tonicsystems.jarjar.Wildcard.java

- Run the ant jar target and see what happens.

Original comment by robert.n...@gmail.com on 29 Apr 2008 at 6:56

GoogleCodeExporter commented 9 years ago
Here is the patch I referred to in comment 9.

Original comment by robert.n...@gmail.com on 29 Apr 2008 at 6:57

Attachments:

GoogleCodeExporter commented 9 years ago
Here is the Wildcard.diff with comments added.

@@ -16,18 +16,19 @@

 package com.tonicsystems.jarjar;

-import java.util.regex.Matcher;
-import java.util.regex.Pattern;
 import java.util.ArrayList;
 import java.util.Arrays;

+import org.apache.regexp.RE;
+
 class Wildcard
 {
-    private static Pattern dstar = Pattern.compile("\\*\\*");
-    private static Pattern star  = Pattern.compile("\\*");
-    private static Pattern estar = Pattern.compile("\\+\\??\\)\\Z");
+    private static RE dstar = new RE("\\*\\*");
+    private static RE star  = new RE("\\*");
+    private static RE estar = new RE("\\+\\??\\)\\Z");
+    private static RE nstar = new RE("\\(\\.\\(\\[\\^\\/\\]\\+\\)\\?\\)");

-    private final Pattern pattern;
+    private final RE pattern;
     private final int count;
     private final ArrayList parts = new ArrayList(16); // kept for debugging
     private final String[] strings;
@@ -42,12 +43,28 @@
             throw new IllegalArgumentException("The sequence '***' is invalid in a
package pattern");

         String regex = pattern;
-        regex = replaceAllLiteral(dstar, regex, "(.+?)");
+        // Changed to .*? from .+? because apparently java.util.regex is more
+       // permissive. There is a problem with this approach, though because
+       // when we apply the single star replacement, the star in .*? is
+       // changed.
+        regex = replaceAllLiteral(dstar, regex, "(.*?)");
         regex = replaceAllLiteral(star, regex, "([^/]+)");
         regex = replaceAllLiteral(estar, regex, "*)");
-        this.pattern = Pattern.compile("\\A" + regex + "\\Z");
-        this.count = this.pattern.matcher("foo").groupCount();
+        // And because the single star in the double star replacment was
+       // replaced, we need to change it back again.
+        regex = replaceAllLiteral(nstar, regex, "(.*?)");
+        this.pattern = new RE("^" + regex + "$");

+        // We match the raw pattern. Otherwise,
+       // org.apache.regexp.RE.getParenCount returns 0. Source diving indicates
+       // that this is because the paren count is only updated when
+       // org.apache.regexp.RE.match returns true.
+        this.pattern.match(pattern);
+        // We subtract one because, unlike java.util.regex.Matcher.getCount, 
the
+       // org.apache.regexp.RE.getParenCount method includes the whole
+       // expression as a group.
+        this.count = this.pattern.getParenCount() - 1;
+        
         // TODO: check for illegal characters
         char[] chars = result.toCharArray();
         int max = 0;
@@ -94,15 +111,14 @@
     }

     public boolean matches(String value) {
-        return getMatcher(value) != null;
+        return pattern.match(value);
     }

     public String replace(String value) {
-        Matcher matcher = getMatcher(value);
-        if (matcher != null) {
+        if (pattern.match(value) && checkIdentifierChars(value, "/")) {
             StringBuffer sb = new StringBuffer();
             for (int i = 0; i < strings.length; i++)
-                sb.append((refs[i] >= 0) ? matcher.group(refs[i]) : 
strings[i]);
+                sb.append((refs[i] >= 0) ? pattern.getParen(refs[i]) : 
strings[i]);
             return sb.toString();
         }
         return null;
@@ -108,13 +124,6 @@
         return null;
     }

-    private Matcher getMatcher(String value) {
-        Matcher matcher = pattern.matcher(value);
-        if (matcher.matches() && checkIdentifierChars(value, "/"))
-            return matcher;
-        return null;
-    }
-
     private static boolean checkIdentifierChars(String expr, String extra) {
         for (int i = 0, len = expr.length(); i < len; i++) {
             char c = expr.charAt(i);
@@ -126,9 +135,9 @@
         return true;
     }

-    private static String replaceAllLiteral(Pattern pattern, String value, 
String
replace) {
+    private static String replaceAllLiteral(RE pattern, String value, String 
replace) {
         replace = replace.replaceAll("([$\\\\])", "\\\\$0");
-        return pattern.matcher(value).replaceAll(replace);
+        return pattern.subst(value, replace);
     }

     public String toString() {

Original comment by robert.n...@gmail.com on 29 Apr 2008 at 7:13

GoogleCodeExporter commented 9 years ago
I've just been thinking about this some more. (I know, STOP IT ALREADY ;) )

Might it be desirable to make the Regular Expression engine to use user 
definable
from the task tag? I know that most developers probably won't care, and the 
impact of
such a change would be quite large with a minimal reward, and the change will
introduce another layer of complexity to the project.

Maybe I just answered my own question.

Original comment by robert.n...@gmail.com on 5 May 2008 at 4:38

GoogleCodeExporter commented 9 years ago
Does anyone still need 1.4 support? I am considering fixing this bug by moving 
jarjar
to 1.5-only, which would also allow the use of generics.

Since jarjar is a tool that is only used at build time I don't see a 1.5 
requirement
being a big deal. Thoughts?

Original comment by chris.no...@gmail.com on 30 Aug 2008 at 5:27

GoogleCodeExporter commented 9 years ago
It actually was a problem for me because I was forced by company mandate to use 
the
IBM 1.4 JDK for development -- even new development. I don't think this is a 
common
scenario, though, and since this is a build tool, most people will only start 
using
it in new projects -- projects that should really be using more modern versions 
of
java anyway.

So fire away. If I ever find myself forced to start a new project in 1.4 land, 
and I
need jarjar, I have a version that works for me.

Original comment by robert.n...@gmail.com on 31 Aug 2008 at 1:25

GoogleCodeExporter commented 9 years ago
Thanks, the latest source now relies on 1.5 features like generics.

Original comment by chris.no...@gmail.com on 13 Sep 2008 at 8:22