Closed GoogleCodeExporter closed 9 years ago
The output that I see is different. Instead, I see
"ExceptionInInitializerError".
Upon hacking the source and implanting appropriate debugging, though, I see
that the
underlying cause is indeed the {javaJavaIdentifierPart} category.
It should be possible to recreate the category manually without having to rely
on the
JDK 5+ specific category.
Original comment by robert.n...@gmail.com
on 28 Apr 2008 at 9:17
[deleted comment]
I see from SVN that line 30 of com.tonicsystems.jarjar.PackageRemapper is
private static final Pattern ARRAY_FOR_NAME_PATTERN
= Pattern.compile("\\[L[\\p{javaJavaIdentifierPart}\\.]+?;");
wouldn't the following accomplish the same?
private static final Pattern ARRAY_FOR_NAME_PATTERN
= Pattern.compile("\\[L[[A-Z,a-z,0-9,\\$,\\_]\\.]+?;");
This might be overly simplistic.
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Character.html#isJavaIdentifie
rPart(char)
states:
A character may be part of a Java identifier if any of the following are true:
* it is a letter
* it is a currency symbol (such as '$')
* it is a connecting punctuation character (such as '_')
* it is a digit
* it is a numeric letter (such as a Roman numeral character)
* it is a combining mark
* it is a non-spacing mark
* isIdentifierIgnorable returns true for the character
Additionally,
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Character.html#isIdentifierIgn
orable(char)
states:
The following Unicode characters are ignorable in a Java identifier or a Unicode
identifier:
* ISO control characters that are not whitespace
o '\u0000' through '\u0008'
o '\u000E' through '\u001B'
o '\u007F' through '\u009F'
* all characters that have the FORMAT general category value
Original comment by robert.n...@gmail.com
on 28 Apr 2008 at 9:47
[deleted comment]
[deleted comment]
I can't speak for the developers but I suspect this approach won't fly for
legal reasons.
Would it not be better to have a more permissive regex for
ARRAY_FOR_NAME_PATTERN
with a capture group for the javaJavaIdentifierPart section and then just
manually
check that for each char in the captured group Character.isJavaIdentifierPart(c)
returns true?
Original comment by mstud...@gmail.com
on 28 Apr 2008 at 11:32
You are probably right with regards to your legal assessment. It seems that the
java
source code may not be re-distributed without the express written permission of
Sun.
So I am deleting my comment that includes Sun's source.
Original comment by robert.n...@gmail.com
on 29 Apr 2008 at 12:50
What about using Jakarta Regexp (http://jakarta.apache.org/regexp) instead of
the
java.util.regex classes? Jakarta Regexp even includes an analog of the
javaJavaIdentifierPart character category (called javapart in Jakarta Regexp).
The
license is compatible, too. And since the java.util.regex package is only
referenced
in two classes in the jarjar project, initially integrating Jakarta Regexp into
jarjar should not be difficult.
Regression testing might be a chore, but at least we would end up with a
product that
is truly compatible with java 1.4.
Original comment by robert.n...@gmail.com
on 29 Apr 2008 at 1:01
I retrofitted my jarjar workspace to use jakarta regexp instead of
java.util.regex.
And my results are a success. Mostly.
There is a problem. The unit tests will only complete successfully under java
1.5
because the classes in enumtest.jar were compiled with java 1.5. Subsequently,
if I
try to run the test-enum target under java 1.4, I get an
UnsupportedClassVersionError. The other unit tests do complete successfully.
Again, this solution works in practice for my java 1.4 projects that use
jarjar. So
here is how you can implement it yourself:
- Download jakarta-regexp-1.5.jar to the jarjar lib directory
(http://mirrors.sirium.net/pub/apache/jakarta/regexp/source/jakarta-regexp-1.5.z
ip).
- Apply the attached diffs to build.xml,
com.tonicsystems.jarjar.PackageRemapper.java, and
com.tonicsystems.jarjar.Wildcard.java
- Run the ant jar target and see what happens.
Original comment by robert.n...@gmail.com
on 29 Apr 2008 at 6:56
Here is the patch I referred to in comment 9.
Original comment by robert.n...@gmail.com
on 29 Apr 2008 at 6:57
Attachments:
Here is the Wildcard.diff with comments added.
@@ -16,18 +16,19 @@
package com.tonicsystems.jarjar;
-import java.util.regex.Matcher;
-import java.util.regex.Pattern;
import java.util.ArrayList;
import java.util.Arrays;
+import org.apache.regexp.RE;
+
class Wildcard
{
- private static Pattern dstar = Pattern.compile("\\*\\*");
- private static Pattern star = Pattern.compile("\\*");
- private static Pattern estar = Pattern.compile("\\+\\??\\)\\Z");
+ private static RE dstar = new RE("\\*\\*");
+ private static RE star = new RE("\\*");
+ private static RE estar = new RE("\\+\\??\\)\\Z");
+ private static RE nstar = new RE("\\(\\.\\(\\[\\^\\/\\]\\+\\)\\?\\)");
- private final Pattern pattern;
+ private final RE pattern;
private final int count;
private final ArrayList parts = new ArrayList(16); // kept for debugging
private final String[] strings;
@@ -42,12 +43,28 @@
throw new IllegalArgumentException("The sequence '***' is invalid in a
package pattern");
String regex = pattern;
- regex = replaceAllLiteral(dstar, regex, "(.+?)");
+ // Changed to .*? from .+? because apparently java.util.regex is more
+ // permissive. There is a problem with this approach, though because
+ // when we apply the single star replacement, the star in .*? is
+ // changed.
+ regex = replaceAllLiteral(dstar, regex, "(.*?)");
regex = replaceAllLiteral(star, regex, "([^/]+)");
regex = replaceAllLiteral(estar, regex, "*)");
- this.pattern = Pattern.compile("\\A" + regex + "\\Z");
- this.count = this.pattern.matcher("foo").groupCount();
+ // And because the single star in the double star replacment was
+ // replaced, we need to change it back again.
+ regex = replaceAllLiteral(nstar, regex, "(.*?)");
+ this.pattern = new RE("^" + regex + "$");
+ // We match the raw pattern. Otherwise,
+ // org.apache.regexp.RE.getParenCount returns 0. Source diving indicates
+ // that this is because the paren count is only updated when
+ // org.apache.regexp.RE.match returns true.
+ this.pattern.match(pattern);
+ // We subtract one because, unlike java.util.regex.Matcher.getCount,
the
+ // org.apache.regexp.RE.getParenCount method includes the whole
+ // expression as a group.
+ this.count = this.pattern.getParenCount() - 1;
+
// TODO: check for illegal characters
char[] chars = result.toCharArray();
int max = 0;
@@ -94,15 +111,14 @@
}
public boolean matches(String value) {
- return getMatcher(value) != null;
+ return pattern.match(value);
}
public String replace(String value) {
- Matcher matcher = getMatcher(value);
- if (matcher != null) {
+ if (pattern.match(value) && checkIdentifierChars(value, "/")) {
StringBuffer sb = new StringBuffer();
for (int i = 0; i < strings.length; i++)
- sb.append((refs[i] >= 0) ? matcher.group(refs[i]) :
strings[i]);
+ sb.append((refs[i] >= 0) ? pattern.getParen(refs[i]) :
strings[i]);
return sb.toString();
}
return null;
@@ -108,13 +124,6 @@
return null;
}
- private Matcher getMatcher(String value) {
- Matcher matcher = pattern.matcher(value);
- if (matcher.matches() && checkIdentifierChars(value, "/"))
- return matcher;
- return null;
- }
-
private static boolean checkIdentifierChars(String expr, String extra) {
for (int i = 0, len = expr.length(); i < len; i++) {
char c = expr.charAt(i);
@@ -126,9 +135,9 @@
return true;
}
- private static String replaceAllLiteral(Pattern pattern, String value,
String
replace) {
+ private static String replaceAllLiteral(RE pattern, String value, String
replace) {
replace = replace.replaceAll("([$\\\\])", "\\\\$0");
- return pattern.matcher(value).replaceAll(replace);
+ return pattern.subst(value, replace);
}
public String toString() {
Original comment by robert.n...@gmail.com
on 29 Apr 2008 at 7:13
I've just been thinking about this some more. (I know, STOP IT ALREADY ;) )
Might it be desirable to make the Regular Expression engine to use user
definable
from the task tag? I know that most developers probably won't care, and the
impact of
such a change would be quite large with a minimal reward, and the change will
introduce another layer of complexity to the project.
Maybe I just answered my own question.
Original comment by robert.n...@gmail.com
on 5 May 2008 at 4:38
Does anyone still need 1.4 support? I am considering fixing this bug by moving
jarjar
to 1.5-only, which would also allow the use of generics.
Since jarjar is a tool that is only used at build time I don't see a 1.5
requirement
being a big deal. Thoughts?
Original comment by chris.no...@gmail.com
on 30 Aug 2008 at 5:27
It actually was a problem for me because I was forced by company mandate to use
the
IBM 1.4 JDK for development -- even new development. I don't think this is a
common
scenario, though, and since this is a build tool, most people will only start
using
it in new projects -- projects that should really be using more modern versions
of
java anyway.
So fire away. If I ever find myself forced to start a new project in 1.4 land,
and I
need jarjar, I have a version that works for me.
Original comment by robert.n...@gmail.com
on 31 Aug 2008 at 1:25
Thanks, the latest source now relies on 1.5 features like generics.
Original comment by chris.no...@gmail.com
on 13 Sep 2008 at 8:22
Original issue reported on code.google.com by
mstud...@gmail.com
on 3 Apr 2008 at 12:38