unicode-org / unicodetools

home of unicodetools and https://util.unicode.org JSPs
https://util.unicode.org
Other
50 stars 39 forks source link

there can be only one UnicodeProperty.java #66

Open markusicu opened 3 years ago

markusicu commented 3 years ago

Background: ucd-dev email thread “Unicode Tools vs. UnicodeProperty.java

We have two versions of org/unicode/cldr/util/props/UnicodeProperty.java, one in CLDR and one in the Unicode Tools. Until 1.5 years ago they were identical, but they have been diverging and are now no longer even interface-compatible. The CLDR version is the newer one.

For now, I am adding a note to the setup instructions to move cldr-code above unicodetools in the build path, but of course that's evil and brittle.

We had agreed to move UnicodeProperty from CLDR into the Unicode Tools, but other CLDR code depends on this class, and I am not sure if I have time to get through that. Maybe I just rename the unused Unicode Tools version at first.

There is also a third version of UnicodeProperty.java, in the UnicodeJsps code, but thankfully it is in a different package. Still, this is confusing and error-prone; we should either merge the classes or give them distinct names.

srl295 commented 2 years ago
macchiati commented 2 years ago

I tried at one point to pull UnicodeProperty out from CLDR, but it is not simple. There are classes that depend on classes and so on, soit was too hard to figure out what the tangled set of dependencies were, and where I could cut the cord to allow me to extract a set of classes and tie off the loose ends.

Mark

On Sat, May 29, 2021 at 10:45 AM Markus Scherer @.***> wrote:

Background: ucd-dev email thread “Unicode Tools vs. UnicodeProperty.java https://groups.google.com/g/ucd-dev/c/H60xj1kTfCU

We have two versions of org/unicode/cldr/util/props/UnicodeProperty.java, one in CLDR and one in the Unicode Tools. Until 1.5 years ago they were identical, but they have been diverging and are now no longer even interface-compatible. The CLDR version is the newer one.

For now, I am adding a note to the setup instructions to move cldr-code above unicodetools in the build path, but of course that's evil and brittle.

We had agreed to move UnicodeProperty from CLDR into the Unicode Tools, but other CLDR code depends on this class, and I am not sure if I have time to get through that. Maybe I just rename the unused Unicode Tools version at first.

There is also a third version of UnicodeProperty.java, in the UnicodeJsps code, but thankfully it is in a different package. Still, this is confusing and error-prone; we should either merge the classes or give them distinct names.

- https://github.com/unicode-org/cldr/blob/master/tools/cldr-code/src/main/java/org/unicode/cldr/util/props/UnicodeProperty.java

https://github.com/unicode-org/unicodetools/blob/main/unicodetools/org/unicode/cldr/util/props/UnicodeProperty.java

https://github.com/unicode-org/unicodetools/blob/main/UnicodeJsps/src/main/java/org/unicode/jsp/UnicodeProperty.java

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/unicode-org/unicodetools/issues/66, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJLEMDLXLVTZAO7LDBIWP3TQER4DANCNFSM45YP3R3A .

srl295 commented 2 years ago

Josh's analysis: https://docs.google.com/spreadsheets/d/1sAIFut5QL30EXwMK_ZM4yKzWJge2gatLA0yfl_zxtSY/edit?usp=sharing

josh-hadley commented 2 years ago

As @srl295 posted, I have performed a basic analysis of duplicated .java files as a starting point. I attempted a very basic diff to try to triage a bit but I think that was not terribly useful with the multi-file overlaps. Sadly, looks like only one duplicated file is identical.

So I think the next step will be to look more closely into the overlaps and determine commonalities/differences inside and figure out what can be re-worked, removed, etc.

srl295 commented 2 years ago

Now that https://github.com/unicode-org/unicodetools/pull/148 (#131) is merged, the JSPs can easily call into unicodetools librarycode.

srl295 commented 2 years ago

this bit me trying to do #186 (IDNA)— because of the order of loading, ICU4J's UnicodeSet ends up calling into unicodetools' UnicodeProperty and crashes.

macchiati commented 2 years ago

Hmmm. We could rename unicodetools' version

On Wed, Mar 23, 2022 at 11:55 AM Steven R. Loomis @.***> wrote:

this bit me trying to do #186 https://github.com/unicode-org/unicodetools/issues/186 (IDNA)— because of the order of loading, ICU4J's UnicodeSet ends up calling into unicodetools' UnicodeProperty and crashes.

— Reply to this email directly, view it on GitHub https://github.com/unicode-org/unicodetools/issues/66#issuecomment-1076705347, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJLEMBQ54LIJJAYF35ETIDVBNSIJANCNFSM45YP3R3A . You are receiving this because you commented.Message ID: @.***>

srl295 commented 2 years ago

@macchiati I'm going to try moving unicodetools' version into org.unicode.props …

srl295 commented 2 years ago

Back burner on this due to CLDR release, but just as an example of the layering problems:

https://github.com/unicode-org/unicodetools/blob/8d6bd2e50b541f68bb86ae4205213f7af71aa8bb/unicodetools/src/main/java/org/unicode/tools/TestSegments.java#L151-L155

    private static void doCompare(UnicodeProperty.Factory factory, Segmenter rl, String line) {
        RandomStringGenerator rsg;
        RuleBasedBreakIterator icuBreak;
        if (line.equals("compareGrapheme")) {
            rsg = new RandomStringGenerator(factory, "GraphemeClusterBreak");

So here we are calling into CLDR (RandomStringGenerator) but passing it unicodetools' version of UnicodeProperty.Factory. The only way this “works” is basically that UnicodeProperty.class from CLDR gets replaced at runtime with unicodetools' … not good

markusicu commented 2 years ago

It looks like CLDR RandomStringGenerator has other problems, too. It sounds generic, but its init() function is specific to working with the GCB property. And when you call the (Factory, String, bool, bool) constructor, you get some data from the input factory and other data from the ICUPropertyFactory.

If this is used from CLDR code, then I would clone it into Unicode Tools (with a different package and modified class name). If it's not used from CLDR code, then move it into Unicode Tools.

It might also be possible to pull all of the factory calls out into the call site and construct this generator only with UnicodeSet/UnicodeMap/String/boolean values.

srl295 commented 2 years ago

Observations (not complaints):

markusicu commented 2 years ago

I verified that the RandomStringGenerator is not used in CLDR. Please move it out of there and into the Unicode Tools. https://unicode-org.atlassian.net/browse/CLDR-15482

srl295 commented 2 years ago

@markusicu my #214 does copy it into unicodetools paving the way for deletion on cldr.

srl295 commented 2 years ago

It's "now" illegal (as of JDK 9?) to have more than one module export the same package. So, CLDR and ICU aren't supposed to both export to com.ibm.icu.text.

markusicu commented 2 years ago

Something I learned (and others might have realized some time ago):

The Unicode Tools use the CLDR UnicodePropertySymbolTable which calls into UnicodeProperty. That must be a CLDR UnicodeProperty, which is what the symbol table code calls from within the CLDR jar. The Tools provide the symbol table with a ToolUnicodeProperty (which lives in the Tools) which extends UnicodeProperty. Bad things happen when CLDR code uses the Unicode Tools version of UnicodeProperty.

Unicode Tools code needs the Unicode Tools version of UnicodeProperty.

David changed CLDR UnicodeProperty to base its Matcher classes on a Java Predicate: https://unicode-org.atlassian.net/browse/CLDR-13750 https://github.com/unicode-org/cldr/pull/460

The Unicode Tools version is still basing the Matchers on an older ObjectMatcher interface.

markusicu commented 2 years ago

FYI In pull request #214 @srl295 copied most of the classes from https://github.com/unicode-org/cldr/tree/main/tools/cldr-code/src/main/java/org/unicode/cldr/util/props into the unicodetools. It looks like only abstract class UnicodeLabel (the base for all of the UnicodeProperty classes) still exists only in CLDR.

markusicu commented 2 years ago
meld ~/cldr/uni/src/tools/cldr-code/src/main/java/org/unicode/cldr/util/props/UnicodeProperty.java unicodetools/src/main/java/org/unicode/props/UnicodeProperty.java
markusicu commented 7 months ago

@eggrobin just deleted the UnicodeJSPs version in PR #649.