jakartaee / jsonb-api

Jakarta JSON Binding
https://eclipse-ee4j.github.io/jsonb-api/
Other
78 stars 39 forks source link

TCK: customizedmapping/numberformat/NumberFormatCustomizationTest needs update for French locale & JDK13+ #272

Open brideck opened 3 years ago

brideck commented 3 years ago

We're starting to do some TCK testing with Open Liberty and JDKs beyond 11, and have seen that two tests fail in https://github.com/eclipse-ee4j/jsonb-api/blob/master/tck/src/main/java/jakarta/json/bind/tck/customizedmapping/numberformat/NumberFormatCustomizationTest.java

This class has tests that use a number formatted with a French grouping separator. private static final String FRENCH_NUMBER = "\"123\\u00a0456,789\"";

However, starting with JDK13 the grouping separator changed from \u00a0 to \u202f. The test needs to be updated to behave differently when running with a newer version of the JDK.

rmannibucau commented 3 years ago

Hi Brian,

The test needs to be updated to behave differently when running with a newer version of the JDK.

I don't think test must change due to JDK version because it means you break your consumers if you do so.

Also the separator didn't change strictly change: https://github.com/openjdk/jdk16u/blob/cce99e57e19833e6783ba344864999ec8d475ba5/make/data/cldr/common/main/root.xml#L3036 vs https://github.com/openjdk/jdk8u/blob/master/jdk/src/share/classes/sun/util/cldr/resources/21_0_1/common/main/root.xml#L2117 so it is likely a source encoding issue in the JDK (now String has a coder field which is read as UTF16 for these FormatData classes), can be worth throwing a bug there too since it is a breaking change and I didn't see it enphased.

scottmarlow commented 3 years ago

Hi Brian,

The test needs to be updated to behave differently when running with a newer version of the JDK.

I don't think test must change due to JDK version because it means you break your consumers if you do so.

Is there a test change that would allow passing on the newer versions of the JDK (13+) but still pass on the older JDK version?

IMO, we cannot make any test changes for Jakarta EE 9.1 but a good first step for this issue is identifying the best possible fix and then discuss what to do for Jakarta EE 9.1.

Also the separator didn't change strictly change: https://github.com/openjdk/jdk16u/blob/cce99e57e19833e6783ba344864999ec8d475ba5/make/data/cldr/common/main/root.xml#L3036 vs https://github.com/openjdk/jdk8u/blob/master/jdk/src/share/classes/sun/util/cldr/resources/21_0_1/common/main/root.xml#L2117 so it is likely a source encoding issue in the JDK (now String has a coder field which is read as UTF16 for these FormatData classes), can be worth throwing a bug there too since it is a breaking change and I didn't see it enphased.

rmannibucau commented 3 years ago

Is there a test change that would allow passing on the newer versions of the JDK (13+) but still pass on the older JDK version?

Hacky yes but I guess we don't want to go that way so maybe either spec should consider it is not important and match the string by a wider regex or the spec should be enforced by forcing the serialization to be the same even on jdk >= 13 (to be concrete impl would reimplement number serialization to avoid to be JDK dependent).

IMO, we cannot make any test changes for Jakarta EE 9.1 but a good first step for this issue is identifying the best possible fix and then discuss what to do for Jakarta EE 9.1.

Right but if we manage main impls (yasson and johnzon?) to get the fix it is as if EE got the fix at the end and it looks very feasible. That said I would investigate the JDK bug option since it is an unexpected breaking change from my window and not sure the switch from latin to UTF16 source was identified as breaking formatting.

brideck commented 3 years ago

My understanding is that the change is because of an update to the underlying CLDR locale implementation. Search on "French" here: http://cldr.unicode.org/index/downloads/cldr-34

Given that this is another CLDR difference, it is conceivable that we could get this to pass with the same -Djava.locale.providers=COMPAT argument that we used to resolve a JDK11+ issue with the JSTL TCK. I have not tried this yet to see if it would work.

rmannibucau commented 3 years ago

@brideck right but at the end we don't want the JSON representation to change because the JVM was tuned or cldr changed and I really think this issue was missed by the JDK cause it breaks several apps (outside of json land), this is why i proposed to push it back to the JDK to see how they see it before acting in jakarta. On a Jakarta land this is a breaking change we can't ignore IMHO (same as we can't break API).

brideck commented 3 years ago

@rmannibucau

They've taken multiple bugs on this and have closed them stating that this is an intentional change. ex: https://bugs.openjdk.java.net/browse/JDK-8226867

One of those bugs has this statement, which would make for a sensible test fix:

Instead of hardcoding \u00A0, applications can use DecimalFormatSymbols.getInstance(Locale.FRENCH).getGroupingSeparator().

It is also listed on the CLDR page I linked above as a migration issue. image

rmannibucau commented 3 years ago

@brideck hmm, I see. Issue being jakarta should provide a way to stay backward compatible (so the JDK I think) and likely not reimplement all locale (alternative to not supporting it makes our API fully inconsistent which is not good as well). A quick fix can be to add a exception for french locale but would be saner to try to get some guarantees against future incompatible breaking change from the JDK itself again.

@m0mus any hope to get it pushed back at jdk level?

m0mus commented 3 years ago

I don't think we have a power to influence the way JDK goes. :) But what we can do is to modify a test the way that it runs properly on JDK 13+. I'm not sure that we tested Yasson on JDKs > 11.

rmannibucau commented 3 years ago

But what we can do is to modify a test the way that it runs properly on JDK 13+. I'm not sure that we tested Yasson on JDKs > 11.

Ok but we still have this as a bug since we broke JSONB contract so, due to backward compatibility of Jakarta platform, we should still find a solution. So changing the test would move the issue to the spec and require another test written exactly like this one. == no need to change the test ;).

Either we make it exceptional (not sure it is only fr locale) or not but we can't just break deployed application/API IMHO, some consumers are broken with such change and interoperability is lost.

brideck commented 3 years ago

FWIW, I have verified that I can get these tests to pass with Open Liberty by using the -Djava.locale.providers=COMPAT JVM option. This reverts the JVM to using the pre-JDK11 locales, from before CLDR was adopted.

jungm commented 2 months ago

JDK 23 removed COMPAT locale data and broke the java.locale.providers=COMPAT workaround we used in Apache Johnzon to run the TCK on the latest JDK releases: https://bugs.openjdk.org/browse/JDK-8325568

IMO we need a TCK service release, reading the comments before me does not sound like we can push this back as a JDK bug because they say it's intentional https://bugs.openjdk.org/browse/JDK-8226867

rmannibucau commented 2 months ago

@jungm guess we can handle it at parser level since we have a kind of parseNumber, and we can know if we have a specific local so we can handle exceptional chars there but the spec should define if it breaks some users (mainly API since other parts are often more stable in practise) and from an API standpoint you can't control that so much (and the StreamTokenInputStream hack is ugly ;)).

jungm commented 2 months ago

@rmannibucau

guess we can handle it at parser level since we have a kind of parseNumber, and we can know if we have a specific local so we can handle exceptional chars there

I can think of so many ways this could break or cause unintentional behavior - I don't like this approach of trying to fix this in jsonb/johnzon/yasson

think the question really boils down to: do we want to outsmart the JDK? Do we really want to outsmart unicode CLDR?

rmannibucau commented 2 months ago

@jungm I agree but it already broke with the past releases (I don't have the details handy but the Double#toString impl changed so the JSON changed in most impl and some changes were not parsed OOTB by js clients) so guess JSON-B should pick a representation independent from the JDK for everything and can't rely on Java API for the "contract".

t1 commented 1 month ago

One could arguet, that the TCK is overly specific. We could make it more general, by one of these options:

  1. Make the patterns accept NNBSP as well as NBSP.
  2. Make the patterns accept either one or the other, depending on the Java version.
  3. Use the JDK itself to calculate the expected value. I think that last option is the best.
t1 commented 1 month ago

Would it help, if I created a PR for this?