unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Validate components bag results with ICU4C results #1327

Open gregtatum opened 2 years ago

gregtatum commented 2 years ago

As part of #645, we should compare the results of ICU4C GetBestPattern against the results of the ICU4X components::Bag. It would be good to know where the two give different results, and whether each difference makes the ICU4X result lower or higher quality.

sffc commented 2 years ago

CC @gnrunge to weigh in on how this fits in with data-driven testing.

gregtatum commented 2 years ago

The way I see it, we would generate a relatively full list of possible skeletons and encode the results from both ICU4X and ICU4C into the JSON. This way we can statically inspect the quality of the results across products. There could be allowable differences, but we would at least know what the differences are and could ensure the quality of the results.
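As a rough illustration of the idea above, here is a minimal Python sketch of such a JSON fixture and a static comparison over it. The field names (`locale`, `skeleton`, `icu4c`, `icu4x`) and the function `find_differences` are hypothetical, and the pattern strings are invented placeholders, not real output from either library.

```python
import json

# Hypothetical fixture layout: one entry per (locale, skeleton) pair, with the
# pattern each library produced. All values here are invented for illustration.
FIXTURE = """
[
  {"locale": "en", "skeleton": "yMMMd", "icu4c": "MMM d, y", "icu4x": "MMM d, y"},
  {"locale": "en", "skeleton": "yMMMMEEEEd", "icu4c": "EEEE, MMMM d, y", "icu4x": "EEEE, MMMM d, y"},
  {"locale": "en", "skeleton": "Hm", "icu4c": "HH:mm", "icu4x": "H:mm"}
]
"""


def find_differences(fixture_json):
    """Return the fixture entries whose ICU4C and ICU4X patterns differ."""
    entries = json.loads(fixture_json)
    return [entry for entry in entries if entry["icu4c"] != entry["icu4x"]]


differences = find_differences(FIXTURE)
for entry in differences:
    # Print each mismatch so a reviewer can judge its quality by hand.
    print(
        f'{entry["locale"]} {entry["skeleton"]}: '
        f'ICU4C={entry["icu4c"]!r} ICU4X={entry["icu4x"]!r}'
    )
```

Because both results live in the same file, the comparison needs neither library at review time; only the step that generates the fixture would have to call ICU4C and ICU4X.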

gnrunge commented 2 years ago

Using the same test data across different projects is a standard use case for overarching data-driven testing. What is different here is the comparison of results between two projects. Can 'allowable differences' be formalized, become part of the verification data, and trigger a test failure on the ICU4C or ICU4X side if the diffs go beyond certain thresholds? However, Greg's comment indicates that allowable differences may be more of a judgement call, at least initially, until more experience is gained.
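One way the formalization asked about above could look, sketched in Python under stated assumptions: the verification data carries an explicit list of reviewed differences plus a threshold for unreviewed ones. The keys `allowed_differences` and `max_unreviewed_differences`, the function `check_results`, and all sample values are hypothetical and do not come from ICU4C or ICU4X.

```python
import json

# Hypothetical verification data: differences that were reviewed and accepted,
# plus how many unreviewed differences the test tolerates before failing.
VERIFICATION = json.loads("""
{
  "max_unreviewed_differences": 0,
  "allowed_differences": [
    {"locale": "en", "skeleton": "Hm", "reason": "placeholder: reviewed by hand"}
  ]
}
""")


def check_results(results, verification):
    """Return (passed, unreviewed), where unreviewed lists the differing
    entries that are not covered by the allowed list."""
    allowed = {
        (item["locale"], item["skeleton"])
        for item in verification["allowed_differences"]
    }
    unreviewed = [
        r for r in results
        if r["icu4c"] != r["icu4x"] and (r["locale"], r["skeleton"]) not in allowed
    ]
    passed = len(unreviewed) <= verification["max_unreviewed_differences"]
    return passed, unreviewed


# Invented sample results: one allowed difference, one unreviewed difference.
sample_results = [
    {"locale": "en", "skeleton": "Hm", "icu4c": "HH:mm", "icu4x": "H:mm"},
    {"locale": "en", "skeleton": "yMd", "icu4c": "M/d/y", "icu4x": "M/d/yy"},
]
passed, unreviewed = check_results(sample_results, VERIFICATION)
print("passed:", passed, "unreviewed:", [r["skeleton"] for r in unreviewed])
```

Keeping a reason next to each allowed difference leaves room for the judgement call: a person decides once whether a difference is acceptable, and the test only fails when something new appears.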

zbraniecki commented 2 years ago

Can we test against test262 first? We have a clear path to evaluate test262 (ecma402) conformance via ICU4X->SpiderMonkey->test262 vs ICU4C->SpiderMonkey->test262.

We do not have a good way to test the ICU4C test corpus against the ICU4X implementation.

sffc commented 2 years ago

Discussion 2022-03-31:

sffc commented 2 years ago

@gnrunge to come back with a design doc.

gregtatum commented 2 years ago

Having both ICU4C and test262 would help pinpoint where behavior changes are introduced. I don't think they are mutually exclusive.

zbraniecki commented 2 years ago

> Having both ICU4C and test262 would help pinpoint where behavior changes are introduced. I don't think they are mutually exclusive.

They are not, but I think the two have very different cost/benefit considerations.

Adding a test262 comparison should be quite easy via SpiderMonkey, with the path paved by last year's unification, and would secure continuous comparison aimed at Mozilla's and ICU4X's strict goal of ecma402 compatibility. Maintaining that setup will be a side effect of Mozilla's goal of placing ICU4X in Gecko.

Adding an ICU4C comparison will require a completely new harness to either FFI ICU4C into ICU4X tests or the reverse, working around the lack of data-driven tests in ICU4C and the quite inconsistent use of data-driven tests in ICU4X. With all of that work done, it would give us a list of incompatibilities to sift through to identify which ones we care about. Maintaining that setup will be artificial and valuable only for this exercise.