eggrobin opened 1 year ago
(Alternatively: Is this doable as a proper monkey-free invariant test? The LB-GCB inconsistencies are a mess because both algorithms are quite complicated; but normalization is comparatively simple, maybe we can actually prove things.)
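One way to state the invariant to be proved (my own phrasing, not a quotation of the promise in the UAXes): if a string $s$ segments as $s = g_1 g_2 \cdots g_n$, then

$$\operatorname{NFD}(g_1)\,\operatorname{NFD}(g_2)\cdots\operatorname{NFD}(g_n) = \operatorname{NFD}(s),$$

and $\operatorname{NFD}(g_1), \ldots, \operatorname{NFD}(g_n)$ are exactly the segments of $\operatorname{NFD}(s)$. A monkey-free test could, for instance, check that no rule allows a boundary at a position across which canonical reordering or decomposition can occur, rather than sampling random strings.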
Both tests would be good to add in unicodetools... (for the GCB-anything-else consistency, monkey tests would be simplest).
We promise that our segmentation algorithms are consistent with NFD. However, we do not test that here (otherwise we would have spotted the Kirat Rai issue in #445).
It should not be too difficult to write an ICU monkey test for that (I am working on a similar one to investigate the ancient LB-GCB inconsistency AIs). However, that would not be enough: we only run the ICU tests with new rules and data relatively late in the beta, whereas this issue was spotted pre-alpha and put encoding model questions on the table (we kept the encoding as previously decided in that case, but it is not implausible that we could decide differently in another case).
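For concreteness, here is a rough sketch (my own, not existing unicodetools or ICU test code) of what such a monkey could look like with ICU4J for grapheme clusters. It checks one necessary condition for NFD-consistency, not the whole property: normalizing a random string segment by segment must agree with normalizing it as a whole, i.e., no canonical reordering or decomposition may straddle a boundary. The class name, the random-string generator, and the chosen code point range are placeholders.

```java
import com.ibm.icu.text.BreakIterator;
import com.ibm.icu.text.Normalizer2;
import java.util.Random;

/** Hypothetical sketch of an NFD-consistency monkey for grapheme cluster segmentation. */
public class SegmentationNfdMonkey {

  public static void main(String[] args) {
    Normalizer2 nfd = Normalizer2.getNFDInstance();
    BreakIterator graphemes = BreakIterator.getCharacterInstance();
    Random random = new Random(2023);  // Fixed seed so failures are reproducible.

    for (int trial = 0; trial < 100_000; ++trial) {
      String s = randomString(random, 8);

      // Segment the original string and normalize each segment separately.
      graphemes.setText(s);
      StringBuilder piecewise = new StringBuilder();
      int start = graphemes.first();
      for (int end = graphemes.next();
           end != BreakIterator.DONE;
           start = end, end = graphemes.next()) {
        piecewise.append(nfd.normalize(s.substring(start, end)));
      }

      // If segmentation is consistent with NFD, normalizing segment by segment
      // must give the same result as normalizing the whole string.
      String whole = nfd.normalize(s);
      if (!piecewise.toString().equals(whole)) {
        System.out.println("NFD inconsistency for " + hexCodePoints(s));
      }
    }
  }

  // Arbitrary generator; a real monkey would draw from property-driven sets
  // (combining marks, Hangul jamo, newly encoded scripts, ...) so that cases
  // like the Kirat Rai vowel signs are hit quickly.
  private static String randomString(Random random, int length) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < length; ++i) {
      int cp;
      do {
        cp = random.nextInt(0x31000);
      } while (cp >= 0xD800 && cp <= 0xDFFF);  // Skip surrogate code points.
      sb.appendCodePoint(cp);
    }
    return sb.toString();
  }

  private static String hexCodePoints(String s) {
    StringBuilder sb = new StringBuilder();
    s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
    return sb.toString().trim();
  }
}
```

The same loop could be repeated with the word, sentence, and line break iterators, and eventually run against the unicodetools segmenter with proposed new rules and data rather than against released ICU, which is what the question below is really about.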
Could we have segmentation monkeys in unicodetools?