Closed Manishearth closed 1 year ago
Currently ICU4C's TestGreekUpper()
test doesn't work due to this. Here's the converted Rust version if anyone needs later.
/// ICU4C's TestGreekUpper
#[test]
fn test_greek_upper() {
let cm = CaseMapping::new_with_locale(&locale!("el"));
// https://unicode-org.atlassian.net/browse/ICU-5456
assert_eq!(cm.to_full_uppercase_string("άδικος, κείμενο, ίριδα"), "ΑΔΙΚΟΣ, ΚΕΙΜΕΝΟ, ΙΡΙΔΑ");
// https://bugzilla.mozilla.org/show_bug.cgi?id=307039
// https://bug307039.bmoattachments.org/attachment.cgi?id=194893
assert_eq!(cm.to_full_uppercase_string("Πατάτα"), "ΠΑΤΑΤΑ");
assert_eq!(cm.to_full_uppercase_string("Αέρας, Μυστήριο, Ωραίο"), "ΑΕΡΑΣ, ΜΥΣΤΗΡΙΟ, ΩΡΑΙΟ");
assert_eq!(cm.to_full_uppercase_string("Μαΐου, Πόρος, Ρύθμιση"), "ΜΑΪΟΥ, ΠΟΡΟΣ, ΡΥΘΜΙΣΗ");
assert_eq!(cm.to_full_uppercase_string("ΰ, Τηρώ, Μάιος"), "Ϋ, ΤΗΡΩ, ΜΑΪΟΣ");
assert_eq!(cm.to_full_uppercase_string("άυλος"), "ΑΫΛΟΣ");
assert_eq!(cm.to_full_uppercase_string("ΑΫΛΟΣ"), "ΑΫΛΟΣ");
assert_eq!(cm.to_full_uppercase_string("Άκλιτα ρήματα ή άκλιτες μετοχές"), "ΑΚΛΙΤΑ ΡΗΜΑΤΑ Ή ΑΚΛΙΤΕΣ ΜΕΤΟΧΕΣ");
// http://www.unicode.org/udhr/d/udhr_ell_monotonic.html
assert_eq!(cm.to_full_uppercase_string("Επειδή η αναγνώριση της αξιοπρέπειας"), "ΕΠΕΙΔΗ Η ΑΝΑΓΝΩΡΙΣΗ ΤΗΣ ΑΞΙΟΠΡΕΠΕΙΑΣ");
assert_eq!(cm.to_full_uppercase_string("νομικού ή διεθνούς"), "ΝΟΜΙΚΟΥ Ή ΔΙΕΘΝΟΥΣ");
// http://unicode.org/udhr/d/udhr_ell_polytonic.html
assert_eq!(cm.to_full_uppercase_string("Ἐπειδὴ ἡ ἀναγνώριση"), "ΕΠΕΙΔΗ Η ΑΝΑΓΝΩΡΙΣΗ");
assert_eq!(cm.to_full_uppercase_string("νομικοῦ ἢ διεθνοῦς"), "ΝΟΜΙΚΟΥ Ή ΔΙΕΘΝΟΥΣ");
// From Google bug report
assert_eq!(cm.to_full_uppercase_string("Νέο, Δημιουργία"), "ΝΕΟ, ΔΗΜΙΟΥΡΓΙΑ");
// http://crbug.com/234797
assert_eq!(cm.to_full_uppercase_string("Ελάτε να φάτε τα καλύτερα παϊδάκια!"), "ΕΛΑΤΕ ΝΑ ΦΑΤΕ ΤΑ ΚΑΛΥΤΕΡΑ ΠΑΪΔΑΚΙΑ!");
assert_eq!(cm.to_full_uppercase_string("Μαΐου, τρόλεϊ"), "ΜΑΪΟΥ, ΤΡΟΛΕΪ");
assert_eq!(cm.to_full_uppercase_string("Το ένα ή το άλλο."), "ΤΟ ΕΝΑ Ή ΤΟ ΑΛΛΟ.");
// http://multilingualtypesetting.co.uk/blog/greek-typesetting-tips/
assert_eq!(cm.to_full_uppercase_string("ρωμέικα"), "ΡΩΜΕΪΚΑ");
assert_eq!(cm.to_full_uppercase_string("ή."), "Ή.");
}
This seems like an i18n quality bug, and we don't want to advertise a component as stabilized with known i18n quality bugs. That's one of the checkboxes for stabilizing any component (along with FFI and docs). We can slip on feature coverage but not correctness
Would someone be interested in trying to implement this? There's prototype code in https://icu.unicode.org/design/case/greek-upper, and ICU4C/ICU4J both have working implementations.
I am unlikely to have time to get to this this week.
Part of https://github.com/unicode-org/icu4x/issues/3234
See https://unicode-org.atlassian.net/browse/ICU-5456, ICU4X does not implement this.
The special case is implemented in GreekUpper in ICU4C and ICU4J. Seems somewhat involved.