Missing tags for script identifiers

fschutt commented 3 years ago

I'm currently trying to shape some simple text, as in the allsorts-tools/examples/shape file. Since I have to determine the script of the text block at runtime, I'm using whatlang to detect the language and script from the text itself. However, the tag module is missing tags for some of the scripts whatlang can return:

use allsorts::tag;
use whatlang::{Lang, Script};

// auto-detect script + language from text (todo: performance!)
let (lang, script) = whatlang::detect(text)
    .map(|info| (info.lang(), info.script()))
    .unwrap_or((Lang::Eng, Script::Latin));

// upper-case the language code returned by whatlang
let lang = lang.code().to_uppercase();

let script_id = match script {
    Script::Arabic => tag::ARAB,
    Script::Bengali => tag::BENG,
    Script::Cyrillic => tag::CYRL,
    Script::Devanagari => tag::DEVA,
    Script::Ethiopic => , // ??
    Script::Georgian => , // ??
    Script::Greek => tag::GREK,
    Script::Gujarati => tag::GUJR,
    Script::Gurmukhi => tag::GURU, // can also be GUR2
    Script::Hangul => , // ??
    Script::Hebrew => , // ??
    Script::Hiragana => , // ??
    Script::Kannada => tag::KNDA,
    Script::Katakana => , // ??
    Script::Khmer => , // TODO?? - unsupported?
    Script::Latin => tag::LATN,
    Script::Malayalam => tag::MLYM,
    Script::Mandarin => , // ??
    Script::Myanmar => ,  // ??
    Script::Oriya => tag::ORYA,
    Script::Sinhala => tag::SINH,
    Script::Tamil => tag::TAML,
    Script::Telugu => tag::TELU,
    Script::Thai => tag::THAI,
};

Is it possible to add these tags to the API, even if they are unsupported? Or is this by design? Thanks.

wezm commented 3 years ago

It's not really feasible to maintain a list of all possible tags. Ultimately they are just u32 values. The tag module has a macro that makes it a little more pleasant to construct them from a byte string. I'll make that public in the next release. In the meantime you might like to copy the macro and its supporting function into your own code:

/// Generate a 4-byte font table tag from a byte string
///
/// Example:
///
/// ```
/// assert_eq!(tag!(b"glyf"), 0x676C7966);
/// ```
macro_rules! tag {
    ($w:expr) => {
        tag(*$w)
    };
}

const fn tag(chars: [u8; 4]) -> u32 {
    ((chars[3] as u32) << 0)
        | ((chars[2] as u32) << 8)
        | ((chars[1] as u32) << 16)
        | ((chars[0] as u32) << 24)
}

You would use this as follows for any tags missing from the tag module (see the OpenType docs for the tag values):

    ⋮
    Script::Kannada => tag::KNDA,
    Script::Katakana => tag!(b"kana"),
    Script::Khmer => tag!(b"khmr"),
    Script::Latin => tag::LATN,
    ⋮
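
For reference, here is a minimal sketch of how the arms left as `??` in the original match could be filled in with this approach. The tag values are taken from the OpenType script tag registry. `MissingScript` and `tag_to_string` are hypothetical names used only to keep the example self-contained; in real code you would match on `whatlang::Script` directly. Myanmar also has a newer `mym2` tag, so which one is appropriate may depend on the font.

```rust
/// Pack four ASCII bytes into a big-endian OpenType tag
/// (same result as the shift-based `tag` function above).
const fn tag(chars: [u8; 4]) -> u32 {
    u32::from_be_bytes(chars)
}

/// Hypothetical stand-in for the `whatlang::Script` variants
/// that were left blank in the original match.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MissingScript {
    Ethiopic,
    Georgian,
    Hangul,
    Hebrew,
    Hiragana,
    Katakana,
    Khmer,
    Mandarin,
    Myanmar,
}

/// Map each script to its OpenType script tag.
const fn script_tag(script: MissingScript) -> u32 {
    match script {
        MissingScript::Ethiopic => tag(*b"ethi"),
        MissingScript::Georgian => tag(*b"geor"),
        MissingScript::Hangul => tag(*b"hang"),
        MissingScript::Hebrew => tag(*b"hebr"),
        // Hiragana and Katakana share a single OpenType script tag.
        MissingScript::Hiragana | MissingScript::Katakana => tag(*b"kana"),
        MissingScript::Khmer => tag(*b"khmr"),
        // CJK ideographs are registered under "hani".
        MissingScript::Mandarin => tag(*b"hani"),
        // "mymr" is the original Myanmar tag; "mym2" also exists.
        MissingScript::Myanmar => tag(*b"mymr"),
    }
}

/// Hypothetical helper: render a tag back as text, e.g. for log output.
fn tag_to_string(tag: u32) -> String {
    tag.to_be_bytes().iter().map(|&b| char::from(b)).collect()
}
```

Because a tag is just a `u32`, converting back with `to_be_bytes` is handy when debugging which script tag was actually selected.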

wezm commented 3 years ago

Fixed by bf0f283e43ac0b8cb0934e9811843d4b6ca9ba0f

yeslogic / allsorts

Missing tags for script identifiers #35