pdf-rs / pdf_render


Support Unicode CJK CMap #11

Open neko-para opened 1 year ago

neko-para commented 1 year ago

There are many CJK CMaps, but some of them are simply UTF-16BE. We can check the prefix of an unknown encoding name and treat any encoding whose name begins with Uni as UTF-16BE. Some relevant information is on p. 404 of the PDF 1.5 spec (the two screenshots attached to the original issue are not reproduced here).

// fontentry.rs
let source_encoding = match base_encoding {
    Some(BaseEncoding::StandardEncoding) => Some(Encoding::AdobeStandard),
    Some(BaseEncoding::SymbolEncoding) => Some(Encoding::AdobeSymbol),
    Some(BaseEncoding::WinAnsiEncoding) => Some(Encoding::WinAnsiEncoding),
    Some(BaseEncoding::MacRomanEncoding) => Some(Encoding::MacRomanEncoding),
    Some(BaseEncoding::MacExpertEncoding) => Some(Encoding::AdobeExpert),
    ref e => {
        // The check could go here: return AdobeStandard if the name matches.
        warn!("unsupported pdf encoding {:?}", e);
        None
    }
};
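
To make the proposal concrete, here is a minimal sketch of the two pieces it needs: a prefix test on the encoding name and a UTF-16BE decoder for the string bytes. Both function names (is_unicode_cmap, decode_utf16be) are hypothetical and not part of pdf_render; how they would be wired into the match above depends on the crate's Encoding type, which this sketch does not assume anything about.

```rust
/// Hypothetical helper: returns true if the encoding name looks like one of
/// the predefined Unicode CMaps (for example "UniGB-UCS2-H" or
/// "UniJIS-UCS2-H"). This is only the prefix heuristic suggested above.
fn is_unicode_cmap(name: &str) -> bool {
    name.starts_with("Uni")
}

/// Hypothetical helper: decodes string bytes as UTF-16BE.
/// Returns None if the length is odd or the data contains an
/// unpaired surrogate.
fn decode_utf16be(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Combine each big-endian byte pair into one 16-bit code unit.
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect();
    // String::from_utf16 handles surrogate pairs and rejects invalid input.
    String::from_utf16(&units).ok()
}
```

With these, the fallback arm could call is_unicode_cmap on the encoding name and, on a match, decode text with decode_utf16be instead of going through a glyph-name table. Note that the prefix test is a heuristic: it assumes every CMap whose name starts with Uni uses two-byte big-endian Unicode codes, which should be checked against the spec's list of predefined CMaps before relying on it.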