koreader / crengine

This is the KOReader CREngine fork. It cross-pollinates with the official CoolReader repository at https://github.com/buggins/coolreader, in case you were looking for that one.
70 stars 45 forks source link

EPUB: fix possible buffer overflow #572

Closed benoit-pierre closed 2 months ago

benoit-pierre commented 2 months ago

Handle obfuscated font files smaller than 1040 bytes.


This change is Reviewable

poire-z commented 2 months ago

Can we assume the same issue can happen with the function just above, AdobeDemanglingStream(), from which I copied most of the code to make IdpfDemanglingStream() and that a similar fix can be made here (even if we can't really test it) ?

benoit-pierre commented 2 months ago

Yep. Will amend.

benoit-pierre commented 2 months ago

And my code is buggy, I'm surprised it works… We want the start of the read, so _base->GetPos() must be called before _base->Read(…).

benoit-pierre commented 2 months ago

And was the previous code for AdobeDemanglingStream correct? Do you have a file to test it @poire-z?

benoit-pierre commented 2 months ago

I think AdobeDemanglingStream is correct and IdpfDemanglingStream is buggy:

                int keyPos = (i + pos) & 15;
                ((lUInt8*)buf)[i] ^= _key[keyPos];

vs

                int keyPos = (i + pos) % 20;
                ((lUInt8*)buf)[i+pos] ^= _key[keyPos];

We should be writing at buf[i] in both cases, no?

poire-z commented 2 months ago

Here are the few EPUBs with some META-INF/encryption.xml I may have used for testing: [removed] (Tell me when downloaded, so I can remove it.)

poire-z commented 2 months ago

We should be writing at buf[i] in both cases, no?

Looks like so. Dunno why I changed it. If it worked, it's because we most often (always?) get here with getPos() = 0 - or Way after 1024 or 1040 because each call reads a bigger chunk than that?

benoit-pierre commented 2 months ago

Here are the few EPUBs with some META-INF/encryption.xml I may have used for testing: [removed] (Tell me when downloaded, so I can remove it.)

Thanks.

(Because right now I can #if 0 that whole de-obfuscate loop and still open my file with no issue…)

benoit-pierre commented 2 months ago

I get "unknown file format" errors from FT_New_Memory_Face when attempting to load the obfuscated fonts from your last 2 files and the EPUB from https://github.com/koreader/koreader/issues/12071.

poire-z commented 2 months ago

Before and after this PR ? If already before, well, let's not worry :) If it works for all the other EPUBs, I guess the code is correct, and these 2 EPUBs have it wrong (have you tried to deobfuscate them to see if they are correct font files?)

benoit-pierre commented 2 months ago

Before and after this PR ?

Yes.

Hexdumps of the first 4 bytes of the key and each obfuscated font:

The 4 fist bytes of each font should be the sfntVersion field: 0x00010000, or 0x4F54544F. But even with the same key, we somehow get a lot of variations.

poire-z commented 2 months ago

Any idea what other reading softwares that supports obfuscated fonts (dunno which does, calibre?) do?

benoit-pierre commented 2 months ago

I looked at what happens with calibre, and the deDRM plugin has some extra code for obfuscated fonts: it turns out that the unobfuscated data can sometime itself be a raw DEFLATE compressed stream!

After implementing, all fonts for those 3 ebooks now load fine.

Before/after screenshots:

poire-z commented 2 months ago

@benoit-pierre : any idea why I don't get the embedded font for the text in blue ? image Didn't you get that ? Any other setting you had to change to get it better?

I need to remove font-weight: bold; in this snippet to get the Zawgyi-One font to be used, which you seem to not have needed to:

.Emphasis4 {
        font-style: italic;
        color: #2f5496;
        font-family: Zawgyi-One,sans-serif;
        font-weight: bold;
        font-size: 1.091em;
}
benoit-pierre commented 2 months ago

I don't have to do anything to get it: clean build, no custom settings, removed .sdr entry: blue Zawgyi-One.

poire-z commented 2 months ago

OK. So, it happens when I have font-family font for sans-serif checked (so, I guess you don't have it checked, right?) image and it's ok when I unchecked it: image If I recheck it, it's still ok... until the reload, and then it's back to not ok.

Feels like some hell I've already met...

benoit-pierre commented 2 months ago

That font selection could use another sub-menu… It's annoying to use, since the page for the selected font is focused, I have to go back manually to the first one to get access to the 2 other submenus. As a new user, you would not even now they are available.

benoit-pierre commented 2 months ago

Why is my system installed DejaVu Sans not available in the font family serif sub-menu?

benoit-pierre commented 2 months ago

By default, everything in font family is unchecked.

benoit-pierre commented 2 months ago

[…] since the page for the selected font is focused, I have to go back manually to the first one […]

And you can't do it fast because the page height changes, so the back button position changes…

benoit-pierre commented 2 months ago

[…] since the page for the selected font is focused, I have to go back manually to the first one […]

And you can't do it fast because the page height changes, so the back button position changes…

Page up to the rescue! \o/

benoit-pierre commented 2 months ago

Or page down, it's a wrap! Even faster.

benoit-pierre commented 2 months ago

I still can't get into your state.

Frenzie commented 2 months ago

(Presumably Home/End should be make to work there.)

poire-z commented 2 months ago

By default, everything in font family is unchecked.

Ok, so you're saying you had it unchecked, right? And so that's the reason it worked for you.

Why is my system installed DejaVu Sans not available in the font family serif sub-menu?

No idea, it should be there in Sans> and in Serif> as we don't distinguish sans from serif.

That font selection could use another sub-menu…

Well, it's not super essential. It's just mimicking the main font menu & reusing the code, that's good enough.

I still can't get into your state.

Even if you let it reload - or quit, rm cache, and restart?

benoit-pierre commented 2 months ago
benoit-pierre commented 2 months ago

(Presumably Home/End should be make to work there.)

Nope.

poire-z commented 2 months ago

toggle Sans-serif: FreeSans in font family: still correct quit, remove cache, reload: still correct

Correct also for me. (Also correct with FreeSerif). But not correct if I choose another font, DejaVu Sans or Gentium.

May be because the Free fonts have support for that script, and it's them that are used - and our western eyes are not educated enough to tell :)

poire-z commented 2 months ago

I need to remove font-weight: bold; in this snippet to get the Zawgyi-One font to be used, which you seem to not have needed to:

.Emphasis4 {
        font-style: italic;
        color: #2f5496;
        font-family: Zawgyi-One,sans-serif;
        font-weight: bold;
        font-size: 1.091em;
}

So, not confirmed by printfs, but I believe that when the font you have associated to sans-serif has a bold variant (which is the case of my DejaVuSans, but not of our Free fonts), and because that Zawgyi-One is provided only in regular, that our found sans-serif bold ttf gets a bigger score than the Zawgyi-One (which would need to be synthecized bold), so it is used instead... :/

poire-z commented 2 months ago

Possible fix for the above issue:

--- a/crengine/src/lvfntman.cpp
+++ b/crengine/src/lvfntman.cpp
@@ -7137,5 +7137,5 @@ int LVFontDef::CalcMatch( const LVFontDef & def, bool useBias ) const
         + (features_match * 1000)
         + (family_match   * 100)
-        + (typeface_match * 1000);
+        + (typeface_match * 10000);

 //    printf("### %s (%d) vs %s (%d): size=%d weight=%d italic=%d family=%d typeface=%d bias=%d => %d\n",
@@ -7387,6 +7387,9 @@ LVFontCacheItem * LVFontCache::find( const LVFontDef * fntdef, bool useBias )
         else
             def.setTypeFace(lString8::empty_str);
+        int typeface_match = false;
         for (i=0; i<_instance_list.length(); i++) {
             int match = _instance_list[i]->_def.CalcMatch( def, useBias );
+            if ( match >= 2560000 )
+                typeface_match = true;
             match = match * 256 + ordering_weight;
             if (match > best_instance_match) {
@@ -7397,4 +7400,6 @@ LVFontCacheItem * LVFontCache::find( const LVFontDef * fntdef, bool useBias )
         for (i=0; i<_registered_list.length(); i++) {
             int match = _registered_list[i]->_def.CalcMatch( def, useBias );
+            if ( match >= 2560000 )
+                typeface_match = true;
             match = match * 256 + ordering_weight;
             if (match > best_match) {
@@ -7403,7 +7408,15 @@ LVFontCacheItem * LVFontCache::find( const LVFontDef * fntdef, bool useBias )
             }
         }
+        if ( typeface_match ) {
+            // No need to check next font names (which may get a better score
+            // if the first fonts do not have the requested italic or weight
+            // variants, so let's avoid that too).
+            break;
+        }
     }
     if (best_index<0)
         return NULL;
+    // if (best_instance_match >= best_match) printf("Find '%s' best instance: %s\n", fntdef->getTypeFace().c_str(), _instance_list[best_instance_index]->_def.getTypeFace().c_str());
+    // else printf("Find '%s' best registered: %s\n", fntdef->getTypeFace().c_str(), _registered_list[best_index]->_def.getTypeFace().c_str());
     if (best_instance_match >= best_match)
         return _instance_list[best_instance_index];

I'll keep that for later this summer, as it's a bit risky.