When a DRM key error occurs while opening an EPUB, navigator is stuck in loading state

ettore commented 4 months ago

Describe the bug

Using Readium with Content Protection (for example using a DRM system such as Axis360) involves tracking potential DRM errors, such as invalid keys, keys corruption, or more in general the lack of a functioning key asset that decrypted the DRM-encrypted content. When such a situation happens, the Readium swift-toolkit doesn't present any errors to the user, and instead only shows a never-ending activity indicator.

How to reproduce?

manually corrupt a DRM key or other asset that will prevent the decryption of any EPUB resource
attempt to open the EPUB
The EPUBSpreadView is presented but it shows an activity indicator that never ends.

Readium version

current top of main

OS version

iOS 17

Testing device

Various simulators and iPhone 11

Environment

macOS: 13.5
platform: arm64
carthage: 0.39.1
Build version 15A507

Additional context

On this branch I added a modification per discussion in slack where I am "optimistically" scheduling the stop of the activity indicator, and then canceling the stop if the spread starts to load. Below is the console log when I attempt to open an EPUB with a corrupted DRM (Axis360) key:

☝️ Readium 2 Log enabled with minimum severity level of [debug].
ℹ️ Streamer.swift:98:   Open FileAsset(/Users/ettorepasquini/Library/Developer/CoreSimulator/Devices/3B3CFE90-3FEE-4680-BF46-E772D44C0D23/data/Containers/Data/Application/630AB1E2-996A-48C8-BDAE-3659CEAA62A9/Documents/urn%3Alibrarysimplified.org%2Fterms%2Fid%2FAxis%2520360%2520ID%2F0026047662.epub)

<< TransformingResource::transform(_:) throws invalid DRM key length error >> 

🔎️ EPUBNavigatorViewController.swift:440:   -> on load(Optional({"href":"/OEBPS/heos_9781250765918_epub3_001_r1.xhtml","locations":{"progression":0,"totalProgression":0.047619047619047616},"type":"application/xhtml+xml"}))
🔎️ EPUBNavigatorViewController.swift:229:   * transitioned to loading(pendingLocator: Optional({"href":"/OEBPS/heos_9781250765918_epub3_001_r1.xhtml","locations":{"progression":0,"totalProgression":0.047619047619047616},"type":"application/xhtml+xml"}))
🔎️ EPUBSpreadView.swift:457:    **** didFinish
🔎️ EPUBSpreadView.swift:564:    **** scheduling work item to stop activityIndicator
🔎️ EPUBSpreadView.swift:153:    Evaluate script: spread.load([{"link":{"href":"/OEBPS/heos_9781250765918_epub3_001_r1.xhtml","properties":{"encrypted":{"algorithm":"http://www.w3.org/2001/04/xmlenc#aes256-cbc","scheme":"http://readium.org/2014/01/lcp"},"id":"d_002"},"templated":false,"type":"application/xhtml+xml"},"page":"left","url":"http://localhost:61812/BB8FC04B-8DE9-44CB-8DA7-9C07227679F8/OEBPS/heos_9781250765918_epub3_001_r1.xhtml"}]);
🔎️ EPUBSpreadView.swift:559:    **** activityIndicator stopping
🔎️ EPUBSpreadView.swift:234:    **** spreadLoadDidStart: canceling activity indicator stop
🔎️ EPUBSpreadView.swift:234:    **** spreadLoadDidStart: canceling activity indicator stop
❌ GCDHTTPServer.swift:97:   other(NYPLAxis.NYPLRSACypher.(unknown context at $1036c9c00).DecryptionError.invalidKeyLength)
❌ undefined:1:  JavaScript: ReferenceError: Can't find variable: readium

ettore commented 4 months ago

One thing I don't understand is that if there's a DRM key error, shouldn't the spread not even begin to load? Since after all the resource won't be able to be used. But instead it seems like we are injecting the JS anyway?

mickael-menu commented 4 months ago

Are you talking about this part?

EPUBSpreadView.swift:153:   Evaluate script: spread.load([{"link":{"href":"/OEBPS/heos_9781250765918_epub3_001_r1.xhtml","properties":{"encrypted":{"algorithm":"http://www.w3.org/2001/04/xmlenc#aes256-cbc","scheme":"http://readium.org/2014/01/lcp"},"id":"d_002"},"templated":false,"type":"application/xhtml+xml"},"page":"left","url":"http://localhost:56654/F40DAECA-4E7D-4EA0-8116-C52E33EC659D/OEBPS/heos_9781250765918_epub3_001_r1.xhtml"}]);

This does not inject the Readium JS layer into the resource. Instead, it executes a script within the spread view context. Since the spread view is unaware of any decoding errors in the resource, it continues as if the resource is valid.

ettore commented 4 months ago

I may have spoken incorrectly, sorry. I'm still trying to piece together all the spread view lifecycle events and moving parts in my head. But yes I was referring to that.

I guess I was trying to understand if that script should be executed even if a DRM error had happened before, since such an error (for example one stemming from a broken DRM key) will prevent any resource from being decrypted. But it sounds like it should, if I understand correctly? Because, and pls let me know if I am off, basically:

the DRM key was read from disk
we don't know if the key had issues
and therefore we think that content can be decrypted and we'll deal with errors later.

However, as I show in the (updated) reported log, after reading the DRM key from disk successfully, once the TransformingResource subclass starts to use it it immediately errors out and throws an error that is caught by GCDHTTPServer::handle(request:,completion:). So in a way we know sort of early on that we have a problem, but we let the spread view continue processing anyway. BTW I'm sort of thinking out loud here, not saying this is wrong per se.

ettore commented 4 months ago

beside the above, with the recent additions to index-reflowable.js and other JS files in my commit, I am seeing the spreadLoadStarted event callback being called. However, the half sec delay I had originally put for spinner stop cancellation seems way too long. I had to lower it to something like 1 millisec to avoid the spreadLoadStarted event from canceling the stop. But this timing seems totally arbitrary and I'm not sure it's the right way to do it?

ettore commented 4 months ago

our TransformingResource subclass transform implementation is something like this:

override func transform(_ data: ResourceResult<Data>) -> ResourceResult<Data> {
    return data.tryMap {
        // decryptWithAES() is the function throwing the error caught by GCDHTTPServer
        guard let result = try cypher.decryptWithAES($0, key: aesKey) else {
            return $0
        }
        return result
    }
}

mickael-menu commented 4 months ago

I guess I was trying to understand if that script should be executed even if a DRM error had happened before, since such an error (for example one stemming from a broken DRM key) will prevent any resource from being decrypted.

Yes, because if you have two FXL resources displayed side by side, one might work but not the other. Note that this is very specific to FXL EPUBs. With FXL, we have an HTML wrapper, so it is always properly loaded in the web view, even if the publication is broken. That might be why you didn't receive error event in the web view delegate. The FXL resources are loaded inside an iframe (one or two, depending on double spreads being enabled or not).

For reflowable though, the publication resource is loaded directly in the web view, so it might fail if the HTML is incorrect.

I don't know about your particular DRM, but I would attempt to check for the validity of the key when opening the publication, in the ContentProtection implementation. If it's broken and the resources can't be decrypted, I would return an error instead of building a Publication instance.

However what we're talking about can still be useful if the publication appears to be valid, but one or more resources are corrupted for some reason.

I had to lower it to something like 1 millisec to avoid the spreadLoadStarted event from canceling the stop. But this timing seems totally arbitrary and I'm not sure it's the right way to do it?

I'm not sure I understand, don't we want to cancel the stop to keep the spinner running when receiving spreadLoadStarted?

ettore commented 4 months ago

I'm not sure I understand, don't we want to cancel the stop to keep the spinner running when receiving spreadLoadStarted?

we do, but I think we'd want to do it only when there are no errors, right? The problem is that in case of errors the spreadLoadStarted event is still sent, so the code ends up behaving the same. I was looking for a differentiator between the error vs no-error case but it seems like the JS code sends the event regardless.

Here's my updated code: https://github.com/ettore/swift-toolkit/commits/fix/neverending-loading-state

☝️ Readium 2 Log enabled with minimum severity level of [debug].
ℹ️ Streamer.swift:98:   Open FileAsset(/Users/ettorepasquini/Library/Developer/CoreSimulator/Devices/3B3CFE90-3FEE-4680-BF46-E772D44C0D23/data/Containers/Data/Application/F63FA18C-C0F7-40F2-805F-5A00FEA47DDE/Documents/urn%3Alibrarysimplified.org%2Fterms%2Fid%2FAxis%2520360%2520ID%2F0026047662.epub)
🔎️ EPUBNavigatorViewController.swift:440:   -> on load(Optional({"href":"/OEBPS/heos_9781250765918_epub3_001_r1.xhtml","locations":{"progression":0,"totalProgression":0.047619047619047616},"type":"application/xhtml+xml"}))
🔎️ EPUBNavigatorViewController.swift:229:   * transitioned to loading(pendingLocator: Optional({"href":"/OEBPS/heos_9781250765918_epub3_001_r1.xhtml","locations":{"progression":0,"totalProgression":0.047619047619047616},"type":"application/xhtml+xml"}))
🔎️ EPUBSpreadView.swift:536:    **** activity indicator starting
🔎️ EPUBSpreadView.swift:456:    **** webView didFinish
🔎️ EPUBSpreadView.swift:559:    **** scheduling work item to stop activityIndicator
🔎️ EPUBSpreadView.swift:152:    Evaluate script: spread.load([{"link":{"href":"/OEBPS/heos_9781250765918_epub3_001_r1.xhtml","properties":{"encrypted":{"algorithm":"http://www.w3.org/2001/04/xmlenc#aes256-cbc","scheme":"http://readium.org/2014/01/lcp"},"id":"d_002"},"templated":false,"type":"application/xhtml+xml"},"page":"left","url":"http://localhost:51459/F1BE0B03-690D-43F3-AE49-C457503BD81B/OEBPS/heos_9781250765918_epub3_001_r1.xhtml"}]);
🔎️ EPUBSpreadView.swift:555:    **** setNeedsStop: activityIndicator stopping
🔎️ EPUBSpreadView.swift:233:    **** spreadLoadDidStart: canceling activity indicator stop
🔎️ EPUBSpreadView.swift:233:    **** spreadLoadDidStart: canceling activity indicator stop
❌ GCDHTTPServer.swift:97:   other(NYPLAxis.NYPLRSACypher.(unknown context at $10087dc38).DecryptionError.invalidKeyLength)
❌ undefined:1:  JavaScript: ReferenceError: Can't find variable: readium

ettore commented 4 months ago

I don't know about your particular DRM, but I would attempt to check for the validity of the key when opening the publication, in the ContentProtection implementation. If it's broken and the resources can't be decrypted, I would return an error instead of building a Publication instance.

This is a good point, there are some validity checks that can be done beforehand, such as checking the key length. But the key could be corrupted and have the correct length. Generally speaking, since we don't control the decryption keys (which are shipped with the EPUB by the distributor) it will be hard to determine if the key is valid before using it.

mickael-menu commented 4 months ago

Generally speaking, since we don't control the decryption keys (which are shipped with the EPUB by the distributor) it will be hard to determine if the key is valid before using it.

I would try decrypting the first resource (or part of it) to at least make sure the key is valid. It's better to fail with a protection error when opening the book than reporting the error from the navigator.

we do, but I think we'd want to do it only when there are no errors, right? The problem is that in case of errors the spreadLoadStarted event is still sent, so the code ends up behaving the same. I was looking for a differentiator between the error vs no-error case but it seems like the JS code sends the event regardless.

I see, I was confused because FXL and reflowable are two very different cases. It's challenging talking about both at the same time.

In a nutshell, the JS files you modified for FXL (fixed-page.js, index-fixed-wrapper-one.js and index-fixed-wrapper-two.js) are always executed, as they are loading before the publication resources, which are loaded inside iframes of the wrapper HTML files. What you want to modify is index-fixed.js, which is injected in the publication resources themselves. So it won't be executed if we can't decrypt the resource. It's equivalent to index-reflowable.js.

Note that in the case of a double-page spread (index-fixed-wrapper-two.js), you will get the events from the two iframes. That might have an impact on the activity indicator.

I would recommend fixing the problem first on reflowable, then FXL one page, then FXL two pages.

ettore commented 4 months ago

Thank you, that was exactly the problem. By sending the spreadLoadStart event only on index-fixed.js and index-reflowable.js I am now able to distinguish between a normal and broken situation. I tried both cases, fixed and reflowable. I believe the "fixed" case was a fixed 2 pages because I saw 2 instances of EPUBFixedSpreadView at the same time. The code in EPUBSpreadView while not thread-safe looks fine to me, because by being a UIView all those functions are always executed on the main thread. Therefore I never saw any threading issues.

I opened a tentative PR, but if there's more work to do, happy to improve it.

readium / swift-toolkit