seblucas / cops

Calibre OPDS (and HTML) PHP Server : web-based light alternative to Calibre content server / Calibre2OPDS to serve ebooks (epub, mobi, pdf, ...)
http://blog.slucas.fr/en/oss/calibre-opds-php-server
GNU General Public License v2.0

Epub files without an NCX table of contents throw error when opening in browser #525

Open dunxd opened 1 year ago

dunxd commented 1 year ago

When trying to open some epub files in the browser viewer, I get a blank page and the following error in the log:

[Wed Apr 19 16:47:37 2023] PHP Fatal error:  Uncaught Error: Call to a member function attr() on null in /cops/resources/php-epub-meta/lib/EPub.php:79
Stack trace:
#0 /cops/epubreader.php(27): EPub->initSpineComponent()
#1 {main}
  thrown in /cops/resources/php-epub-meta/lib/EPub.php on line 77

This is the third line in the initSpineComponent() function in EPub.php that sets the $tochref variable:

    public function initSpineComponent()
    {
        $spine = $this->xpath->query('//opf:spine')->item(0);
        $tocid = $spine->getAttribute('toc');
        $tochref = $this->xpath->query('//opf:manifest/opf:item[@id="' . $tocid . '"]')->item(0)->attr('href');
        $tocpath = $this->getFullPath($tochref);
        // read epub toc
        if (!$this->zip->FileExists($tocpath)) {
            throw new Exception('Unable to find ' . $tocpath);
        }

        $data = $this->zip->FileRead($tocpath);
        $this->toc = new DOMDocument();
        $this->toc->registerNodeClass('DOMElement', 'EPubDOMElement');
        $this->toc->loadXML($data);
        $this->toc_xpath = new EPubDOMXPath($this->toc);
        $rootNamespace = $this->toc->lookupNamespaceUri($this->toc->namespaceURI);
        $this->toc_xpath->registerNamespace('x', $rootNamespace);
    }

After some trial and error I found this error is thrown when opening EPub files that do not have an NCX table of contents: with no NCX item in the manifest, the lookup on that line returns null and the call to attr() fails. Opening the Edit Table of Contents dialog in Calibre and clicking Ok creates an NCX if one doesn't already exist - no actual changes to the table of contents are needed.
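The failure described above can be reproduced and guarded against in a few lines. This is only a sketch, assuming plain DOMDocument/DOMXPath instead of the library's EPubDOMElement/EPubDOMXPath wrappers; the helper name findTocHref() is hypothetical and is not part of PHP EPub Meta or COPS.

```php
<?php
// Hypothetical helper (not library code): returns the manifest href of the
// NCX file, or null when the spine has no toc attribute or the referenced
// manifest item does not exist.
function findTocHref(string $opfXml): ?string
{
    $doc = new DOMDocument();
    $doc->loadXML($opfXml);
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('opf', 'http://www.idpf.org/2007/opf');

    $spine = $xpath->query('//opf:spine')->item(0);
    if (!$spine instanceof DOMElement) {
        return null;
    }

    // getAttribute() returns '' when the attribute is absent.
    $tocid = $spine->getAttribute('toc');
    if ($tocid === '') {
        return null;
    }

    // item(0) returns null when nothing matches; the unguarded call on
    // that null is what produces the fatal error in the log above.
    $item = $xpath->query('//opf:manifest/opf:item[@id="' . $tocid . '"]')->item(0);
    if (!$item instanceof DOMElement) {
        return null;
    }

    return $item->getAttribute('href');
}
```

A caller could then skip the NCX parsing, or fall back to another ToC source, when the result is null instead of letting the request die.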

NCX seems to be an old ToC system that Calibre creates for backwards compatibility, when it creates a ToC. The PHP EPub Meta library used here hasn't been maintained in a while. Perhaps NCX was the norm when it was written.

I'm investigating whether Calibre can be set up to create the NCX ToC on importing. I also found a more recent fork of PHP EPub Meta that may not have this problem.

dunxd commented 1 year ago

If Calibre is set to convert to EPub v3 files, then the NCX is missing. If it is set to convert to EPub v2 files, then the NCX is present instead of nav.xhtml.
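To check which kind of ToC a given file carries, one can inspect the manifest in its OPF package document. The sketch below assumes the OPF XML has already been read out of the EPUB archive; detectTocKinds() is a hypothetical name. It relies on two conventions: an NCX is a manifest item with media type application/x-dtbncx+xml, and an EPUB 3 navigation document is a manifest item whose properties attribute contains the token nav.

```php
<?php
// Hypothetical helper: reports whether an OPF manifest declares an NCX
// and/or an EPUB 3 navigation document.
function detectTocKinds(string $opfXml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($opfXml);
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('opf', 'http://www.idpf.org/2007/opf');

    $ncx = $xpath->query(
        '//opf:manifest/opf:item[@media-type="application/x-dtbncx+xml"]'
    );

    // "properties" is a space-separated list, so match "nav" as a whole
    // token rather than as a substring.
    $nav = $xpath->query(
        '//opf:manifest/opf:item[contains(concat(" ", normalize-space(@properties), " "), " nav ")]'
    );

    return ['ncx' => $ncx->length > 0, 'nav' => $nav->length > 0];
}
```

For a file with only the navigation document this returns ncx as false, which is the case that triggers the error in the web reader.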

I have not yet found a way in Calibre to convert to EPub v3 files including an NCX, which is allowed for compatibility. Editing ToC in Calibre does generate the NCX in EPub v3 files.

So there are two workarounds:

  1. Use only EPub v2 in Calibre
  2. Edit the ToC of EPub v3 files in Calibre.

mikespub commented 1 year ago

The fix above should allow you to view EPUB 3 files via the browser viewer in COPS, without needing to edit the TOC in Calibre.

That being said, the underlying "monocle" library hasn't been updated in 10 years and there's no filtering of content from the EPUB file, so there are security risks if you can't trust the origin of the ebook you want to view via browser.

See https://github.com/joseph/Monocle/wiki/EPUB-and-other-package-formats for details
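To illustrate what filtering could look like, here is a rough sketch that drops script elements from an XHTML chapter before it is served. It is not what COPS or Monocle do, the name stripScripts() is hypothetical, and it is far from a complete sanitizer: event-handler attributes, javascript: URLs and other vectors are left untouched.

```php
<?php
// Hypothetical, deliberately incomplete filter: removes <script> elements
// from a well-formed XHTML document and returns the serialized result.
function stripScripts(string $xhtml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xhtml);

    // Copy the live node list into an array first; removing nodes while
    // iterating the live list would skip entries.
    $scripts = iterator_to_array(
        $doc->getElementsByTagNameNS('http://www.w3.org/1999/xhtml', 'script')
    );
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    return $doc->saveXML();
}
```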

dunxd commented 1 year ago

Thanks, I tried raising the issue on MobileRead. I wish Calibre would add the NCX ToC automatically when converting to ePub v3, instead of making it a manual-only step. Till then I'm going to stick with ePub v2.

dunxd commented 1 year ago

Actually, I take that back as I didn't understand your comment or what you linked to first time around.

What Monocle's wiki is saying is that using ePub v3 in general is a security risk compared to v2 - this has nothing to do with the NCX issue but more to do with ePub v3 allowing JavaScript that could be malicious. I guess that could be a worry if obtaining ePub files from the dark web, although it could also be FUD from 10 years ago.

The fix you made allows COPS' web reader to open ePub v3 files in the browser, and hopefully browsers that support JavaScript today are at no more risk from JavaScript in ePubs than from any other web page they open.

In other words - thanks for fixing this!!!

mikespub commented 1 year ago

Included in release 1.3.4 at https://github.com/mikespub-org/seblucas-cops