jellyfin / jellyfin-plugin-bookshelf

https://jellyfin.org
MIT License

Cover image not extracted from EPUB files #100

Open godvino opened 4 months ago

godvino commented 4 months ago

Bookshelf fails to extract the cover image from some books in EPUB format. The title, description, and other details are loaded correctly, though.

One of the books that cause this issue is https://www.feedbooks.com/book/1421/the-adventures-of-sherlock-holmes

After the book is imported, I looked in Jellyfin's metadata folder. It contains a poster.jpg file, but that file is not actually an image:

$ cat poster.jpg 
<?xml version="1.0" encoding="UTF-8" ?>

<!DOCTYPE html PUBLIC
     "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

  <head>
   <title>Cover</title>
   <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
  </head>

  <body>
    <div style="text-align: center; page-break-after: always;">
      <img src="images/cover.png" alt="Cover" style="height: 100%; max-width: 100%;" />
    </div>
  </body>

</html>
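A file like this can be caught by checking its leading "magic" bytes before treating it as a poster. The sketch below is a minimal illustration. The helper name `looks_like_image` is hypothetical, and it assumes that only JPEG and PNG covers need to be recognized. It is not code from the plugin.

```python
# Minimal sketch (hypothetical helper, not plugin code): check whether a
# file's bytes start with a JPEG or PNG signature before using it as a poster.

JPEG_MAGIC = b"\xff\xd8\xff"         # JPEG start-of-image marker followed by another marker
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"     # fixed 8-byte PNG signature


def looks_like_image(data: bytes) -> bool:
    """Return True if data begins with a JPEG or PNG signature."""
    return data.startswith((JPEG_MAGIC, PNG_MAGIC))


if __name__ == "__main__":
    xhtml_cover = b'<?xml version="1.0" encoding="UTF-8" ?>\n<html>...</html>'
    print(looks_like_image(xhtml_cover))          # False
    print(looks_like_image(PNG_MAGIC + b"IHDR"))  # True
```

On disk, reading the first few bytes is enough, for example `looks_like_image(open("poster.jpg", "rb").read(16))`. For the poster.jpg above, this returns False.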


unfedorg commented 2 months ago

I have 754 EPUB files managed by Jellyfin, and only 5 of them had their cover image extracted correctly. I looked into this issue and found three things that could be improved.

  1. Treat calibre:series_index as Double

    The first thing I noticed was the error below in the Jellyfin server log on every metadata refresh attempt:

    [2024-04-14 11:01:41.673 +08:00] [ERR] Error converting to int32
    System.FormatException: Input string was not in a correct format.
    at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
    at Jellyfin.Plugin.Bookshelf.Providers.OpfReader`1.ReadInt32AttributeInto(String xPath, Action`1 commitResult)

    The plugin reads calibre:series_index and converts it directly to Int32. This fails because calibre:series_index often contains a decimal value (e.g. "1.0"). Parsing it as a Double before converting it to Int32 should fix most cases. https://github.com/jellyfin/jellyfin-plugin-bookshelf/blob/5baaa87b6e7c5176be5cffb61d748e081b832f15/Jellyfin.Plugin.Bookshelf/Providers/OpfReader.cs#L149
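The parse-as-Double-first idea can be sketched in Python as follows. This sketch makes several assumptions:

- The function name `parse_series_index` is hypothetical.
- Truncating toward zero (so "2.5" becomes 2) is one possible policy, not necessarily what the plugin should choose.
- Python's `int("1.0")` raising `ValueError` stands in for the `FormatException` in the log.

```python
# Minimal sketch (hypothetical helper): parse calibre:series_index as a
# floating-point value first, then convert it to an integer.
import math
from typing import Optional


def parse_series_index(raw: Optional[str]) -> Optional[int]:
    """Convert a series_index string such as "1.0" to an int, or return None."""
    if raw is None:
        return None
    try:
        # float() accepts both "1" and "1.0"; int("1.0") would raise ValueError.
        value = float(raw.strip())
    except ValueError:
        return None
    if not math.isfinite(value):
        # float() also accepts "nan" and "inf", which are not usable indices.
        return None
    # int() truncates toward zero, so a fractional index like "2.5" becomes 2.
    return int(value)
```

In the plugin's C# code, the analogous change would presumably parse the value with an invariant culture, because .NET number parsing follows the current culture's decimal separator by default. Otherwise "1.0" could still fail on systems whose locale uses a comma as the decimal separator.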

  2. Accept empty opfRootDirectory

The plugin checks opfRootDirectory and, if it is null or empty, gives up on extracting the image. However, files are commonly placed at the root of the EPUB archive, in which case this directory is legitimately empty. Simply accepting an empty string would solve this issue. This change alone fixed 93% of my EPUB files.

https://github.com/jellyfin/jellyfin-plugin-bookshelf/blob/5baaa87b6e7c5176be5cffb61d748e081b832f15/Jellyfin.Plugin.Bookshelf/Providers/Epub/EpubMetadataImageProvider.cs#L104-L108
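The sketch below treats an empty OPF directory as "the archive root" instead of as an error. It assumes that manifest hrefs are resolved relative to the OPF file's location and that ZIP entry names use forward slashes. The helper name `resolve_in_epub` is hypothetical, and percent-decoding the href is an added assumption about how hrefs are encoded.

```python
# Minimal sketch (hypothetical helper): resolve a manifest href to a ZIP entry
# name, relative to the OPF file's directory. An empty directory is valid.
import posixpath
from urllib.parse import unquote


def resolve_in_epub(opf_path: str, href: str) -> str:
    """Return the archive entry name for href, relative to the OPF location."""
    # posixpath.dirname("content.opf") == "", which simply means the archive root.
    opf_root = posixpath.dirname(opf_path)
    # join("", "cover.jpg") == "cover.jpg", so no special case is needed;
    # normpath collapses segments like "OEBPS/../images".
    return posixpath.normpath(posixpath.join(opf_root, unquote(href)))
```

Because joining with an empty directory already yields the bare href, a guard that rejects an empty `opfRootDirectory` is unnecessary for this path logic.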

  3. Improve the XPath

Image extraction fails to find the cover image if the EPUB file has another manifest item with the id "cover" (for example, an XHTML cover page). This can be improved by matching only items whose media-type starts with "image/".

I also have some EPUB files where the cover image's id is "my-cover-image" instead of just "cover-image". I'm not sure how common this is, but allowing a wildcard match on the id may improve the chance of finding the cover image.

https://github.com/jellyfin/jellyfin-plugin-bookshelf/blob/5baaa87b6e7c5176be5cffb61d748e081b832f15/Jellyfin.Plugin.Bookshelf/Providers/OpfReader.cs#L57-L67
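The two XPath suggestions can be sketched together. The approach only accepts manifest items whose media-type starts with "image/", and it matches any id that contains "cover". The XPath 1.0 expression in the comment is illustrative and is not the plugin's actual expression. Python's ElementTree supports only a subset of XPath without `contains()` or `starts-with()`, so the sketch filters in Python instead. The helper name `find_cover_href` and the preference for an exact "cover" or "cover-image" id are assumptions.

```python
# Minimal sketch (hypothetical helper): pick the cover image from an OPF manifest.
# Roughly equivalent XPath 1.0 (illustrative only):
#   //opf:manifest/opf:item[contains(@id, 'cover') and starts-with(@media-type, 'image/')]
import xml.etree.ElementTree as ET
from typing import Optional

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_cover_href(opf_xml: str) -> Optional[str]:
    """Return the href of the most likely cover image, or None if none matches."""
    root = ET.fromstring(opf_xml)
    candidates = [
        item
        for item in root.findall(".//opf:manifest/opf:item", OPF_NS)
        # Wildcard id match ("cover", "cover-image", "my-cover-image", ...)
        if "cover" in item.get("id", "")
        # Skip non-image items such as an XHTML cover page.
        and item.get("media-type", "").startswith("image/")
    ]
    if not candidates:
        return None
    # Prefer an exact, conventional id over looser wildcard matches.
    for preferred in ("cover", "cover-image"):
        for item in candidates:
            if item.get("id") == preferred:
                return item.get("href")
    return candidates[0].get("href")
```

Consider a hypothetical manifest that contains both `<item id="cover" href="cover.xhtml" media-type="application/xhtml+xml"/>` and `<item id="my-cover-image" href="images/cover.png" media-type="image/png"/>`. The sketch skips the XHTML page and returns "images/cover.png".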

With the above changes, I was able to extract cover images from all of my EPUB files. I also tested with the https://www.feedbooks.com/book/1421/the-adventures-of-sherlock-holmes EPUB file, and it works fine.

I will try to open a pull request with these changes.

Thanks!