gotson / komga

Media server for comics/mangas/BDs/magazines/eBooks with API, OPDS and Kobo Sync support
https://komga.org
MIT License

EPUB Support: EPUBs falsely "identified" as ZIP due to directory structure inside #1321

Closed by TheRealC0unt 1 year ago

TheRealC0unt commented 1 year ago

Steps to reproduce

EPUBs that keep their content.opf file in OEBPS/content.opf rather than in the root of the archive are treated as broken, even though META-INF/container.xml correctly points to that file:

`<?xml version="1.0" encoding="UTF-8"?>

`
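For reference, the lookup described above can be sketched in Python. This is only a minimal illustration of how a reader can resolve the package document through META-INF/container.xml, not Komga's actual implementation; the function name `find_opf_path` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by the OCF container.xml format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub):
    """Return the package document path declared in META-INF/container.xml.

    `epub` is a file path or a binary file-like object (hypothetical helper).
    Returns None when no <rootfile> element is declared.
    """
    with zipfile.ZipFile(epub) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return None if rootfile is None else rootfile.get("full-path")
```

For a file laid out as described in this report, the call would be expected to return the OEBPS path, regardless of which directory the .opf file lives in.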

Expected behavior

Files with this directory structure should be handled properly.

Actual behavior

The file is handled as broken, and the log contains an entry of the following form: Epub file detected as zip, file is probably broken: /volume1/Books/Light Novels/filename.epub

Logs

2023-11-30T20:03:43.731+01:00 INFO 15775 --- [taskProcessor-15] o.g.komga.application.tasks.TaskHandler : Executing task: HashBook(bookId='0EC6D08E0X7A6', priority='0')
2023-11-30T20:03:44.370+01:00 INFO 15775 --- [taskProcessor-13] o.g.komga.domain.service.BookAnalyzer : Trying to analyze book: Book(name=Bookname, url=file:/volume1/Books/Light%20Novels/Seriesname/Bookname.epub, fileLastModified=2023-11-30T19:56:38.298, fileSize=15249868, fileHash=, number=6, id=0EC6D08E4XFZD, seriesId=0EC6D08E0X7A4, libraryId=0EC6D04GWX8VH, deletedDate=null, oneshot=false, createdDate=2023-11-30T20:00:56, lastModifiedDate=2023-11-30T20:01:07.128)
2023-11-30T20:03:44.374+01:00 INFO 15775 --- [taskProcessor-13] o.g.komga.domain.service.BookAnalyzer : Detected media type: application/zip
2023-11-30T20:03:44.375+01:00 WARN 15775 --- [taskProcessor-13] o.g.komga.domain.service.BookAnalyzer : Epub file detected as zip, file is probably broken: /volume1/Books/Light Novels/Seriesname/Bookname.epub
2023-11-30T20:03:44.377+01:00 INFO 15775 --- [taskProcessor-12] o.g.k.d.service.SeriesMetadataLifecycle : Library is not set to import series or collection metadata for this provider, skipping: ComicInfoProvider

Komga version

1.8.3

Operating system

Synology DSM 6.2, OpenJDK 17

Installation method

jar

Other details

No response

TheRealC0unt commented 1 year ago

Additional note: it seems that the .epubs with this result were all created with Calibre, since almost all of them contain a file called META-INF/calibre_bookmarks.txt. There is probably an issue with the detection of the epub version, be it 2 or 3 (I don't know whether the current implementation makes a distinction here).

gotson commented 1 year ago

Quoting https://www.mobileread.com/forums/showthread.php?t=299415

The MIME type of an ePUB file should be application/epub+zip.

This can happen if the mimetype file:

  • Is missing.
  • Isn't the first entry.
  • Is compressed instead of stored.
  • Has extra fields.
  • Contains the wrong MIME type (seems unlikely).
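
The conditions in this list can be checked against a file directly. The Python sketch below assumes that detection looks only at the first local file header of the archive, which is how magic-byte detectors commonly recognise EPUBs; whether Komga's detection works exactly this way is an assumption. `diagnose_mimetype_entry` is a hypothetical helper, not part of Komga or Calibre.

```python
import io
import struct
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"
# Fixed-size part of a ZIP local file header (30 bytes, little-endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def diagnose_mimetype_entry(data: bytes) -> list:
    """List the reasons an EPUB's bytes might be detected as a plain ZIP."""
    if len(data) < LOCAL_HEADER.size or data[:4] != b"PK\x03\x04":
        return ["does not start with a ZIP local file header"]
    fields = LOCAL_HEADER.unpack_from(data)
    # Field 3 is the compression method, 9 the name length, 10 the extra length.
    method, name_len, extra_len = fields[3], fields[9], fields[10]
    first_name = data[30:30 + name_len]

    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        content = zf.read("mimetype") if "mimetype" in zf.namelist() else None

    if content is None:
        return ["mimetype is missing"]
    problems = []
    if first_name != b"mimetype":
        problems.append("mimetype is not the first entry")
    else:
        if method != 0:  # 0 means stored (no compression)
            problems.append("mimetype is compressed instead of stored")
        if extra_len != 0:
            problems.append("mimetype has extra fields")
    if content != EPUB_MIMETYPE:
        problems.append("mimetype contains the wrong MIME type")
    return problems
```

Calling it on the bytes of an affected file, for example `diagnose_mimetype_entry(open(path, "rb").read())`, returns an empty list when none of the conditions apply.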

TheRealC0unt commented 1 year ago

Wouldn't have opened an issue if the file was missing. ALL files contain the proper mimetype file with the proper information. If you want, I could provide you with a copy of some of the "broken" files, so you could easily reproduce the issue.

It would be good practice to add more debug output to the log for everything around the .epub support, for as long as it is deemed "experimental work in progress", so that whenever a file is judged "broken" it is visible where the false conclusion is drawn. At least that's how I try to narrow down issues with my customers in my applications.

gotson commented 1 year ago

If it's detected as zip then it's broken. Read the post I quoted, it's all explained.