Re-export a comic with WEBP images, changes the images to JPEG

BrokenRSA commented 1 month ago

Describe the bug When you have a comic that contains WEBP pages, and you export it again, selecting WEBP as the format, the WEBP pages gets converted to JPEG pages although the file extension remains WEBP.

Exact Steps to Reproduce Steps to reproduce the behavior:

Select a comic that has JPEG pages
Right Click on a Comic
Select Export Books/Export Books
Scroll down Page Format and select WEBP as format.
The pages are not WEBP format.
Repeat Steps 2 to 4 again, and the pages will be converted to JPEG, although the filenames remain the same.

Ideal outcome, when you re-export and the files are already WEBP, it should just not do anything with the files. At the moment it converts them to JPEG, but keeps the extension as WEBP. Subsequen re-exports with WEBP format just keeps the same JPEG file, so the same should really apply for WEBP. I suspect that what happens with WEPB will happen with the DJVU image format as well, but I haven't tried that as such.

Screenshots

Test CBZ with JPEG image:
Test CBZ after conversion to WEBP:
Test CBZ after re-conversion with WEBP as format:

Version/Commit (check the about page, next to the version, for the string between brackets):

Version: 0.9.180
Commit: 927a7a1

Additional context I've traced the issue down to this specific function in the ComicRackCE\ComicRack.Engine\IO\Provider\ImageProvider.cs file. It converts WEBP images to JPEG, stores the JPEG data then. I haven't had time to really get my head around the code yet to see how to fix it, if I ever get time in the near future I'll give it a stab.

Test Files.zip

maforget commented 1 month ago

I did notice that reconverting to WEBP from WEBP would result in a bigger image, but never noticed that it would change it to jpg. That would explain it.

As a matter of fact I did check that part of the code yesterday, it converts to jpeg because down the line when in loads the byte array to a Bitmap it uses .NET Image.FromStream which only supports very limited types. So it looks like internally all images (webp & djvu) are jpeg.

It takes the byte array it extracted from the archive and goes through some steps and finally it uses this Extension method to covert from a stream to a Bitmap here:

https://github.com/maforget/ComicRackCE/blob/927a7a1f15eefb42204a96a4d1a5b48dccefc198/cYo.Common/Drawing/BitmapExtensions.cs#L202

So I would check when it does the Exporting, my guess without looking at the code is that it sees it's a .webp image and sees that the result is also one and leaves it like that not taking into consideration that it converted it to jpeg to be able to load it to an Image object.

That would mean that it would do the same form DjVu also.

P.S. you can copy a permalink from lines of code from the code on github and paste the link into a comment or issue and it will paste only that part of the code, instead of talking a screenshot.

BrokenRSA commented 1 month ago

I figured there's probably a couple of places this might be impacted. When I have some free time I'll try and dig a bit deeper into the code and see what would work best. I know where it's happening and why, just need to understand what's happening after this to make sure we're not breaking anything down the line.

This specifically is where it converts the WEBP to JPEG, and it does some extra stuff if the type is JPEG, so could possibly just check the pagetype first, and if it's WEBP of DJVU, don't convert it to JPEG, and then the rest should work still as is, but will have to test.

https://github.com/maforget/ComicRackCE/blob/927a7a1f15eefb42204a96a4d1a5b48dccefc198/ComicRack.Engine/IO/Provider/StorageProvider.cs#L201

maforget commented 1 month ago

That provider.GetByteImage reads that image from that previous part you linked to. It also does that when reading a comic. It converts to a jpeg so that conversion to Bitmap/Image is possible. We can't prevent that conversion, it is used in a lot of place.

The problem in that area that you linked to, seems like if there are no changes. It seems to detect the filetype is etiher original (preserve format) or the setting based on the extension setting.PageType == GetStoragePageTypeFromExtension(ext). Determines that it's a webp and saves that array directly without taking into account that it was converted.

Way to fix it might be (just thinking aloud:

Check the filetype using WebpImage.IsWebp if the ext is .webp to really determine the filetype. And in that case that the input or output is webp, always convert. Or easier always convert when webp is concerned.
Only for the export, add an optional parameter to GetByteImage & RetrieveSourceByteImage that would return the original raw array without conversion.

I haven't debug that function, but on top of my head that would be the easiest to fix.

Edit: Looking at it point 2 is probably the easiest.

BrokenRSA commented 1 month ago

Agreed, option 2 is probably going to be the best way to handle this, I'll have a look during the week and see about it.

maforget / ComicRackCE

Re-export a comic with WEBP images, changes the images to JPEG #74