leotaku / kojirou

Generate perfectly formatted Kindle e-books from MangaDex manga
MIT License

Alternative decoding mechanism #37

Open kawasaki-kanagawa opened 11 months ago

kawasaki-kanagawa commented 11 months ago

I encountered a decoding error:

❯ kojirou 58be6aa6-06cb-4ca5-bd20-f1392ce451fb -l en -V 5
Title: Yotsuba&!
Author: Azuma Kiyohiko
Groups: /azu/nites, Veers
Chapters: 28, 29, 30, 31, 32, 33, 34
Volume: 5 |█████████████████████████████████████████████████████████████████████████████████| Error           |
Error: volume 5: pages: mangadex: chapter 32: image 13: decode: invalid JPEG format: Huffman table has zero length

which I believe to be a variation of what was previously discussed:

As such, I would like to propose an alternative decoding mechanism that uses something other than Go's stdlib, given how inflexible its image decoding is.

I made a proof-of-concept using libvips, at the cost of requiring CGO_ENABLED=1 plus the vips and pkg-config dependencies, and it seems to have resolved the problem I had. You can check the implementation here: https://github.com/kawasaki-kanagawa/kojirou/commit/3f80b3f2f029398309716fbeb3e63417fb4fdba4

❯ go run . 58be6aa6-06cb-4ca5-bd20-f1392ce451fb -l en -V 5
Title: Yotsuba&!
Author: Azuma Kiyohiko
Groups: /azu/nites, Veers
Chapters: 28, 29, 30, 31, 32, 33, 34
Volume: 5 |█████████████████████████████████████████████████████████████████████████████████| 198 / 198       |

FWIW, I think a better solution would be to find a pure Go image decoding library, so that the resulting distributable binary remains static. However, I failed to find one and resorted to libvips for this proof-of-concept.

If no pure Go alternative can be found, I would propose shipping the CGO_ENABLED=1 variant only as a containerized distributable (Docker) to minimize CGO-related errors for end users, especially those who are doing fine with the Go stdlib decoder.

Let me know what you think! 😄

P.S. I've been thinking of a Manga-to-Kindle solution for a while but have not had the energy to implement one, until I stumbled upon this recently. Very glad to have found it and thank you for starting it!

leotaku commented 11 months ago

Thanks for the nice comments as well as the proof-of-concept!

> FWIW, I think a better solution is to find pure Go image decoding library so the resulting distributable binary would remain static. However, I failed to find one and resorted to libvips for this proof-of-concept.

100% agree with you here. My desire to keep the binaries fully static for easy distribution is a large part of why I have chosen not to use FFI-based image libraries so far, even though the Go image library clearly has its problems. When I find the time, I might look into building my prebuilt binaries with CGO+musl so I can keep distributing them easily.

Also, when I implement something like this, I will likely not use libvips itself but rather its JPEG, PNG, ... dependencies directly, as libvips's size and attack surface make me queasy considering we are decoding largely untrusted files.

kawasaki-kanagawa commented 11 months ago

Yeah that makes sense! 😄

> libvips size and attack surface makes me queasy considering we are decoding largely untrusted files

That’s fair. I do think libvips has built quite a decent reputation compared to, say, ImageMagick, since lots of popular libraries (e.g. Node.js sharp) and frameworks (e.g. Ruby on Rails) are built on it. It’s bigger than using the JPEG / PNG libraries directly, but not terrible IMO :) I'm certainly more concerned about end users running into FFI issues if they're not on a system like Nix.

kawasaki-kanagawa commented 11 months ago

> but not terrible IMO :)

Maybe that statement wasn't so true 😅 For benchmarking purposes, I built a container for kojirou with libvips using Nix, and it's 352MB 😬. I will post back with more information if I can find some time to try it out on plain Alpine, or maybe even implement libjpeg / libpng / libgif directly. Attaching the Nix build and Dockerfile for reference.

kojirou.nix

```nix
# kojirou.nix
{ pkgs ? import <nixpkgs> { }
, lib ? pkgs.lib
, stdenv ? pkgs.stdenv
, ...
}:

let
  name = "kojirou";
in
pkgs.buildGoModule rec {
  nativeBuildInputs = with pkgs; [ vips pkg-config validatePkgConfig ];
  buildInputs = [ pkgs.vips ];
  go = pkgs.go_1_21;
  pname = name;
  version = "0.3.1-vips";
  src = pkgs.fetchFromGitHub {
    owner = "kawasaki-kanagawa";
    repo = "kojirou";
    rev = "0808920ff465739c83b8fe2b1a46de30cd67ad9d";
    sha256 = "sha256-qcQu5+wvH1afkkjjLjlCnPWmhiIp0NxaD02eXmRgF9w=";
  };
  vendorHash = "sha256-baMGyp8X2xirOuDHv9sRWQuYeW7spAGCefXSVJe0V8A=";
  CGO_ENABLED = 1;
  proxyVendor = true;
  meta = { pkgConfigModules = [ "vips" ]; };
}
```

Dockerfile

```Dockerfile
FROM nixos/nix:latest AS builder

RUN nix-channel --update

COPY kojirou.nix /opt/build/
WORKDIR /opt/build

RUN nix-build ./kojirou.nix \
    && mkdir /opt/nix-store-closure \
    && cp -R $(nix-store -qR result/) /opt/nix-store-closure

FROM scratch
WORKDIR /app

COPY --from=builder /tmp /tmp
COPY --from=builder /opt/nix-store-closure /nix/store
COPY --from=builder /opt/build/result /app

ENTRYPOINT ["/app/bin/run"]
```