kemayo / leech

Turn a story on certain websites into an ebook for convenient reading
MIT License

Added image embedding support for epub #84

Open IdanDor opened 1 year ago

IdanDor commented 1 year ago

Specifically, added image_selector for arbitrary sites that allows selecting img tags from chapters, downloading them and embedding them within the resulting epub.
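As a rough illustration of the collect-and-rename step described above, here is a minimal standard-library sketch. It is not the code from this PR: the names `ImageCollector` and `plan_embedded_images`, and the `images/image-NNN` naming scheme, are hypothetical. For brevity it takes every img in the chapter rather than applying image_selector, and it only plans the local filenames instead of downloading anything.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collect the absolute URL of every <img> tag in a chapter (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img .../> via the default handle_startendtag.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            url = urljoin(self.base_url, src)
            if url not in self.urls:  # keep first-seen order, skip duplicates
                self.urls.append(url)


def plan_embedded_images(chapter_html, base_url):
    """Map each remote image URL to a local path inside the epub (assumed layout)."""
    collector = ImageCollector(base_url)
    collector.feed(chapter_html)
    collector.close()
    mapping = {}
    for index, url in enumerate(collector.urls):
        # Reuse the remote file extension if there is one; ".img" is a placeholder.
        ext = posixpath.splitext(urlparse(url).path)[1] or ".img"
        mapping[url] = f"images/image-{index:03d}{ext}"
    return mapping
```

A caller would then download each key of the mapping, store it under the mapped path in the epub, and rewrite the matching `src` attributes to point at the local copy.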

In the case of Pale, this means that the character banners and extra materials do not require an internet connection to view.

Also made the two pale.json files more consistent (pale.json now correctly includes the chapter titles).

kemayo commented 1 year ago

Looks like a good basis for something I've been meaning to do for ages (see the existing issue #2). I'm tempted to merge this as-is, and later tinker with it further to pull it closer into the core.

I'm curious, though -- what's the motivation for having an explicit selector for images, as opposed to just selecting every img that's contained in the extracted chapter?

IdanDor commented 1 year ago

Just like you wrote in the linked issue, I thought it should be something one can disable. And a selector simply matches, in my mind, what the codebase does with every other "choice".

I don't have other sites, good or bad, to try this on; I simply wanted it for Pale. So I can't point to a case where a selector is better than simply selecting everything.

IdanDor commented 1 year ago

Also, I should mention: to make this work, you might need to clean up the attributes of the image (for example, I'm deleting srcset, which was interfering). So if there are multiple types of images in the same epub, you will need a more robust cleanup process for them. Maybe a whitelist of image attributes should also be added inside the code? (It might be over-engineering to add it as a JSON parameter.)
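To make the whitelist idea concrete, here is a small sketch of what an in-code attribute filter could look like. The names `ALLOWED_IMG_ATTRS`, `clean_img_attrs`, and `render_img`, as well as the particular set of allowed attributes, are assumptions for illustration and do not come from the leech codebase.

```python
from html import escape

# Hypothetical whitelist: attributes assumed safe to keep on an embedded image.
ALLOWED_IMG_ATTRS = ("src", "alt", "title", "width", "height")


def clean_img_attrs(attrs, allowed=ALLOWED_IMG_ATTRS):
    """Return a copy of attrs containing only whitelisted attribute names."""
    return {name: value for name, value in attrs.items() if name.lower() in allowed}


def render_img(attrs):
    """Serialize an attribute dict back into a self-closing <img/> tag."""
    parts = "".join(
        f' {name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    return f"<img{parts}/>"
```

With a filter like this, srcset and any other unexpected attribute are dropped without each one needing a dedicated deletion step, at the cost of also discarding attributes that some site might legitimately need.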