kemayo / leech

Turn a story on certain websites into an ebook for convenient reading
MIT License

Re-adding support for The Wandering Inn #95

Closed. Shadow-Master closed this issue 4 months ago.

Shadow-Master commented 1 year ago

The recent update to https://wanderinginn.com/table-of-contents/ broke the JSON example for the site. Would it be possible to get an updated version?

TheMetalCenter commented 1 year ago

This one, based on the next-chapter links, still works. Just set the starting chapter at the top (the url value).

The website is still being actively developed, so it may be a good idea to wait before trying to fix the TOC-based JSON.

{
    "url": "https://wanderinginn.com/2022/06/03/9-00/",
    "title": "Wandering Inn Volume 9 Alternate-test",
    "author": "pirateaba",
    "content_selector": "#main",
    "content_title_selector": "h1.entry-title",
    "content_text_selector": ".entry-content",
    "filter_selector": "a[href='wanderinginn.com'], a[href='wanderinginn.wordpress.com'], a[href*='wanderinginn.files.wordpress.com']",
    "next_selector": "a[rel=\"next\"]"
}

Shadow-Master commented 1 year ago

Thanks, I will give it a shot. I frankensteined a fix by copying all the hrefs into an old HTML copy of the previous table of contents, serving it from a local Python web server, and pointing the JSON at it. It works perfectly, but I would need to update it for each new chapter. Yours seems cleaner and also worked perfectly. One interesting behavior: it also grabs the last chapter, which is password-protected.
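For anyone who wants to try the local-copy workaround, here is a minimal sketch of serving a folder over HTTP with Python's standard library. The function name `serve_directory`, the port, and the file name `toc.html` mentioned below are all assumptions for illustration, not something taken from leech or from the setup described above.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve the files in `directory` on 127.0.0.1 from a background thread.

    Passing port=0 lets the OS pick a free port; read it back from
    server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread so the server does not keep the process alive on exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

With a saved copy named, say, toc.html in that folder, the "url" in the JSON would point at http://127.0.0.1:8000/toc.html. Running `python -m http.server` from inside the folder does the same thing from the command line. Call `server.shutdown()` when the export is done.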

I will wait until the site finishes updates before hoping for a new clean JSON, thanks.

In case other people are reading these comments: I changed your last line to include the backslashes that GitHub stripped, and added a cover_url line in case you want a nice cover.

"next_selector": "a[rel=\"next\"]", "cover_url": "https://wanderinginn.files.wordpress.com/2018/08/wandering_inn_final_2.png"
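One way to avoid hand-escaping the quotes in next_selector is to build the definition in Python and let the json module write the backslashes. This is only a sketch: the keys and values are copied from the comments above, while the helper name `dump_definition` is made up for illustration.

```python
import json

# Values copied from the comments above. In Python the selector can be
# written with single quotes outside, so no manual backslashes are needed.
definition = {
    "url": "https://wanderinginn.com/2022/06/03/9-00/",
    "title": "Wandering Inn Volume 9 Alternate-test",
    "author": "pirateaba",
    "content_selector": "#main",
    "content_title_selector": "h1.entry-title",
    "content_text_selector": ".entry-content",
    "filter_selector": (
        "a[href='wanderinginn.com'], "
        "a[href='wanderinginn.wordpress.com'], "
        "a[href*='wanderinginn.files.wordpress.com']"
    ),
    "next_selector": 'a[rel="next"]',
    "cover_url": "https://wanderinginn.files.wordpress.com/2018/08/wandering_inn_final_2.png",
}


def dump_definition(data):
    """Return JSON text; json.dumps escapes the inner double quotes."""
    return json.dumps(data, indent=4)
```

Printing `dump_definition(definition)` and saving the result to a .json file gives a definition with the `\"` sequences already in place.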

Shadow-Master commented 1 year ago

Hmmm, processing seems to have failed on Google Play Books with both generated epubs...

TheMetalCenter commented 1 year ago

> Hmmm, processing seems to have failed on Google Play Books with both generated epubs...

That's strange. They work fine on my Kindle, but I do convert to azw3 with calibre. Perhaps try converting epub to epub with calibre. You could also try another e-reader app, in case the issue is with Google Play Books.
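For reference, the epub-to-epub pass can also be scripted through calibre's `ebook-convert` command-line tool, which takes an input file and an output file. A minimal sketch, assuming calibre is installed and on PATH; the function names and any file names you pass in are placeholders.

```python
import shutil
import subprocess


def build_convert_command(src, dst):
    """Return the ebook-convert argument list for converting src to dst."""
    return ["ebook-convert", src, dst]


def reconvert(src, dst):
    """Run calibre's ebook-convert; the output format follows dst's extension."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found on PATH; is calibre installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(src, dst), check=True)
```

Calling `reconvert("book.epub", "book-fixed.epub")` rewrites the file through calibre's pipeline. Whether that is enough to make a given reader app accept the file is a separate question.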

Shadow-Master commented 1 year ago

I will try. I like Google Play Books because it's the easiest way to get books onto my phone.

EDIT: It also failed on Google Play Books after the epub-to-epub conversion.

HersheyTaichou commented 11 months ago

Thank you for this! It worked for exporting the book, but I noticed that some of the formatting is lost. For example, some of the text in chapter 2.38 is colored on the site. Do any of you know a way to keep the colored text? Specifically the HTML is for those sections. Thank you!

kemayo commented 4 months ago

@HersheyTaichou the colored text gets stripped as part of the general ebook cleanup (because most attempts to color text play poorly when transferred to a black-and-white ereader). However, you can turn this off by running the script with the --no-strip-colors option.