kemayo / leech

Turn a story on certain websites into an ebook for convenient reading
MIT License

Add Ward to examples #37

Closed flying-sheep closed 3 years ago

flying-sheep commented 4 years ago

Ward is the sequel to Worm. It has a prologue consisting of a chatroom, where the protagonists’ email addresses are the handles. Those fictional email addresses are “protected” by real-life Cloudflare with some simplistic JS thing. Pretty simple stuff to handle from Python:

def decode_cf_email(cf_email):
    enc = bytes.fromhex(cf_email)
    return bytes([c ^ enc[0] for c in enc[1:]]).decode('utf8')

# The data that holds it is in the attribute: a.__cf_email__[data-cfemail]
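
A rough sketch of how the decoder above could be applied to a whole chapter, using only the standard library. The names `encode_cf_email`, `CF_LINK` and `unprotect` are hypothetical helpers made up for this illustration; they are not part of leech or of Cloudflare. The `[email protected]` placeholder markup is an assumption about what the protected link looks like, and matching HTML with a regex is fragile, so a real fix would more likely select `a.__cf_email__[data-cfemail]` with a proper HTML parser.

```python
import html
import re


def decode_cf_email(cf_email):
    # First byte is the XOR key, the remaining bytes are the obfuscated text.
    enc = bytes.fromhex(cf_email)
    return bytes([c ^ enc[0] for c in enc[1:]]).decode('utf8')


def encode_cf_email(text, key=0x42):
    # Hypothetical inverse of decode_cf_email, only used to build sample data.
    raw = text.encode('utf8')
    return bytes([key] + [c ^ key for c in raw]).hex()


# Assumed shape of a protected link: an <a> tag carrying a data-cfemail attribute.
CF_LINK = re.compile(
    r'<a[^>]*\bdata-cfemail="([0-9a-fA-F]+)"[^>]*>.*?</a>',
    re.DOTALL,
)


def unprotect(page_html):
    # Replace every protected link with its decoded, HTML-escaped text.
    return CF_LINK.sub(
        lambda m: html.escape(decode_cf_email(m.group(1))),
        page_html,
    )


if __name__ == '__main__':
    sample = (
        '<p><a href="/cdn-cgi/l/email-protection" class="__cf_email__" '
        'data-cfemail="%s">[email&#160;protected]</a>: hi</p>'
        % encode_cf_email('Point_Me_@_The_Sky')
    )
    print(unprotect(sample))
```

Running something like `unprotect` over the chapter HTML before it goes into the ebook should leave the handles as plain text instead of the placeholder.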

TheMetalCenter commented 4 years ago

Not sure what you mean by the Cloudflare stuff, but you can get Ward (including glow worm) to download with this .json:

{
    "url": "https://www.parahumans.net/table-of-contents/",
    "title": "Ward",
    "author": "Wildbow",
    "chapter_selector": "#main .entry-content a",
    "content_selector": "#main .entry-content",
    "filter_selector": ".sharedaddy, style, a[href*='parahumans.wordpress.com']"
}

flying-sheep commented 4 years ago

Cloudflare is a web hoster that adds some features to the content it hosts. One of them seems to be that it protects e-mail addresses on the pages from spammers by obfuscating them.

Unfortunately that means the Glow-Worm chapter of Ward looks like crap when extracted normally: Vicky’s handle, for example, is Point_Me_@_The_Sky, so it gets “protected”:

[image]

TheMetalCenter commented 4 years ago

Strange. The ePub I made doesn’t have this problem; Victoria’s email / handle looks fine. But I made it a good while back (last chapter Black 13.4), so maybe the issue cropped up after that. I’ll update my ePub in a bit to see if I encounter the problem.

On Apr 24, 2020, at 9:51 AM, Philipp A. notifications@github.com wrote:

> Cloudflare is a web hoster that adds some features to the content it hosts. One of them seems to be that it protects e-mail addresses on the pages from spammers by obfuscating them.
>
> Unfortunately that means that the Glow-Worm chapter of Ward looks like crap when extracted normally as e.g. Vicky’s handle is Point_Me_@_The_Sky and will be “protected”:

TheMetalCenter commented 4 years ago

Yes, after updating my epub I did encounter this issue.