kemayo / leech

Turn a story on certain websites into an ebook for convenient reading
MIT License

(invalid html) Stray ampersand in chapter title #56

Closed ClaasJG closed 3 years ago

ClaasJG commented 3 years ago

Hello,

I could not open the chapter Interlude: Lost & Found of 'A Practical Guide to Evil' Book 6 on a Tolino Shine 3 because it contains invalid HTML. (This is also reproducible with the Chrome extension EPUBReader.) I am using leech and the example JSON from d50f23d07b854d8dfd50a0a1de92bf051c275a1b.

The title of the chapter contains a '&' symbol. This 'stray ambiguous ampersand' is the cause of the error and should be replaced with &amp;amp;. See the following HTML taken from the downloaded EPUB:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
   <title>Interlude: Lost & Found</title>
   <link rel="stylesheet" type="text/css" href="../Styles/base.css" />
</head>
<body>
   <h1>Interlude: Lost & Found</h1>
   <div class="entry-content">
   <blockquote class="wp-block-quote">
   [...]
</body>
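
To illustrate why strict readers refuse the file, here is a minimal sketch that feeds a reduced version of the title markup to Python's standard XML parser. The two strings are hypothetical stand-ins for the generated chapter file, and `is_well_formed` is a helper name invented for this example. An e-reader's parser may behave differently in detail, but a bare '&' is not well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the generated chapter file, reduced to the title.
broken = "<html><head><title>Interlude: Lost & Found</title></head></html>"
fixed = "<html><head><title>Interlude: Lost &amp; Found</title></head></html>"


def is_well_formed(document: str) -> bool:
    """Return True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        # A bare '&' that does not start a reference is rejected here.
        return False


print(is_well_formed(broken))
print(is_well_formed(fixed))
```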

I tried to fix this by escaping the title with html.escape in ebook/__init__.py, in chapter_html at line 93:

chapters.append((
    title,
    f'{story.id}/chapter{i + 1}.html',
    html_template.format(title=html.escape(title), text=contents)
))

I escape the chapter title only where it is inserted into the template, not everywhere, because otherwise the index would display it literally as Interlude: Lost &amp;amp; Found.
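
The difference between the two uses of the title can be sketched as follows. This is a self-contained approximation, not leech's actual code: `html_template` here is a simplified, hypothetical template and `build_chapter` is a helper name made up for the example.

```python
import html

# Hypothetical, simplified template; the real one in leech has more markup.
html_template = (
    "<html><head><title>{title}</title></head>"
    "<body><h1>{title}</h1>{text}</body></html>"
)


def build_chapter(story_id, index, title, contents):
    """Return (index title, file name, page markup) for one chapter."""
    return (
        title,  # kept raw, so the index does not end up double-escaped
        f"{story_id}/chapter{index + 1}.html",
        # escaped only at the point where it is placed into markup
        html_template.format(title=html.escape(title), text=contents),
    )


entry = build_chapter("story", 0, "Interlude: Lost & Found", "<p>text</p>")
print(entry[0])
print(entry[2])
```

The first tuple element stays as the plain title, while the generated page carries the escaped form.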

This quick and dirty fix allows the Tolino Shine 3 to display the chapter. Be aware that a stray '&' within the content would produce a similar error.
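
For the content case, html.escape is not suitable because it would also escape the markup itself. One possible approach, shown here only as a rough sketch and not as something leech does, is to escape just those ampersands that do not already begin a character reference. The names `STRAY_AMPERSAND` and `escape_stray_ampersands` are invented for this example, and the pattern does not check whether a named reference is actually defined.

```python
import re

# Matches '&' unless it already starts a named, decimal, or hex reference
# such as &amp;, &#38; or &#x26;.
STRAY_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_stray_ampersands(markup: str) -> str:
    """Replace bare ampersands with &amp; and leave existing references alone."""
    return STRAY_AMPERSAND.sub("&amp;", markup)


sample = "<p>Lost & Found &amp; kept &#38; &#x26;</p>"
cleaned = escape_stray_ampersands(sample)
print(cleaned)
```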

Thanks for leech and have a nice day

-ClaasJG

kemayo commented 3 years ago

Interesting, I'm kind of surprised that it's taken so long for this to turn up as an issue.

ClaasJG commented 3 years ago

Good evening,

Thanks for the fix. I don't know if you expected me to close this issue or if you left it open as a reminder that invalid content HTML is still a problem. I consider my problem fixed and this issue resolved, so I will assume I may close it.

Have a nice day

-ClaasJG

kemayo commented 3 years ago

Sure, go for it. I was leaving it open as a reminder to myself that I should make the template HTML more robust, but making a dedicated ticket for that would be a more-correct action. :D