wcember / pypub

Python library to programmatically create epub files
MIT License

xmlprettify can cause mangled output #18

Open fridgecow opened 3 years ago

fridgecow commented 3 years ago

When a chapter is loaded, xmlprettify is called to format the output nicely, and in doing so it strips whitespace from the text and tail attributes of elements. Unfortunately, this can produce mangled epubs from perfectly reasonable HTML.

For example, this HTML:

<div>1234 <i>5</i> 6789</div>

should produce output exactly like the input,

<div>1234 <i>5</i> 6789</div>

but actually looks like:

  <div>1234
  <i>5</i>6789
</div>

This renders differently, because the space after the 5 is gone.

Removing the xmlprettify call from Chapter._render makes the output correct again.
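
A minimal sketch of the failure mode, using only the standard library. The `strip_whitespace` helper is a hypothetical stand-in for a prettifier that strips `text` and `tail`; it is not pypub's actual xmlprettify code, and it skips the re-indentation step, so it loses both spaces rather than just the one after the 5.

```python
import xml.etree.ElementTree as ET

SOURCE = "<div>1234 <i>5</i> 6789</div>"


def strip_whitespace(root):
    """Strip text and tail on every element (hypothetical prettifier step)."""
    for elem in root.iter():
        if elem.text:
            elem.text = elem.text.strip()
        if elem.tail:
            elem.tail = elem.tail.strip()
    return root


original = ET.fromstring(SOURCE)
stripped = strip_whitespace(ET.fromstring(SOURCE))

# Text content as a reader would see it, before and after stripping.
original_text = "".join(original.itertext())
stripped_text = "".join(stripped.itertext())

print(repr(original_text))  # '1234 5 6789'
print(repr(stripped_text))  # '123456789'
print(ET.tostring(stripped, encoding="unicode"))  # <div>1234<i>5</i>6789</div>
```

The space before `6789` lives in the tail of the `<i>` element, so any step that strips tails removes it unless it is put back afterwards.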

clach04 commented 1 year ago

According to the bs4 docs (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#pretty-printing), prettify() is for debugging purposes only:

Since it adds whitespace (in the form of newlines), prettify() changes the meaning of an HTML document and should not be used to reformat one. The goal of prettify() is to help you visually understand the structure of the documents you work with.
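
One way to see where added whitespace matters is to check for mixed content, meaning text sitting alongside child elements. The sketch below is only a heuristic, and `has_mixed_content` is a hypothetical helper, not part of bs4 or pypub. Whitespace between adjacent inline elements can also be significant, and this check does not catch that case.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(elem):
    """Return True if elem has children and non-whitespace text around them."""
    if len(elem) == 0:
        return False
    if elem.text and elem.text.strip():
        return True
    return any(child.tail and child.tail.strip() for child in elem)


inline = ET.fromstring("<div>1234 <i>5</i> 6789</div>")
block = ET.fromstring("<div><p>one</p><p>two</p></div>")

print(has_mixed_content(inline))  # True: reformatting here can change rendering
print(has_mixed_content(block))   # False: no text sits between the child elements
```

A formatter could skip re-indenting any element where this returns True and serialize it unchanged.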

Occurrences of prettify in this repo: https://github.com/search?q=repo%3Awcember%2Fpypub%20%20%20prettify&type=code