puzzle / prawn-markup

Parse simple HTML markup to include in Prawn PDFs
MIT License

Bug with consecutive line break tags (<br/>) between different tags #33

Closed: allthesignals closed this issue 1 year ago

allthesignals commented 2 years ago

This markup behaves unexpectedly:

<i>hello</i><br/><br/><br/><p>world</p>

Producing:

hello world

I was expecting it to produce:

hello

world

Happy to take a look, but I want to confirm that this is actually a bug and that I'm not misinterpreting something :)

allthesignals commented 2 years ago

It seems like the presence of any block tag (paragraph <p>, division <div>) short-circuits the effect of neighboring line break (<br/>) tags:

<p>hello</p><br/><br/><br/><br/><br/><br/><br/><b>world</b>

Regular text tags seem to be fine:

<i>hello</i><br/><br/><br/><br/><br/><br/><br/><b>world</b>

allthesignals commented 2 years ago

Sibling <span> tags seem to be okay, though :)

codez commented 2 years ago

Thank you for the report. I guess this comes from the use case where there is only a single <br/> before or after a <p>. In that case, the <br/> should be ignored. With n consecutive <br/> tags next to a block, n-1 of them could be kept.

Are you interested in working on a pull request?
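To make the suggested "keep n-1" rule concrete, here is a minimal Ruby sketch that applies it to a raw HTML string. The helper name `keep_n_minus_one_breaks` and the list of block tags are hypothetical, not prawn-markup code; in the library the equivalent logic would presumably live in the normalization step rather than in a separate pre-processor.

```ruby
# Hypothetical helper (not part of prawn-markup) illustrating the rule:
# a run of n <br> tags that directly touches a block-level tag keeps n - 1
# of them; runs between inline content are left at full length.

# Assumed set of block-level tags; the real normalizer may use a different list.
BLOCK_TAGS = %w[p div ul ol li table blockquote h1 h2 h3 h4 h5 h6].freeze

# Matches <br>, <br/> and <br /> in any letter case.
BR = %r{<br\s*/?>}i

# Matches an opening or closing block tag, e.g. <p>, </div>, <ul class="x">.
BLOCK_TAG = %r{</?(?:#{BLOCK_TAGS.join('|')})\b[^>]*>}i

def keep_n_minus_one_breaks(html)
  html.gsub(/(?:#{BR})+/) do |run|
    match = Regexp.last_match
    # Is the run immediately preceded or followed by a block tag?
    touches_block = match.pre_match.match?(/#{BLOCK_TAG}\s*\z/) ||
                    match.post_match.match?(/\A\s*#{BLOCK_TAG}/)
    count = run.scan(BR).size
    kept = touches_block ? count - 1 : count
    # Emit the surviving breaks in a normalized <br/> form.
    '<br/>' * kept
  end
end

puts keep_n_minus_one_breaks('<i>hello</i><br/><br/><br/><p>world</p>')
# prints <i>hello</i><br/><br/><p>world</p>
```

With this rule, the original example keeps two of its three breaks, a lone <br/> between two paragraphs disappears, and runs between inline tags such as <i> and <b> are untouched.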

allthesignals commented 2 years ago

@codez yup, more than happy to take a look! It might be a while.

"I guess this comes from the use-case when there is only one <br/> before or after a <p>. In this case, it is desired that the br is ignored. So n-1 br could be kept."

This makes sense. I'll try to update the test to reflect this more exactly...

goulvench commented 2 years ago

I'm having the same issue when rendering Trix-generated HTML, and switching from <div>s to <p>s isn't an option because <ul> and <ol> are not allowed inside paragraphs.

I tried replacing <br><br> with <br>\n<br> and <br>&nbsp;<br>, but the normalizer very efficiently discards every variation I could think of.

What worked was replacing "<br><br>" with "</div><p><br></p><div>" or simply "<p><br></p>" before feeding the HTML to prawn-markup, but this workaround will break as soon as my input starts to contain paragraphs.
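The workaround above can be sketched as a small Ruby pre-processing step. The method name `preserve_double_breaks` and the `inside_div:` switch are hypothetical; matching `<br/>`, `<br />` and whitespace between the two tags is an assumption that goes slightly beyond the literal "<br><br>" mentioned here.

```ruby
# Sketch of the pre-processing workaround: turn each pair of consecutive
# <br> tags into an empty paragraph before the HTML reaches prawn-markup,
# so the vertical space is not normalized away.
# Assumption: the pair may be written as <br>, <br/> or <br />, optionally
# separated by whitespace; Trix itself emits "<br><br>".
DOUBLE_BR = %r{<br\s*/?>\s*<br\s*/?>}i

# inside_div: true  -> "</div><p><br></p><div>" (pair sits directly inside a <div>)
# inside_div: false -> "<p><br></p>"
def preserve_double_breaks(html, inside_div: true)
  replacement = inside_div ? '</div><p><br></p><div>' : '<p><br></p>'
  html.gsub(DOUBLE_BR, replacement)
end

puts preserve_double_breaks('<div>first<br><br>second</div>')
# prints <div>first</div><p><br></p><div>second</div>
```

The result would then be passed to the `markup` method that prawn-markup adds to `Prawn::Document`. As noted above, the `</div>`/`<div>` variant assumes the break pair sits directly inside a <div>, so it produces broken nesting once the input contains <p> elements.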

I'd be happy to contribute a PR for this issue, but I'm not sure where to start, and I'm quite busy at the moment anyway.

codez commented 2 years ago

I think the stripping happens here: https://github.com/puzzle/prawn-markup/blob/3a1b39bff018395c825456fe40d92c57e67601eb/lib/prawn/markup/processor/blocks.rb#L78

If there are only newlines, https://github.com/puzzle/prawn-markup/blob/3a1b39bff018395c825456fe40d92c57e67601eb/lib/prawn/markup/processor.rb#L96 might play a role as well.

codez commented 1 year ago

Closed because of inactivity