senny / sablon

Ruby Document Template Processor based on docx templates and Mail Merge fields.
MIT License
447 stars, 128 forks

p is not a valid child element of div #184

Closed by SebRollen 2 years ago

SebRollen commented 2 years ago

Ran into this issue today: Sablon would not convert HTML in which a p tag was a child element of a div tag. This was surprising to me, but I can see that the config file does say that div tags can only hold inline elements: https://github.com/senny/sablon/blob/master/lib/sablon/configuration/configuration.rb#L63

This seems inconsistent with the HTML standard, which says that any flow content (which includes p) can be a child of a div tag: https://html.spec.whatwg.org/#the-div-element

SebRollen commented 2 years ago

I tried updating the line to also allow _block elements as children of div tags. This let me generate the Word doc, but I ran into corruption issues when opening the document. I would appreciate some pointers on why this does not work.

senny commented 2 years ago

Hi @SebRollen, it's been a long while since I worked on Sablon. I don't have any direct pointers on where to look, but I can tell you that Sablon by no means tries to comply fully with the HTML standard. HTML insertion is a compromise and contains lots of trade-offs. It is certainly not in a state where it could render any valid HTML you give it.

Without having looked closer, I'd assume that block elements cannot be nested in WordML. As both div and p tags translate to a WordML block, nesting them results in the corruption you observe.

I would recommend that you transform the HTML before you feed it to Sablon.
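A minimal sketch of one such transformation, assuming the simplest case where the div is only a wrapper and can be dropped so its p children become top-level blocks. `unwrap_divs` is a hypothetical helper, not part of Sablon, and the regex is deliberately naive: it would misfire on attribute values that contain `>`, so a real HTML parser is the safer choice for untrusted input.

```ruby
# Hypothetical helper, not part of Sablon: strips opening and closing
# <div> tags so that their children (for example <p>) end up at the top level.
# \b keeps the pattern from matching other tag names that start with "div".
DIV_TAG = %r{</?div\b[^>]*>}i

def unwrap_divs(html)
  html.gsub(DIV_TAG, "")
end

html = '<div class="note"><p>First</p><p>Second</p></div>'
puts unwrap_divs(html)
# prints: <p>First</p><p>Second</p>
```

The cleaned string can then be passed on in place of the original HTML. Note that any text sitting directly inside the div (outside a p) becomes bare top-level text, which may or may not be what you want.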

stadelmanma commented 2 years ago

@SebRollen that is expected behavior. WordML has more rules and is less forgiving than HTML, so some concessions needed to be made to allow mapping from HTML -> WordML. I suppose one could tweak the code to treat any child of a block tag as an inline tag; then you'd be allowed to do <div><p>[content]</p></div> and it'd render as if it was <div><span>[content]</span></div>. I haven't looked at the code in ages, but I suspect there might be a few other "gotchas" in adding that flexibility.
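The same effect can be approximated on the HTML side, without touching Sablon. The sketch below rewrites p tags to span tags inside each div before conversion. `paragraphs_to_spans` is a hypothetical helper, not part of Sablon, and it makes two loud assumptions: divs are not nested inside each other, and attributes on the p tags can be discarded.

```ruby
# Hypothetical helper, not part of Sablon.
# DIV_BLOCK matches one non-nested <div>...</div> (lazy, across newlines).
# P_TAG matches an opening or closing <p> tag and captures the optional slash.
DIV_BLOCK = %r{<div\b[^>]*>.*?</div>}mi
P_TAG = %r{<(/?)p\b[^>]*>}i

def paragraphs_to_spans(html)
  # Only <p> tags that sit inside a div are rewritten; top-level <p> tags stay.
  html.gsub(DIV_BLOCK) { |div| div.gsub(P_TAG, '<\1span>') }
end

puts paragraphs_to_spans("<p>Top</p><div><p>[content]</p></div>")
# prints: <p>Top</p><div><span>[content]</span></div>
```

One trade-off: several p tags inside the same div become adjacent spans, so the paragraph breaks between them are lost.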

SebRollen commented 2 years ago

@senny @stadelmanma Thank you both, I think I was focusing too much on the HTML side of things rather than the WordML standard - just because something is allowed from one side obviously doesn't mean there's a direct mapping to the other standard.

We've found a way to work around this issue by tweaking our HTML slightly, so I'll close the issue as this doesn't necessarily seem like something that could or should be fixed in Sablon.

Thanks again!