williamspiro / hubXml

A tool to turn any blog into a HubSpot importable XML file
MIT License

Unwrap post body elements? #48

Open williamspiro opened 5 years ago

williamspiro commented 5 years ago

Right now, the wrapping element of the post body content ends up in imported post bodies, because the selected post body element is never unwrapped.

Should we unwrap the post body content so the wrapper does not end up in imported post bodies? Right now we import:

<div class="post-body">
  <p>Dragon Warrior</p>
  <p>Dumplings</p>
</div>

My question is: should we instead import only the following into the post body?

<p>Dragon Warrior</p>
<p>Dumplings</p>

I think yes?
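A minimal sketch of what "unwrapping" would mean for the example above. It uses only the JDK's XML classes rather than jsoup, so it runs without dependencies, and it assumes the markup is well-formed, which real blog HTML often is not. `UnwrapSketch` and `innerMarkup` are hypothetical names, not part of hubXml. In jsoup itself, `Element.html()` returns an element's inner HTML and would probably be the closer fit.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class UnwrapSketch {

    /** Serializes the children of the root element and leaves the root (the wrapper) out. */
    static String innerMarkup(String wellFormedMarkup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
            Element wrapper = doc.getDocumentElement();

            // Identity transformer: copies each node to the output unchanged.
            Transformer copy = TransformerFactory.newInstance().newTransformer();
            copy.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            for (Node child = wrapper.getFirstChild(); child != null; child = child.getNextSibling()) {
                copy.transform(new DOMSource(child), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            // Assumption: malformed input is treated as a hard failure in this sketch.
            throw new IllegalStateException("Could not unwrap markup", e);
        }
    }

    public static void main(String[] args) {
        String wrapped = "<div class=\"post-body\"><p>Dragon Warrior</p><p>Dumplings</p></div>";
        System.out.println(innerMarkup(wrapped));
    }
}
```

The wrapper is never modified here; only its children are serialized, which sidesteps any question about what happens to the wrapper node afterwards.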

williamspiro commented 5 years ago

There seems to be a bug in the .unwrap() method: https://jsoup.org/apidocs/org/jsoup/select/Elements.html#unwrap--

The method works in general. However, if the node you are unwrapping is the outermost node in your Element, it leaves the selected node in place and removes all of its children. That is exactly the opposite of what we want: the post body wrapper stays and all of its children are deleted 😕
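This may be less a bug than a consequence of how jsoup documents `unwrap()`: the node is removed from the DOM and its children are moved up into its parent. If that is what is happening, a reference still held to the unwrapped node would end up empty, while the children live on in the parent. The following is a rough model of that behaviour using the JDK's W3C DOM, not jsoup itself; `UnwrapModel`, `unwrap` and `demo` are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical model of "unwrap" semantics, for illustration only.
class UnwrapModel {

    /** Moves every child of wrapper up into wrapper's parent, then detaches wrapper. */
    static void unwrap(Element wrapper) {
        Node parent = wrapper.getParentNode();
        while (wrapper.getFirstChild() != null) {
            // insertBefore detaches the child from wrapper before re-inserting it.
            parent.insertBefore(wrapper.getFirstChild(), wrapper);
        }
        parent.removeChild(wrapper);
    }

    /** Returns {children left in the wrapper, children now in the parent}. */
    static int[] demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element body = doc.createElement("body");
            Element wrapper = doc.createElement("div");
            doc.appendChild(body);
            body.appendChild(wrapper);
            for (String text : new String[] {"Dragon Warrior", "Dumplings"}) {
                Element p = doc.createElement("p");
                p.setTextContent(text);
                wrapper.appendChild(p);
            }

            unwrap(wrapper);

            return new int[] {
                wrapper.getChildNodes().getLength(),
                body.getChildNodes().getLength()
            };
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("wrapper children: " + counts[0] + ", parent children: " + counts[1]);
    }
}
```

If jsoup behaves the same way, reading the post body from the wrapper's parent after `unwrap()`, or taking the wrapper's inner HTML without unwrapping at all, would avoid the empty-wrapper result.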