Open jfroelich opened 5 years ago
Note that this example was not the only instance of the problem in the input document. Most likely the author used some WYSIWYG tool, highlighted the text, clicked emphasize, and the editor naively inserted em tags around the selection without regard to crossing the link element boundary.
url that caused the problem: https://themillions.com/2019/03/ten-ways-to-look-at-the-color-black.html
This is the view-source raw html:
This is the outerHTML copied from the dom inspector:
In this input, the space is dropped after the e in The Elegant Universe. In the output the link is abutting the next word with no adjacent space. It looks like this (copied from inspector of view):
So, at first it looks like the old unwrap issue, but upon closer inspection I think the <em>space</em> following it is the culprit. So somewhere in the filters, such as the emphasis filter, the leaf filter, or the text node filter, a whitespace-only element like this should be transformed into a single space, not removed.

Also note the difference between the raw source and the inspector view. The parser generated a DOM that broke up the em element into two em elements to fix the sloppiness of the input.
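A rough sketch of the "single space, not removal" idea, just to pin down what I mean. This is not the actual dom-filters code: the plain-object nodes, the `collapseBlankEmphasis` name, and the sample text around the link are all made up for illustration, so that it runs without a browser DOM.

```javascript
// Minimal stand-in for DOM nodes so the sketch runs without a browser.
// Text nodes look like { text: '...' }; elements like { tag, children }.
const text = (value) => ({ text: value });
const el = (tag, ...children) => ({ tag, children });

// Assumption: these are the tags an emphasis filter would care about.
const EMPHASIS_TAGS = new Set(['em', 'i', 'b', 'strong']);

// Concatenate all descendant text, similar to Node.textContent.
function textContent(node) {
  if (node.text !== undefined) return node.text;
  return node.children.map(textContent).join('');
}

// Hypothetical filter: an emphasis element whose content is whitespace only
// becomes a single space text node instead of being removed outright.
function collapseBlankEmphasis(node) {
  if (node.text !== undefined) return node;
  const children = node.children.map(collapseBlankEmphasis);
  const content = children.map(textContent).join('');
  if (EMPHASIS_TAGS.has(node.tag) && content.length > 0 && content.trim() === '') {
    return text(' ');
  }
  return { tag: node.tag, children };
}

// Roughly the shape the parser produced: the em was split in two, leaving a
// whitespace-only em right after the link. Surrounding words are invented.
const paragraph = el('p',
  text('See '),
  el('a', el('em', text('The Elegant Universe'))),
  el('em', text(' ')),
  text('next'));

const filtered = collapseBlankEmphasis(paragraph);
console.log(textContent(filtered)); // prints "See The Elegant Universe next"
```

An em with no content at all is left alone here, since that case is really the leaf filter's job; only the whitespace-only case is rewritten.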
What I should be able to do is create a test that accesses the filter stuff and reproduces the issue. This may require some refactoring of the dom-filters module(s). From there I should be able to isolate what exactly causes that extra space to be dropped, and decide how to avoid that.
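Something like the following could be a starting point for that test. It is only a sketch under assumptions: `naiveLeafFilter` is a hypothetical stand-in for whatever filter actually drops the node (I have not isolated that yet), and the nodes are plain objects rather than a parsed document.

```javascript
// Plain-object stand-ins for DOM nodes. Assumption: the real test would
// parse the raw html into a document and run the actual filters instead.
const makeText = (value) => ({ text: value });
const makeElement = (tag, ...children) => ({ tag, children });

// Concatenate all descendant text, similar to Node.textContent.
function flattenText(node) {
  return node.text !== undefined
    ? node.text
    : node.children.map(flattenText).join('');
}

// Hypothetical stand-in for the suspected behavior: drop any element whose
// text content is blank, the way a leaf filter might.
function naiveLeafFilter(node) {
  if (node.text !== undefined) return node;
  const kept = node.children
    .map(naiveLeafFilter)
    .filter((child) => child.text !== undefined || flattenText(child).trim() !== '');
  return { tag: node.tag, children: kept };
}

// Tiny check helper that collects failures instead of stopping at the first.
const failures = [];
function check(name, actual, expected) {
  if (actual !== expected) {
    failures.push(`${name}: got "${actual}", expected "${expected}"`);
  }
}

// Link followed by a whitespace-only em, then the next word (invented).
const input = makeElement('p',
  makeElement('a', makeElement('em', makeText('The Elegant Universe'))),
  makeElement('em', makeText(' ')),
  makeText('next'));

check('input keeps the space', flattenText(input), 'The Elegant Universe next');
// Documents the bug: after filtering, the link text abuts the next word.
check('naive filter drops the space',
  flattenText(naiveLeafFilter(input)), 'The Elegant Universenext');

console.log(failures.length === 0 ? 'reproduced' : failures.join('\n'));
```

Once the real culprit is found, the second expectation would flip to the spaced string and this becomes the regression test.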