jfroelich / rss-reader

A simple Chrome extension for viewing RSS feeds

Incorrect whitespace collapse situation for small emphasis #787

Open jfroelich opened 5 years ago

jfroelich commented 5 years ago

URL that caused the problem: https://themillions.com/2019/03/ten-ways-to-look-at-the-color-black.html

This is the view-source raw html:

Physicist <strong>Brian Greene</strong> explains in <a href="http://www.amazon.com/exec/obidos/ASIN/039333810X/ref=nosim/themillpw-20"><em>The Elegant Universe</a> </em>that Schwarzschild’s calculations...

This is the outerHTML copied from the dom inspector:

Physicist <strong>Brian Greene</strong> explains in <a href="http://www.amazon.com/exec/obidos/ASIN/039333810X/ref=nosim/themillpw-20" class="amz-ext text-only" data-slimstat="5"><em>The Elegant Universe</em></a><em> </em>that Schwarzschild’s calculations implied objects whose “resulting space-time warp is so radical that <em>anything</em>, including light, that gets too close…</p>

For this input, the space after the final "e" in The Elegant Universe is dropped. In the output, the link abuts the next word with no space between them. It looks like this (copied from the inspector of the view):

Physicist <b>Brian Greene</b> explains in <a href="http://www.amazon.com/exec/obidos/ASIN/039333810X/ref=nosim/themillpw-20" rel="noreferrer"><i>The Elegant Universe</i></a>that Schwarzschild’s calculations...

At first this looks like the old unwrap issue, but upon closer inspection I think the <em>space</em> following the link is the culprit. Somewhere in the filters (the emphasis filter, the leaf filter, or the text node filter), a whitespace-only element should be transformed into a single space, not removed.
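To make the "single space, not removal" idea concrete, here is a minimal sketch. It is not the extension's actual filter code: the plain-object node model, `filterLeaves`, and the `preserveSpace` flag are all hypothetical names invented for illustration, chosen so the example runs outside a browser.

```javascript
// Minimal stand-in for DOM nodes so the sketch runs outside a browser.
// These shapes are assumptions, not the extension's real data model.
const text = (value) => ({ type: 'text', value });
const el = (tag, ...children) => ({ type: 'element', tag, children });

// Concatenated text of a node and all of its descendants.
function textContent(node) {
  if (node.type === 'text') return node.value;
  return node.children.map(textContent).join('');
}

// True when an element contains text, but only whitespace.
function isWhitespaceOnly(node) {
  const content = textContent(node);
  return node.type === 'element' && content.length > 0 && content.trim() === '';
}

// Hypothetical leaf filter: prunes elements that have no visible text.
// With preserveSpace false, whitespace-only elements are removed outright
// (the suspected bug). With preserveSpace true, each one is replaced by a
// single space text node instead.
function filterLeaves(node, preserveSpace) {
  if (node.type === 'text') return node;
  const children = [];
  for (const child of node.children) {
    if (child.type === 'element' && textContent(child).trim() === '') {
      if (preserveSpace && isWhitespaceOnly(child)) children.push(text(' '));
      continue;
    }
    children.push(filterLeaves(child, preserveSpace));
  }
  return { ...node, children };
}

// Shape of the parsed DOM from the report, with attributes omitted.
const paragraph = el('p',
  text('explains in '),
  el('a', el('em', text('The Elegant Universe'))),
  el('em', text(' ')),
  text('that Schwarzschild’s calculations...'));

const buggy = textContent(filterLeaves(paragraph, false));
const fixed = textContent(filterLeaves(paragraph, true));
console.log(buggy);
console.log(fixed);
```

In the first result the link text runs straight into "that"; in the second the space survives. Which real filter performs the removal still has to be confirmed by testing.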

Also note the difference between the raw source and the inspector view. The parser generated a DOM that broke up the em element into two em elements to fix the sloppiness of the input.

What I should be able to do is create a test that exercises the filters directly and reproduces the issue. This may require some refactoring of the dom-filters module(s). From there I should be able to isolate what exactly causes that space to be dropped, and decide how to avoid that.
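One possible shape for such a test, sketched under assumptions: compare the whitespace-normalized visible text of the markup before and after filtering. `visibleText` and `preservesVisibleText` are hypothetical helpers, not part of the dom-filters module, and the regex tag stripping is a crude stand-in for a real parser that is only adequate for simple fragments like this one. The `href` values are shortened to `#` for readability.

```javascript
// Crude visible-text extraction: strip tags, then collapse whitespace runs.
// A regex is not an HTML parser; this only suits simple, well-formed fragments.
function visibleText(html) {
  return html.replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

// Hypothetical regression check: true when filtering kept the visible text.
function preservesVisibleText(inputHtml, outputHtml) {
  return visibleText(inputHtml) === visibleText(outputHtml);
}

// Parsed input and filtered output from the report, trimmed to the relevant span.
const parsedInput =
  'explains in <a href="#"><em>The Elegant Universe</em></a><em> </em>that Schwarzschild’s calculations...';
const filteredOutput =
  'explains in <a href="#" rel="noreferrer"><i>The Elegant Universe</i></a>that Schwarzschild’s calculations...';

// Reports false while the bug is present, because "Universe" and "that" fuse.
console.log(preservesVisibleText(parsedInput, filteredOutput));
```

A check like this only makes sense for fragments where the filters are expected to change markup but not content. It flags the symptom without pointing at the responsible filter, so isolating the cause still needs tests against the individual filters.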

jfroelich commented 5 years ago

Note that this was not the only instance of the problem in the input document. Most likely the author used some WYSIWYG tool, highlighted the text, and clicked emphasize, and the editor naively inserted em tags around the selection without regard to crossing the link element's boundary.