postlight / parser

📜 Extract meaningful content from the chaos of a web page
https://reader.postlight.com
Apache License 2.0

Drop Cap Characters Missing when Parsed #555

Closed michaelkirkpatrick closed 1 year ago

michaelkirkpatrick commented 4 years ago

Expected Behavior

When loading Mercury Reader for an article, drop cap characters are included in the parsed page.

Current Behavior

When loading Mercury Reader for an article, characters rendered in drop cap are omitted.

Steps to Reproduce

  1. Load this Medium article in Chrome.
  2. Note that in the opening paragraph the letter "T" is in drop cap and the sentence reads "The calls started early today".
     [Screenshot: Screen Shot 2020-04-27 at 11 02 15 AM]
  3. When using the Mercury Reader extension, the drop cap "T" is omitted.
     [Screenshot: Screen Shot 2020-04-27 at 11 02 07 AM]

Detailed Description

The HTML on the page for that portion is as follows:

    <p id="7a70" class="if is ap ce ih b eq ii it es ij iu ik il fd im in fe io ip ff iq ir dq" data-selectable-paragraph=""><span class="r iv iw ix iy iz ja jb jc jd da">T</span>he calls started early today...</p>

My hypothesis is that Reader View is omitting text within a <span> element at the start of a <p>.
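
To make the hypothesis concrete, here is a minimal sketch in Python using only the standard library's html.parser. It is not the parser's actual code, and the real cause addressed in #696 may differ. The names TextCollector, paragraph_text, and skip_leading_span are hypothetical helpers introduced for illustration. The sketch extracts the paragraph text twice: once keeping every text node, and once dropping a <span> that opens the <p>, which reproduces the missing "T".

```python
from html.parser import HTMLParser

# Paragraph markup copied from the issue report.
SAMPLE = (
    '<p id="7a70" class="if is ap ce ih b eq ii it es ij iu ik il fd im in '
    'fe io ip ff iq ir dq" data-selectable-paragraph="">'
    '<span class="r iv iw ix iy iz ja jb jc jd da">T</span>'
    'he calls started early today...</p>'
)


class TextCollector(HTMLParser):
    """Collects text inside <p> elements (hypothetical helper).

    With skip_leading_span=True it imitates the suspected bug by
    discarding a <span> that appears before any other text in a <p>.
    Nested spans are not handled; this is only an illustration.
    """

    def __init__(self, skip_leading_span=False):
        super().__init__()
        self.skip_leading_span = skip_leading_span
        self.parts = []
        self._in_p = False
        self._seen_content = False
        self._skipping = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._seen_content = False
        elif tag == "span" and self._in_p and not self._seen_content:
            # Only the first element of the paragraph counts as "leading".
            self._seen_content = True
            self._skipping = self.skip_leading_span

    def handle_endtag(self, tag):
        if tag == "span":
            self._skipping = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skipping:
            self._seen_content = True
            self.parts.append(data)


def paragraph_text(html, skip_leading_span=False):
    collector = TextCollector(skip_leading_span)
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


if __name__ == "__main__":
    print(paragraph_text(SAMPLE))                          # expected behavior
    print(paragraph_text(SAMPLE, skip_leading_span=True))  # suspected bug
```

If the second result matches what Mercury Reader shows, that supports the idea that the leading <span> is being dropped rather than, say, hidden by styling.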

johnholdun commented 1 year ago

Fixed by #696