postlight / parser

📜 Extract meaningful content from the chaos of a web page
https://reader.postlight.com
Apache License 2.0

img src does not use the image URL, so no image is displayed (also seen in Mercury Reader) #589

Closed tong0x closed 1 year ago

tong0x commented 3 years ago

Expected Behavior

When parsing https://li.substack.com/p/unbundling-work-from-employment with Mercury Parser, or reading it in Mercury Reader, the src attribute of the <img> tags should contain the image URLs so that the images are displayed.

Current Behavior

The src attribute of each parsed <img> tag is filled with a JSON object, specifically the value of the data-attrs attribute of the corresponding <img> tag in the site's HTML, rather than the value of that tag's src attribute.
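
For anyone stuck on an affected version, the parsed HTML can be post-processed as a stopgap. The sketch below is a hypothetical helper, not part of the parser and not the change made in #696. The name `repairImgSrc` is made up for illustration. It assumes the JSON that ends up in src has a `src` key holding the real image URL, and that the JSON is either entity-escaped inside a double-quoted attribute or wrapped in single quotes.

```javascript
// Hypothetical workaround, NOT part of the parser's API.
// Assumption: the JSON blob found in src has a "src" key with the image URL.

// Decode the few entities that typically appear in an escaped JSON attribute.
function decodeEntities(s) {
  return s
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&'); // decode &amp; last to avoid double-decoding
}

// Escape a URL so it is safe inside a double-quoted attribute.
function escapeAttr(s) {
  return s.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

function repairImgSrc(html) {
  return html.replace(/<img\b[^>]*>/gi, (tag) => {
    // Require whitespace before "src" so data-src and similar are ignored.
    const match = tag.match(/\ssrc=(?:"([^"]*)"|'([^']*)')/i);
    if (!match) return tag;

    const raw = match[1] !== undefined ? match[1] : match[2];
    const decoded = decodeEntities(raw).trim();
    if (!decoded.startsWith('{')) return tag; // already a plain URL

    let attrs;
    try {
      attrs = JSON.parse(decoded);
    } catch (err) {
      return tag; // not valid JSON, leave the tag untouched
    }
    if (!attrs || typeof attrs.src !== 'string') return tag;

    // Function replacement avoids "$" in the URL being read as a pattern.
    return tag.replace(match[0], () => ` src="${escapeAttr(attrs.src)}"`);
  });
}
```

It would be applied to the HTML content returned by the parser before rendering. Tags whose src is already a plain URL, or whose src is not valid JSON, are returned unchanged. A regex pass is only a rough fit for HTML, so a real DOM library would be a safer choice for anything beyond a quick fix.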

Steps to Reproduce

  1. Go to https://li.substack.com/p/unbundling-work-from-employment
  2. Run mercury parser (return as HTML) or mercury reader
  3. Find the src attribute of the <img> tags in the parsed HTML

pbshgthm commented 2 years ago

Any updates on this? I'm running into this exact issue right now :(

johnholdun commented 1 year ago

Fixed by #696