postlight / parser

📜 Extract meaningful content from the chaos of a web page
https://reader.postlight.com
Apache License 2.0

Hero image on postlight.com/insights pages #706

Closed: scottwittrock closed this issue 1 year ago

scottwittrock commented 1 year ago

Expected Behavior

The hero image should show up in the reader view.

Current Behavior

The hero image does not show up.

Steps to Reproduce

  1. Go to this page https://postlight.com/insights/reintroducing-postlight-reader
  2. Enable the reader view
  3. Notice the hero image is not shown

Detailed Description

The hero images should appear in the reader view.

Possible Solution

Enable the hero image in the reader view.

austinmbrown commented 1 year ago

I've looked into this a bit. Here's an update in case anyone's following along.

The parser does see the hero images on Insights pages and emits them to the reader as lead_image_url. The reader, however, doesn't make use of lead_image_url, so my first thought was to implement it somewhere in the ArticleHeader. I've got a working version of this locally. It could use some buy-in/tweaking from product or design.

Another thing I want to look into, though, is whether the hero image should get picked up along with the other images in the article's content and displayed that way. With my local code, I hopped over to a few other news sites to make sure I wasn't causing any new headaches. In some cases, the lead image was duplicated: once in the header and once in the content. So it seems that the parser sometimes treats an image both as the lead image and as part of the body. I want to understand this better, because it may lead to an easier fix in the parser that doesn't involve markup changes to the reader.
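If the header route were kept, the duplication described above could be avoided with a check like the following minimal TypeScript sketch: before rendering lead_image_url in the header, skip it when the same image already appears in the extracted content. The helper names (normalizeImageUrl, imageSources, shouldRenderLeadImage) are hypothetical and not part of Postlight Parser or Reader, and the regex-based <img> scan is a rough heuristic that assumes quoted src attributes; a real implementation would more likely use a DOM parser.

```typescript
// Sketch only: helper names are hypothetical, not part of Postlight Parser
// or Postlight Reader.

// Normalize an image URL for comparison. Resolves relative URLs against an
// optional base and drops the query string and fragment, so resized variants
// such as hero.jpg?w=800 and hero.jpg?w=1600 compare as equal. URL parsing
// already lowercases the scheme and host. Returns null for invalid URLs.
function normalizeImageUrl(raw: string, base?: string): string | null {
  try {
    const url = new URL(raw, base);
    url.search = "";
    url.hash = "";
    return url.toString();
  } catch {
    return null;
  }
}

// Collect the src attribute of every <img> tag in an HTML string.
// The leading \s before "src" keeps data-src and similar attributes out.
function imageSources(html: string): string[] {
  const srcPattern = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi;
  const sources: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = srcPattern.exec(html)) !== null) {
    sources.push(match[1]);
  }
  return sources;
}

// Decide whether the reader should render lead_image_url in the header.
// Returns false when there is no usable lead image, or when the same image
// already appears in the extracted content (which would duplicate it).
function shouldRenderLeadImage(
  leadImageUrl: string | null | undefined,
  contentHtml: string,
  pageUrl?: string
): boolean {
  if (!leadImageUrl) return false;
  const lead = normalizeImageUrl(leadImageUrl, pageUrl);
  if (lead === null) return false;
  return !imageSources(contentHtml).some(
    (src) => normalizeImageUrl(src, pageUrl) === lead
  );
}

// Usage: the hero image is already in the content, so the header skips it.
const content =
  '<p>Intro</p><img class="hero" src="https://cdn.example.com/hero.jpg?w=800"><p>Body</p>';
console.log(shouldRenderLeadImage("https://cdn.example.com/hero.jpg?w=1600", content)); // false
console.log(shouldRenderLeadImage("https://cdn.example.com/other.jpg", content)); // true
```

Dropping the query string treats resized variants of one image as the same image, at the cost of occasionally merging genuinely different images served from one path with different query parameters.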

austinmbrown commented 1 year ago

OK, yeah, I think I found a cleaner fix: adjusting the selectors in Postlight.com's custom extractor. PR coming shortly.
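For context on the extractor route: a Postlight Parser custom extractor is a plain object keyed by domain, whose content field lists selectors. As far as we understand the parser's selector handling, a content entry that is itself an array of selectors is a "multi-match" group, used only when every selector in it matches, with the matched elements combined into the content. The minimal TypeScript sketch below illustrates that idea. The selectors .hero-image and .article-body, the variable name, and the local type declarations are hypothetical placeholders, not the real postlight.com markup or the contents of the actual PR.

```typescript
// Local types approximating the shape of a custom extractor (an assumption;
// the parser itself defines extractors as plain JavaScript objects).
type ContentSelector = string | string[]; // string[] = multi-match group

interface CustomExtractor {
  domain: string;
  title?: { selectors: string[] };
  content: {
    selectors: ContentSelector[];
    clean?: string[];
  };
}

// Hypothetical extractor: the class names below are placeholders.
const PostlightComExtractor: CustomExtractor = {
  domain: "postlight.com",
  title: { selectors: ["h1"] },
  content: {
    selectors: [
      // Multi-match: hero figure plus article body, combined so the hero
      // image becomes part of the content the reader renders.
      [".hero-image", ".article-body"],
      // Fallback for pages without a hero element: body only.
      ".article-body",
    ],
    // Selectors for elements to strip from the extracted content.
    clean: [],
  },
};

console.log(JSON.stringify(PostlightComExtractor.content.selectors));
```

Pulling the hero image into the content, rather than rendering lead_image_url separately, keeps the fix inside the parser and avoids markup changes to the reader.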

johnholdun commented 1 year ago

Awesome! Thank you for investigating!