palewire / news-homepages

An open-source archive that gathers, saves, shares and analyzes news homepages
https://homepages.news
GNU General Public License v3.0
126 stars, 16 forks

Enhancement? Link y-position #487

Closed: jeremybmerrill closed 1 month ago

jeremybmerrill commented 1 month ago

Hey Ben: Curious if you'd take a PR to add a column in the links sheet for each link's y-position?

It'd probably be slightly more surgery than ideal, since this bit of link extraction would probably have to be rewritten in JavaScript and executed via Playwright, because we'd need the page to actually have been rendered (versus the HTML merely parsed by BeautifulSoup). I've never done that via Playwright, but I have done it via Chromedriver and it isn't too weird.
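For illustration, a minimal sketch of what that in-page extraction could look like with Playwright's Python sync API. The names `LINKS_WITH_Y_JS`, `extract_links_with_y` and `main` are hypothetical and not taken from the repo, and measuring from the top of the document (rather than the viewport) is an assumption about what "y-position" should mean.

```python
# JavaScript evaluated inside the rendered page. It returns one record per
# anchor, with y measured in CSS pixels from the top of the document.
LINKS_WITH_Y_JS = """
() => Array.from(document.querySelectorAll('a[href]')).map(a => ({
  text: a.innerText,
  url: a.href,
  y: Math.round(a.getBoundingClientRect().top + window.scrollY),
}))
""".strip()


def extract_links_with_y(page):
    """Return [{'text': ..., 'url': ..., 'y': ...}, ...] for a rendered page.

    `page` is expected to be a Playwright sync-API Page, but anything with a
    compatible `.evaluate(expression)` method works, which keeps this testable.
    """
    return page.evaluate(LINKS_WITH_Y_JS)


def main(url="https://example.com"):
    # Imported lazily so the helper above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        links = extract_links_with_y(page)
        browser.close()
    return links
```

Keeping the JavaScript in one string and the Python wrapper thin means the BeautifulSoup-based path could stay in place for existing data while this runs alongside it.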

Edit: The rationale for this would be that users could use y-position to estimate prominence and whether something is "above the fold", one screen down, or ranked in the depths where non-journalists dare not scroll.
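As a rough sketch of that prominence estimate, a y value can be bucketed by screen height. The function name `screens_down` and the 1000-pixel default viewport height are assumptions for illustration only; the real value would depend on the viewport the screenshots are rendered at.

```python
def screens_down(y, viewport_height=1000):
    """Return how many full screens sit above a link.

    0 means the link starts "above the fold", 1 means one screen down, etc.
    `viewport_height` is an assumed value in CSS pixels.
    """
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    # Negative y values (elements positioned above the page top) count as screen 0.
    return max(0, int(y // viewport_height))
```

With the assumed default, `screens_down(500)` is 0 (above the fold) and `screens_down(1500)` is 1 (one screen down).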

palewire commented 1 month ago

I'm open to an add. My main ask is that we maintain backwards compatibility with the data we've already collected.

It never got merged, but we did have an idea along these lines submitted a while ago over here: https://github.com/palewire/news-homepages/pull/372

jeremybmerrill commented 1 month ago

My proposal would just be to add a new key/value pair to each link's JSON object, e.g.

  {
    "text": "Space Flight Without the Rocket Fuel? A Florida-Based Company Is Aiming for the Stars.5 min read",
    "url": "https://www.barrons.com/articles/space-flight-without-the-rocket-fuel-a-florida-based-company-is-aiming-for-the-stars-2867f491?mod=hp_minor_pos32",
    "y": 12345
  }

Totally possible I'm underestimating the complexity of this, though!
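On the backwards-compatibility point, here is a small sketch of how a reader of the links data could tolerate older records that lack the new key. It assumes the links file is a JSON array of objects shaped like the example above; the helper name `load_links` is hypothetical.

```python
import json


def load_links(raw_json):
    """Parse a JSON array of link records, treating "y" as optional.

    Records collected before the change have no "y" key, so they get
    y=None instead of raising a KeyError downstream.
    """
    links = json.loads(raw_json)
    for link in links:
        link.setdefault("y", None)
    return links


old_and_new = """
[
  {"text": "Older record", "url": "https://example.com/old"},
  {"text": "Newer record", "url": "https://example.com/new", "y": 12345}
]
"""
links = load_links(old_and_new)
```

Because the key is additive, existing consumers that only read `text` and `url` would keep working unchanged.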