mediacloud / directory-issues

UNDER CONSTRUCTION - A package containing a library of issue validators in a flexibly deployable wrapper.

Scrape local US sources from Local News website #3

Open rahulbot opened 1 week ago

rahulbot commented 1 week ago

The local news initiative website has a robust list of news sources in states across the US, including county data (recently updated and republished). It'd be helpful for us to have that list to potentially create county-level collections, even though there aren't any URLs. Since they have an API-backed service, and no batch download, it might be relatively easy to scrape the data.

Sample URL: https://www.northwesternlni.com:8068/lni/localnewstable?state=MA&county=Hampshire&year=2024

Sample JSON:

[
  {
    "id": 44612,
    "state": "MA",
    "county": "Hampshire",
    "mediaName": "Amherst Bulletin",
    "mediaType": "Newspaper",
    "yearLoaded": "2024"
  },
  {
    "id": 44613,
    "state": "MA",
    "county": "Hampshire",
    "mediaName": "Daily Hampshire Gazette",
    "mediaType": "Newspaper",
    "yearLoaded": "2024"
  },
  {
    "id": 44614,
    "state": "MA",
    "county": "Hampshire",
    "mediaName": "Valley Advocate",
    "mediaType": "Newspaper",
    "yearLoaded": "2024"
  },
...

The task here would be to build a scraper, perhaps in a Jupyter notebook, that pulls all the data into a CSV. Then we can review and decide what we might want to do with it.

pgulley commented 2 days ago

@m453h - This would be super helpful to have for the directory health team. Some goal deliverables would be:

  1. A JSON dump replicating LNI's county/state news collections
  2. A comparison against our local news collections in directory.mediacloud.org: first, whether we index a given local news site at all; second, whether it belongs to the appropriate collection. The second step should be automatable with the Directory API.
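One way the comparison could start is crude name matching between the LNI dump and a source list pulled from the directory. This is only a sketch: the input lists and the normalization rule are assumptions, and a real pass would use the Directory API client plus fuzzier matching (LNI gives no URLs, so names are the only join key):

```python
# Sketch: loose name-based comparison of LNI records against directory sources.
# Assumptions: both inputs are plain lists of source names; real matching
# would need to handle abbreviations, "The ..." prefixes, renames, etc.
import re

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace for loose matching."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", "", name.lower())).strip()

def compare(lni_names: list[str], directory_names: list[str]) -> tuple[list[str], list[str]]:
    """Split LNI names into (matched, missing) against the directory list."""
    index = {normalize(n) for n in directory_names}
    matched = [n for n in lni_names if normalize(n) in index]
    missing = [n for n in lni_names if normalize(n) not in index]
    return matched, missing

# Example using names from this issue's sample JSON (directory list is made up):
lni = ["Amherst Bulletin", "Daily Hampshire Gazette", "Valley Advocate"]
directory = ["Daily Hampshire Gazette", "The Boston Globe"]
matched, missing = compare(lni, directory)
# matched -> ["Daily Hampshire Gazette"]; missing -> the other two
```

The "missing" bucket would feed step 2's collection-membership check once the sources are confirmed to exist in the directory.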