The Local News Initiative (LNI) website has a robust list of news sources in states across the US, including county-level data (recently updated and republished). It would be helpful for us to have that list so we can potentially create county-level collections, even though it doesn't include any URLs. Since the site is backed by an API and offers no batch download, it might be relatively easy to scrape the data.
The task here would be to build a scraper, perhaps in a Jupyter notebook, that pulls all the data into a CSV. Then we can review it and decide what we might want to do with it.
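As a starting point, here is a minimal sketch of what that scraper could look like. The endpoint and the `state`, `county`, and `year` query parameters come from the sample URL in this issue; everything else is an assumption. In particular, the sample JSON isn't shown here, so the code assumes the response is a JSON list of objects (one per outlet), and the function names and the `query_*` column names are made up for illustration.

```python
import csv
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the sample URL in this issue.
BASE_URL = "https://www.northwesternlni.com:8068/lni/localnewstable"


def build_url(state, county, year=2024):
    """Build the query URL for one state/county/year combination."""
    return f"{BASE_URL}?{urlencode({'state': state, 'county': county, 'year': year})}"


def fetch_outlets(state, county, year=2024, timeout=30):
    """Fetch the outlets for one county.

    ASSUMPTION: the response body is a JSON list of objects. Adjust this
    once the real response shape has been checked.
    """
    with urlopen(build_url(state, county, year), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload if isinstance(payload, list) else [payload]


def write_csv(rows, out):
    """Write a list of dicts to an open file, using the union of keys as columns."""
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


def scrape(pairs, path, year=2024):
    """Fetch every (state, county) pair and dump the results into one CSV."""
    rows = []
    for state, county in pairs:
        for record in fetch_outlets(state, county, year):
            if isinstance(record, dict):
                # Tag each record with the query that produced it.
                rows.append({**record, "query_state": state,
                             "query_county": county, "query_year": year})
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_csv(rows, f)
```

In a notebook this could be run as `scrape([("MA", "Hampshire")], "lni_outlets.csv")`. The full list of state/county pairs would have to come from somewhere else (for example a Census county list), and it would be polite to add a short delay between requests.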
@m453h - This would be super helpful to have for the directory health team. Some goal deliverables would be:
A JSON dump replicating LNI's county/state news collections
A comparison against our local news collections in directory.mediacloud.org: first, whether we index a given local news site at all, and second, whether it belongs to the appropriate collection. The second step should be automatable with the Directory API.
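The two-step comparison could be sketched roughly as below. This is only an illustration under stated assumptions: because the LNI list has no URLs, it matches outlets by normalized name, which will miss some sources and may produce false matches. The `name` and `collections` keys on the directory records are hypothetical placeholders for whatever the Directory API actually returns, and the collection names in the usage example are made up.

```python
import re


def normalize_name(name):
    """Lowercase, drop punctuation and a leading 'the' so names compare loosely."""
    name = re.sub(r"[^a-z0-9]+", " ", name.casefold()).strip()
    return re.sub(r"^the ", "", name)


def build_index(directory_sources):
    """Map normalized source name -> set of collection names it belongs to.

    ASSUMPTION: each directory record is a dict with hypothetical keys
    'name' and 'collections'.
    """
    index = {}
    for source in directory_sources:
        key = normalize_name(source["name"])
        index.setdefault(key, set()).update(source["collections"])
    return index


def check_outlet(outlet_name, expected_collection, index):
    """Step 1: is the outlet indexed at all? Step 2: is it in the right collection?"""
    collections = index.get(normalize_name(outlet_name))
    if collections is None:
        return "not_indexed"
    if expected_collection in collections:
        return "ok"
    return "wrong_collection"
```

For example, with `index = build_index([{"name": "Daily Hampshire Gazette", "collections": ["United States - Massachusetts"]}])`, calling `check_outlet("The Daily Hampshire Gazette", "United States - Massachusetts", index)` returns `"ok"`, while an outlet name that isn't in the index returns `"not_indexed"`.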
Sample URL: https://www.northwesternlni.com:8068/lni/localnewstable?state=MA&county=Hampshire&year=2024
Sample JSON: