nytimes / public_api_specs

The API Specs (in OpenAPI/Swagger) for the APIs available from developer.nytimes.com
http://developer.nytimes.com
Apache License 2.0

How to identify whether an article was on the front page of the online site? #55

Closed by brienna 4 years ago

brienna commented 4 years ago

With the Archive API we get data for stories published in a given month, whereas with the Top Stories API we get data for stories currently on the front page. Is there a way to combine the two and find which stories were on the online front page on a given day?

nyt-hughmandeville commented 4 years ago

You can use the Article Search API to see what articles were on the front page in print.

print_section:"A" AND print_page:1

articlesearch.json?fq=print_section%3A%22A%22%20AND%20print_page%3A1&sort=newest&page=0
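To narrow the query above to a single day, it can be combined with the Article Search API's `begin_date` and `end_date` parameters (format `YYYYMMDD`). The following is a minimal sketch, not an official client: the function name `print_front_page_url` is hypothetical, the API key is a placeholder, and it assumes the date parameters filter on publication date, which may not match the print edition date exactly.

```python
from urllib.parse import urlencode, quote

# Article Search API v2 endpoint on api.nytimes.com.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def print_front_page_url(day, api_key, page=0):
    """Build a request URL for articles on print page A1 for one day.

    `day` is a 'YYYYMMDD' string. This helper is hypothetical and only
    assembles the URL; it does not send the request.
    """
    params = {
        # Same filter query as in the comment above.
        "fq": 'print_section:"A" AND print_page:1',
        # Assumption: both bounds set to the same day restricts
        # results to articles published on that day.
        "begin_date": day,
        "end_date": day,
        "sort": "newest",
        "page": page,
        "api-key": api_key,
    }
    # quote_via=quote encodes spaces as %20, matching the example query.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(print_front_page_url("20200206", "YOUR_API_KEY"))
```

The resulting URL can be fetched with any HTTP client. Results are paginated, so `page` would need to be incremented to collect every match.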

There is no NYT API to see what was on the web home page for a given day.
You could look at the Internet Archive's Wayback Machine (the Web Archive) instead:

https://web.archive.org/web/*/https://www.nytimes.com/
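Rather than browsing the calendar view by hand, the snapshot closest to a given day can be looked up through the Wayback Machine's availability endpoint. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the response layout (`archived_snapshots` → `closest` → `url`) follows the Internet Archive's published description of that endpoint, so it should be checked against a live response before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint.
AVAILABILITY_URL = "https://archive.org/wayback/available"


def availability_query(day, site="https://www.nytimes.com/"):
    """Build the lookup URL for the snapshot nearest to `day` (YYYYMMDD)."""
    return AVAILABILITY_URL + "?" + urlencode({"url": site, "timestamp": day})


def closest_snapshot(payload):
    """Return the closest snapshot URL from a decoded response, or None.

    Assumes the documented layout:
    {"archived_snapshots": {"closest": {"available": ..., "url": ...}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_closest_snapshot(day):
    """Query the endpoint over the network and return the snapshot URL."""
    with urlopen(availability_query(day)) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    print(fetch_closest_snapshot("20200206"))
```

The returned URL points at an archived copy of the home page, which would still need to be downloaded and parsed to extract headlines.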

brienna commented 4 years ago

Ah, ok thanks. I'm not looking for the printed front page, so I'll see what I can do.

I've checked the web archive, and it seems to only show an abbreviated version of the front page.

For example, go to https://web.archive.org/web/20200206000602/https://www.nytimes.com/ and compare it to www.nytimes.com.

There are far fewer headlines in the web archive snapshot. Might that be because, when you load www.nytimes.com, the rest of the page doesn't load until you scroll? It seems like the web archive takes its snapshot without scrolling, which would make sense, so these snapshots are actually missing a lot of the front page...