montera34 / pageonex

PageOneX. Analyzing front pages
http://pageonex.com
GNU Affero General Public License v3.0

Epic Story: Automated front page coding by topic #198

Open · schock opened this issue 8 years ago

schock commented 8 years ago

(Future User Story): I'm a PageOneX user, and I want to simply type in a phrase or keyword, choose my newspapers and date range, and see an automatically generated PageOneX visualization of front page coverage.
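The user story above can be read as a pipeline. The user supplies a keyword, a set of newspapers, and a date range, and gets back one cell per newspaper per day, which is the grid shape of a PageOneX visualization. The following is a minimal sketch under stated assumptions. The `front_page_text` mapping from `(newspaper, date)` to already-extracted front page text is hypothetical, and plain case-insensitive substring matching stands in for real topic coding.

```python
from datetime import date, timedelta


def date_range(start: date, end: date):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def build_grid(keyword, newspapers, start, end, front_page_text):
    """Return {newspaper: [(date, mentioned), ...]} for the requested range.

    front_page_text is a hypothetical dict mapping (newspaper, date) to the
    text extracted from that front page. A missing front page is recorded
    as None rather than False, so gaps stay visible in the grid.
    """
    needle = keyword.lower()
    grid = {}
    for paper in newspapers:
        row = []
        for day in date_range(start, end):
            text = front_page_text.get((paper, day))
            mentioned = None if text is None else needle in text.lower()
            row.append((day, mentioned))
        grid[paper] = row
    return grid
```

Substring matching is deliberately crude. For example, "art" also matches "start". A real implementation would need tokenization or a topic model, but the grid structure stays the same.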

Notes:

numeroteca commented 6 years ago

This approach, which uses information extracted from the PDFs, looks promising: https://github.com/samzhang111/frontpages/

First, you have to set up the script to download front pages from the Newseum daily.

@samzhang111 has also worked with some Python libraries to detect spatial boundaries in PDFs.
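Once text blocks and their spatial boundaries are available, a front page could be coded PageOneX-style by measuring how much of the page area a topic occupies. The following is a minimal sketch, not the method used in the frontpages repository. It assumes blocks have already been extracted with bounding boxes in PDF points. For example, pdfminer.six's `pdfminer.high_level.extract_pages` yields layout objects whose `LTTextContainer` elements expose `bbox` and `get_text()`. The `TextBlock` type and `coverage_share` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """A hypothetical text block with its bounding box (x0, y0, x1, y1) in PDF points."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        # Clamp at zero so a malformed box never contributes negative area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def coverage_share(blocks, keyword, page_width, page_height):
    """Fraction of the page area taken by blocks whose text mentions keyword.

    Assumes blocks do not overlap; overlapping boxes would be double-counted,
    so the result is capped at 1.0.
    """
    needle = keyword.lower()
    matched = sum(b.area for b in blocks if needle in b.text.lower())
    return min(1.0, matched / (page_width * page_height))
```

Area share is one choice among several. Weighting headlines more heavily than body text, or counting only the block where a story starts, would give different results for the same page.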