Borderlands newspaper data mining
This repository hosts Jupyter Notebooks introducing text data mining with
Python on a borderlands newspaper collection. The work is part of two projects:
- Using Newspapers as Data for Collaborative Pedagogy: A Multidisciplinary
Interrogation of the Borderlands in Undergraduate Classrooms, funded in part
by the Mellon Foundation through the
Collections as Data program.
More information about the project is available at
https://libguides.library.arizona.edu/newspapers-as-data.
- Reporting on Race and Ethnicity in the Borderlands (1882-1924): A
Data-Driven Digital Storytelling Hub, funded by the Mellon Foundation through
the Digital Borderlands
program.
If you are looking for an introduction explaining the concept of text data
mining, check out the StoryMap at https://arcg.is/1j84jz.
The scripts responsible for downloading and assembling daily volumes are
available in a separate repository, at
https://github.com/jcoliver/borderlands-newspapers.
The work focuses on the following titles:
- Arizona Citizen, one of Arizona's earliest newspapers, published in Tucson
- Arizona Post, a Tucson newspaper by and for the Jewish community
- Arizona Sun, an African American newspaper published in Phoenix
- Apache Sentinel, published by African American soldiers stationed at Fort
Huachuca
- Bisbee Daily Review, a newspaper published in Bisbee, a mining town at that
time
- Border Vidette, a newspaper published in Nogales, Arizona, on the border
with Nogales, Mexico
- Phoenix Tribune, the first African American newspaper published in Arizona
- El Fronterizo, a weekly Tucson Spanish-language paper
- El Mosquito, a Tucson paper including local news and news from Mexico
- El Sol, a Spanish-language, Mexican American newspaper published in Phoenix
- El Tucsonense, a Spanish-language, Mexican American newspaper published in
Tucson
- The Daily Morning Oasis, a daily English paper from Nogales, Arizona
- The Oasis, an English-language paper published in Nogales, Arizona
- The Weekly Orb, a weekly paper from Bisbee, Arizona
- Tucson Citizen, a continuation of the Tucson newspaper, Arizona Citizen
The text for most of these newspapers is available at
Chronicling America.
The texts were downloaded through the Chronicling America API, documented at
https://chroniclingamerica.loc.gov/about/api/.
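As a rough illustration of how the text of a single page can be addressed, the sketch below builds URLs for the plain-text OCR of individual newspaper pages. It assumes the `/lccn/<LCCN>/<date>/ed-<edition>/seq-<page>/ocr.txt` URL pattern described in the Chronicling America API documentation. The function names and the LCCN `sn00000000` are hypothetical placeholders, and this is not necessarily how the download scripts for this project work.

```python
from datetime import date

# Base address of the Chronicling America site (from the API documentation).
BASE_URL = "https://chroniclingamerica.loc.gov"


def ocr_page_url(lccn: str, issue_date: date, page: int, edition: int = 1) -> str:
    """Build the URL of the plain-text OCR for one page of one issue.

    lccn is the Library of Congress Control Number that identifies a title.
    """
    return (
        f"{BASE_URL}/lccn/{lccn}/{issue_date.isoformat()}"
        f"/ed-{edition}/seq-{page}/ocr.txt"
    )


def issue_page_urls(lccn: str, issue_date: date, n_pages: int, edition: int = 1) -> list:
    """Build OCR URLs for pages 1..n_pages of a single issue."""
    return [ocr_page_url(lccn, issue_date, p, edition) for p in range(1, n_pages + 1)]


# "sn00000000" is a made-up LCCN used only for illustration.
example_urls = issue_page_urls("sn00000000", date(1917, 1, 1), n_pages=4)
```

Each URL could then be fetched with any HTTP client, and the page texts for one date joined in page order to form a daily volume.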
The entire data set is available from the UArizona Research Data Repository at
https://doi.org/10.25422/azu.data.12735992.v3.
Text data mining lessons
Lessons for using these data in text data mining are available in Jupyter
Notebooks. All lessons are licensed under a
CC-BY-4.0 license (2020)
by Jeffrey C. Oliver. Translation of the Spanish version of the Text Mining
Template was aided in part by a Python script by Fernando Marcos Wittmann,
available at https://github.com/WittmannF/jupyter-translate.
Name | Launch | Description |
Introduction to text mining (short) | | A brief lesson introducing relative word frequencies and the visual display of word use over time. Includes three of the titles for the three-year period 1917-1919. |
Introduction to text mining (long) | | An extended version of the short lesson above. The lesson takes approximately two hours to complete. |
Text mining template | | A relatively lightweight notebook to explore text mining analyses on the full data set of 15 titles. |
Plantilla de Minería de Texto | | Spanish-language version of the text mining template: a relatively lightweight notebook to explore text mining analyses on the full data set of 15 titles. (DRAFT) |
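The relative word frequencies mentioned in the lesson descriptions can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code used in the notebooks: the function names are hypothetical, the tokenizer is a deliberately simple one, and the two sample "volumes" are invented text rather than data from the collection.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into runs of letters.

    The pattern [^\\W\\d_]+ matches Unicode letters only, so accented
    characters in Spanish-language text are kept intact.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_frequency(text: str, word: str) -> float:
    """Occurrences of `word` divided by the total number of words in `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return Counter(tokens)[word.lower()] / len(tokens)


def frequency_over_time(volumes: dict, word: str) -> dict:
    """Relative frequency of `word` for each period, in sorted period order."""
    return {
        period: relative_frequency(text, word)
        for period, text in sorted(volumes.items())
    }


# Invented sample text, keyed by year, standing in for assembled volumes.
sample_volumes = {
    "1917": "war news today the war continues",
    "1918": "influenza spreads as the war ends",
}
war_by_year = frequency_over_time(sample_volumes, "war")
```

Dividing by the total word count makes periods of different length comparable, which is why relative rather than raw counts are typically plotted over time.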