jcoliver / dig-coll-borderlands

Repository for text data mining borderlands newspapers
MIT License

Borderlands newspaper data mining

This repository hosts Jupyter Notebooks introducing text data mining with Python on the newspaper collection. The work is part of two projects.

If you are looking for an introduction explaining the concept of text data mining, check out the StoryMap at https://arcg.is/1j84jz.

The scripts responsible for downloading and assembling daily volumes are available in a separate repository, at https://github.com/jcoliver/borderlands-newspapers.

The work focuses on the following titles:

The text for most of these newspapers is available at Chronicling America. The texts were downloaded through the Chronicling America API, documented at https://chroniclingamerica.loc.gov/about/api/. The entire data set is available from the UArizona Research Data Repository at https://doi.org/10.25422/azu.data.12735992.v3.

Text data mining lessons

Lessons for using these data in text data mining are available as Jupyter Notebooks. All lessons are licensed under a CC-BY-4.0 license (2020) by Jeffrey C. Oliver. The Spanish translation of the Text Mining Template was aided in part by the Python script by Fernando Marcos Wittmann, available at https://github.com/WittmannF/jupyter-translate.

| Name | Launch | Description |
|------|--------|-------------|
| Introduction to text mining (short) | Binder | A brief lesson introducing relative word frequencies and visual display of word use over time. Includes a subset of three titles for the three-year period 1917-1919. |
| Introduction to text mining (long) | Binder | An extended version of the short lesson, above. The lesson takes approximately two hours to complete. |
| Text mining template | Binder | A relatively lightweight notebook to explore text mining analyses on the full data set of 15 titles. |
| Plantilla de Minería de Texto | Binder | Spanish version of the text mining template: a relatively lightweight notebook to explore text mining analyses on the full data set of 15 titles. (DRAFT) |
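
The relative word frequency idea introduced in the short lesson can be illustrated with a minimal sketch. This is not the notebooks' own code: the function name `relative_frequencies`, the simple tokenizer, and the two short sample strings are all assumptions made for illustration. Relative frequency here means a word's count divided by the total number of words in the text, which makes texts of different lengths comparable.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return each word's count divided by the total number of words."""
    # Lowercase, then keep runs of letters only. The pattern is
    # Unicode-aware, so accented characters are kept inside words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


# Hypothetical stand-ins for newspaper text from two different years;
# real input would be the text of the downloaded daily volumes.
pages_by_year = {
    1917: "war news from the border and more war news",
    1919: "peace news from the border",
}

# Relative frequency of one word of interest in each year
word = "war"
trend = {
    year: relative_frequencies(text).get(word, 0.0)
    for year, text in pages_by_year.items()
}
print(trend)
```

Plotting the values in `trend` against year gives the kind of word-use-over-time display the lessons describe. A real analysis would likely also remove stop words and handle OCR errors, which this sketch does not attempt.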