Authors: Juan R. Loaiza (URosario) and Miguel González Duque (ITU Copenhagen)
In this repository we track progress on a research project in which we apply text mining to philosophy journals in Latin America. Our aim is to provide insights into the history of philosophy in Latin America using a data-driven approach.
We started with Ideas y Valores (Colombia) and articles from 2009 to 2017. We are now expanding the corpus from Ideas y Valores to cover all articles since the journal's foundation in 1951. We plan on expanding later to include more years and other journals such as Crítica (Mexico) and Análisis Filosófico (Argentina).
TODO: This structure is now outdated.
.
├── data # Data files (omitted from Git repository for the moment)
| ├── corpus # Parsed JSON files after preprocessing.
| ├── rawHTML # Raw HTML files directly as scraped with metadata.
| ├── rawPDF # Raw PDF files directly as scraped with metadata.
| ├── parsedHTML # Parsed HTML using Article class (see utils).
| └── parsedPDF # Parsed PDF files to produce common JSON files.
├── extras # Extra notebooks with additional processes or figures.
├── notebooks # Notebooks with preprocessing and analyses.
| ├── models # LDA Models we have used.
| └── wordlists # Stopwords and protected words lists
├── utils # Helper utilities
└── README.md
Note: This suggests that word extension has not changed significantly since the journal's foundation in 1951. This contradicts a common intuition that philosophy is moving towards shorter articles.
The following plots are only proofs of concept. We are using a temporary LDA model with 10 topics to find which visualizations would work best. There is still work to fully optmize the LDA model though. We use a model with the following top 10 most salient words.
Topic 0 | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6 | Topic 7 | Topic 8 | Topic 9 |
---|---|---|---|---|---|---|---|---|---|
lenguaje | kant | religioso | ser | creencia | ser | político | acción | alma | político |
interpretación | concienciar | religión | cuerpo | mundo | mundo | formar | moral | ser | derecho |
teoría | ser | ciudad | formar | ser | hegel | vida | ser | platón | moral |
experiencia | concepto | filosofía | heidegger | teoría | filosofía | ser | accionar | filosofía | ser |
wittgenstein | objetar | historia | modo | propiedad | dios | filosofía | agente | conocimiento | justicia |
filosofía | experiencia | siglo | aristóteles | término | bien | nietzsche | personar | sócrates | bien |
ser | arte | cultura | ente | contener | vida | foucault | desear | hombre | social |
problema | husserl | tradición | naturaleza | concepto | razón | social | intención | virtud | sociedad |
autor | trascendental | ciencia | bien | físico | hombre | crítico | bien | bien | teoría |
filosófico | modo | obrar | existencia | objeto | pensar | pensamiento | libertar | obrar | razón |
These plots still use the year range from 2009 to 2017. We will expand on these soon when we implement the LDA model on the whole corpus.