ontoinsights / deep_narrative_analysis

Ontologies and Python code to create a semantic and ML infrastructure to enable deeper exploration and understanding of narratives

Deep Narrative Analysis (DNA)

Updated 31 May 2024

License

Creative Commons

Attribution 4.0 International

CC BY 4.0

Overview

Deep Narrative Analysis (DNA) is a toolset for analyzing narratives (biographical and autobiographical text, news articles, posts on Facebook and public online forums, etc.). Currently, it focuses on news articles, both to help readers understand how the reporting is "tuned" and to discover potential mis- and disinformation "flags" across texts. DNA combines semantic, ontological, natural language, and AI/ML technologies to 1) create knowledge graphs that encode the details of news articles together with background and contextual knowledge from online and structured data sources, and 2) perform inference, reasoning, and statistical and network analyses.

For more detailed information on DNA, see the presentation, Populating Knowledge Graphs.

To view the DNA ontology, see the dna-ontologies web page.

File Structure

The semantics (ontologies) and processing are captured in the directories of this project. The following folder structure is used:

The original "proof-of-concept" DNA codebase (based on analyzing Holocaust narratives) was archived with the tag v0.1.0-poc in July 2022. The second version (which used WordNet to include synonyms and do multi-language processing) was archived with the tag v0.5.0-preChat in March 2023. The current code has been refactored to obtain news articles using the news.org API, better capture semantics using OpenAI APIs, and enable more automated NL and ML analyses. In addition, the ontologies have been simplified and updated. Note that the ontology definitions still include WordNet noun and verb synset ids.

Environment and Execution

DNA has been developed and tested in a Python 3.11 environment.

The required libraries are listed in the requirements.txt file in the main directory. Install all of them (e.g., via pip install), then follow the remaining instructions in this section to set up the other necessary components. (Note that you may need to update Xcode on Mac, install a Rust compiler, etc., if errors occur during the pip installs. When errors are reported, pip typically explains how to address them.) Lastly, set the environment variables BEFORE starting the DNA application or doing any testing.

These environment variables need to be set for the DNA application:

Other components that must be installed or set up are:

Whenever you upgrade spacy itself, also upgrade/download the current spacy model ("en_core_web_trf").
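As a quick sanity check after an upgrade, the following sketch reports whether the spacy library and the "en_core_web_trf" model are both importable, and which versions are installed. It uses only the Python standard library. The helper names (is_installed, installed_version, check_spacy_setup) are hypothetical and are not part of DNA. The sketch also assumes that the model was installed as a regular Python package under its own name, which is how spacy models are normally distributed.

```python
from importlib import metadata, util
from typing import List, Optional

# Model name taken from the instructions above.
MODEL_NAME = "en_core_web_trf"


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return util.find_spec(package_name) is not None


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_spacy_setup() -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not is_installed("spacy"):
        problems.append("spacy is not installed")
    if not is_installed(MODEL_NAME):
        problems.append(f"model {MODEL_NAME} is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_spacy_setup():
        print("Problem:", problem)
    # Assumption: the model's distribution name matches its package name.
    print("spacy version:", installed_version("spacy"))
    print("model version:", installed_version(MODEL_NAME))
```

If the two reported versions look out of step after upgrading spacy, re-download the model before starting DNA.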

Lastly, to run the DNA services, cd to the dna directory and execute "flask run". The RESTful DNA APIs will then be accessible at http://127.0.0.1:5000/dna/v1/repositories (local only). DNA information and error messages are logged to the file dna.log in the dna directory.
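Once "flask run" is active, a minimal way to confirm that the service responds is to issue a GET request against the repositories URL given above. The sketch below uses only the Python standard library. The function names are hypothetical, and it assumes that the endpoint accepts a plain GET and returns JSON; if the body is not valid JSON, the raw text is returned instead.

```python
import json
from urllib import error, request

# Base URL taken from the text above (local development server).
BASE_URL = "http://127.0.0.1:5000/dna/v1"


def repositories_url(base_url: str = BASE_URL) -> str:
    """Build the repositories endpoint URL from a base URL."""
    return f"{base_url.rstrip('/')}/repositories"


def fetch_repositories(base_url: str = BASE_URL, timeout: float = 5.0):
    """GET the repositories endpoint.

    Returns the parsed JSON body, the raw text if it is not JSON,
    or None if the service could not be reached.
    """
    url = repositories_url(base_url)
    try:
        with request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except (error.URLError, OSError) as exc:
        # Covers refused connections, HTTP error statuses, and timeouts.
        print(f"Could not reach {url}: {exc}")
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


if __name__ == "__main__":
    print(fetch_repositories())
```

If the call returns None, check dna.log in the dna directory for error messages.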

Multilingual Support

Only English is currently supported and tested. Multilingual text could be supported by using a translation tool or an LLM to produce an English rendering.

This is an area that needs further research.

Frequently Asked Questions