probstlukas / gpt3-dementia-detection

Predicting Dementia Based on Speech Using GPT-3 Text Embeddings
MIT License
2 stars 1 forks source link
data-science deep-learning dementia-detection embeddings gpt3 llm machine-learning nlp openai python whisper


Logo

Predicting Dementia Based on Speech Using GPT-3 Text Embeddings

Dementia is the loss of cognitive function, which affects how a person can use language and communicate. Early detection and intervention are crucial for improving the quality of life for affected individuals. This project aims to harness the power of natural language processing and deep learning to diagnose dementia at an early stage based on speech patterns. By utilizing GPT-3 text embeddings, we seek to develop an innovative tool that can assist in the early detection of dementia.
Explore the docs »

Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Results
  5. License
  6. Acknowledgments

About The Project

This project was created during my position as a research assistant at TECO research group at Karlsruhe Institute of Technology (KIT). It involves several steps:

Built With

Python

(back to top)

Getting Started

To get a local copy up and running follow these steps.

Prerequisites

To gain access to the data used in this project, you must first join as a DementiaBank member.

For simplicity, we used the dataset provided for the ADReSSo-challenge which has been balanced with respect to age and gender in order to eliminate potential confunding and bias.

In our case, the downloaded ADReSSo audio files had an incompatible format to transcribe them with Whisper. Therefore, we first had to format them with ffmepg. Since we cannot reformat and replace the files at the same time, we have to save them temporarily and then replace the old files:

find . -name '*.wav' -exec sh -c 'mkdir -p fix && ffmpeg -i "$0" "fix/$(basename "$0")"' {} \;

Now replace the original audio files with the formatted files in the fix folder, which can then be deleted.

Installation

  1. Get an OpenAI API Key at https://platform.openai.com/account/api-keys.
  2. Set an environment variable 'OPENAI_API_KEY' (replace ~/.zshrc with ~/.bashrc if you use Bash):
    echo "export OPENAI\_API\_KEY='your key'" | cat >> ~/.zshrc
  3. Clone the repo
    git clone https://github.com/probstlukas/gpt3-dementia-detection.git
  4. Install required Python packages
    pip install -r requirements.txt

(back to top)

Usage

Config

Before running the program, make sure that the default settings (logging level, embedding engine, directory paths, ...) in config.py suit you. Then start the program by running main.py.

Transcription

CLI-init

If the data has not yet been transcribed, confirm with yes, select your preferred Whisper model and wait until the transcription process is complete.

Embedding

CLI-embedding

The embeddings are created separately for training and test data. train_embeddings.csv also contains the MMSE score and the diagnosis label for each audio file.

Remark

It is not necessary to scale the embeddings before using them. They are already normalized and are in the vector space with a certain distribution.

Classification

CLI-classify

In this step, machine learning models are trained and evaluated using the provided embeddings. For comparison with other, more complex classifiers, we create a dummy classifier that makes predictions that ignore the input features. This gives us a baseline performance (like flipping a coin, i.e. about 50%).

We use three different models: Support Vector Machine, Logistic Regression, and Random Forest.

The classification process can be divided into two parts:

Model checking

plot_accuracy_LR

Model building

All processed data and results are stored in the configured directories specified in config.py.

(back to top)

Results

These results were obtained with the following configuration:

Model Size
SVC 1441162 B
LR 13007 B
RF 82947 B
Total 1537116 B
Set Model Accuracy Precision Recall F1
Train SVC 0.779 (0.084) 0.839 (0.048) 0.779 (0.084) 0.772 (0.084)
Train LR 0.81 (0.085) 0.847 (0.07) 0.81 (0.085) 0.8 (0.091)
Train RF 0.804 (0.052) 0.839 (0.044) 0.804 (0.052) 0.793 (0.053)
Test SVC 0.779 0.839 0.779 0.772
Test LR 0.81 0.847 0.81 0.8
Test RF 0.804 0.839 0.804 0.793
Test Dummy 0.425

Our results show that GPT-3 text embeddings can be used to reliably distinguish individuals with Alzheimer's disease from healthy individuals from the control group, just by analyzing their speech behavior.

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Acknowledgments

(back to top)