grantgasser / Alzheimers-Prediction

An attempt to diagnose Alzheimer's disease earlier

Alzheimer's Diagnosis

Data Science Project by Grant Gasser under advisement of Dr. Joshua Patrick at Baylor University.

ADNI Picture

Table of Contents

  1. Summary of Alzheimer's
  2. Project Motivation
  3. Data Set
  4. Predictive Models

Summary of Alzheimer's Disease

Imaging

Alzheimer's disease (AD) is a progressive neurodegenerative disease. Though best known for impairing memory, it also causes difficulty with thinking and reasoning, making judgments and decisions, and planning and performing familiar tasks. It may also alter personality and behavior. The cause of AD is not well understood, but there is thought to be a significant hereditary component. For example, a variant of the APOE gene, APOE e4, increases the risk of Alzheimer's disease. Pathologically, AD is associated with amyloid beta plaques and neurofibrillary tangles.

Diagnosis

Onset of the disease is slow and early symptoms are often dismissed as normal signs of aging. A diagnosis is typically given based on history of illness, cognitive tests, medical imaging, and blood tests.

Treatment

There is no medication that stops or reverses the progression of AD. Two types of drugs, cholinesterase inhibitors and memantine, are used to treat the cognitive symptoms.

These medications can slightly slow down the progression of the disease.

Prevention

It is thought that frequent mental and physical exercise may reduce risk.


Project Motivation

The Alzheimer's Association estimates that nearly 6 million Americans suffer from the disease, which is the 6th leading cause of death in the US. The estimated cost of AD in the US was $277 billion in 2018. The association estimates that early and accurate diagnoses could save up to $7.9 trillion in medical and care costs over the next few decades.

Sources: Mayo Clinic, Alzheimer's Association, Wikipedia

Project Description:

Using data provided by the ADNI Project, it is our goal to develop a computer model that assists in the diagnosis of the disease. We will try multiple models recently popularized in machine learning (Neural Network, SVM, etc.) and more traditional statistical models such as ordinal regression, multinomial regression, and decision trees.


Data Set: ADNI Q3

To simplify the problem, we collapse CN (cognitively normal) and LMCI (late mild cognitive impairment) into a single category, "Not AD" (i.e., "Not Alzheimer's").
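The label collapsing described above can be sketched as follows. This is a minimal illustration, assuming the diagnosis labels are the strings "CN", "LMCI", and "AD"; the function name `collapse_diagnosis` and the sample list are hypothetical and not taken from the project notebooks.

```python
def collapse_diagnosis(label: str) -> str:
    """Map a three-class diagnosis label to the binary target."""
    known = {"CN", "LMCI", "AD"}  # assumed label strings
    if label not in known:
        raise ValueError(f"unexpected diagnosis label: {label!r}")
    # Everything that is not AD falls into the single "Not AD" category
    return "AD" if label == "AD" else "Not AD"


# Hypothetical sample of diagnosis labels (not real ADNI data)
diagnoses = ["CN", "LMCI", "AD", "CN", "AD"]
binary_labels = [collapse_diagnosis(d) for d in diagnoses]
print(binary_labels)
```

Raising on unknown labels, rather than silently mapping them to "Not AD", is a design choice that keeps missing or misspelled diagnoses from being counted as negatives.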

Important Note: The models using this data set assume the physician diagnoses (the labels) are correct.


Predictive Models

Solution v3 (2023): Binary Classification with XGBoost

Results:

Test Accuracy: 90.48%
Precision: 0.67
Recall: 0.87
F1: 0.75
AUC: 0.89

Initial confusion matrix (before lowering the decision threshold from 0.5 to 0.1 to increase recall):

Confusion Matrix
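To illustrate why lowering the decision threshold raises recall, here is a small sketch in plain Python. The labels and probabilities are made up for illustration (they are not the project's data), and the helper names `classify` and `precision_recall_f1` are hypothetical.

```python
def classify(probs, threshold):
    """Predict 1 (AD) when the predicted probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels (1 = AD) and model probabilities, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.4, 0.15, 0.3, 0.05, 0.02, 0.2, 0.6]

for threshold in (0.5, 0.1):
    p, r, f1 = precision_recall_f1(y_true, classify(probs, threshold))
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

A lower threshold flags more patients as AD, so fewer true cases are missed (higher recall), typically at the cost of more false positives.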

Feature importances (from prior models):

Feature Importances

Unsurprisingly, cognitive test scores (MMSE) and age (AGE) are the most predictive of Alzheimer's.

Solution v2 (2021): Multi-Class Prediction in Python (Jupyter Notebook)

Results: 74% Test Accuracy

Solution v1 (2019): Ordinal Regression (Ranking Learning) in R (CN < LMCI < AD)

Results: 70% Test Accuracy (110/157)

Proposed Solution: Predict CN only if P(CN) exceeds some threshold, instead of always predicting the class with the highest probability among P(CN), P(LMCI), and P(AD). This should reduce the number of CN predictions and thus the number of false negatives.
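One possible reading of this proposal, sketched in Python: take the usual highest-probability class, but if that class is CN and P(CN) does not exceed the threshold, fall back to the more probable of LMCI and AD. The function name `predict_with_cn_threshold`, the dictionary layout, and the example probabilities are all assumptions made for illustration; the proposal was not implemented in the v1 R code as far as this README states.

```python
def predict_with_cn_threshold(probs, cn_threshold):
    """Pick a class from {"CN", "LMCI", "AD"}, guarding CN with a threshold.

    probs: dict mapping each class name to its predicted probability.
    """
    best = max(probs, key=probs.get)  # ordinary argmax prediction
    if best == "CN" and probs["CN"] <= cn_threshold:
        # CN is not confident enough: choose between the impaired classes
        return max(("LMCI", "AD"), key=lambda c: probs[c])
    return best


# Hypothetical predicted probabilities for one patient
example = {"CN": 0.45, "LMCI": 0.35, "AD": 0.20}
print(predict_with_cn_threshold(example, cn_threshold=0.4))
print(predict_with_cn_threshold(example, cn_threshold=0.6))
```

With a threshold of 0.4 the example is still predicted as CN; raising it to 0.6 shifts the prediction to LMCI, which is how the rule trades CN predictions for fewer false negatives.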