Diabetes Prediction using Support Vector Machines

Inyrkz commented 3 years ago

Be sure to visit our Resources Page for tools, resources, and example articles to go over.

NOTE: We tend not to publish reviews or comparisons of commercial product offerings.

Proposed title of the article

Your title should be descriptive of the article/tutorial. Be Specific. Use keyword research to gain your article a higher ranking.

Diabetes Prediction using Support Vector Machines

Introduction paragraph(s):

Please write the Introductory paragraph(s), that would be included in your article. We will use this writing snippet to help us assess overall quality before approval. We're looking for the first 2-3 paragraphs of the article that appropriately summarize what your article will be about. Take your time and write the content as you would intend to get it published.

In this article, you will learn how to predict whether a patient has diabetes based on their medical records. We will use the Support Vector Machine (SVM) algorithm from scikit-learn to build our machine learning model. After reading this article, you will be able to apply the same workflow to other classification problems using the support vector machine implementation in sklearn.

Our dataset has 768 rows and 9 columns. We separate the features from the target variable. The 8 features are pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age. The target variable is the outcome column, where 1 represents patients with diabetes and 0 represents patients without diabetes.
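
As a rough sketch of the feature/target separation described above, the snippet below builds a placeholder DataFrame with the same shape (768 rows, 9 columns) and splits it. The column names and the random values are assumptions for illustration only; the real CSV may label its columns differently.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the actual file may use different labels.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"

# Random placeholder values standing in for the real records.
# In the article this step would load the dataset from its CSV file instead.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((768, len(FEATURES))), columns=FEATURES)
df[TARGET] = rng.integers(0, 2, size=len(df))

X = df[FEATURES]  # the 8 feature columns
y = df[TARGET]    # 1 = diabetes, 0 = no diabetes

print(X.shape, y.shape)
```

Keeping the feature names in one list makes it easy to reuse them later, for example when building a single-row DataFrame for a new patient.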

Scikit-learn provides four built-in kernels for SVM: linear, poly, rbf, and sigmoid. Different kernels work better on different datasets, and we don't know in advance which one will give us the best decision boundary. The decision boundary is the surface that separates the positive class from the negative class; it can be linear or non-linear. We will therefore iterate through the kernels: for each one, we fit the SVM model to the training set, then predict on both the training set and the test set to see which kernel gives the highest accuracy score. The kernel is a hyperparameter of the model, so this comparison is a simple form of hyperparameter optimization.
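
The kernel comparison described above could be sketched roughly as follows. This is a minimal illustration, not the article's final code: it uses a synthetic dataset from `make_classification` as a stand-in for the diabetes data, and the `StandardScaler` step, the 80/20 split, and the `random_state` values are assumptions added here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same shape as the dataset (768 rows, 8 features).
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

results = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Scaling is an assumption added here; SVMs are sensitive to feature scale.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this kernel.
    results[kernel] = (model.score(X_train, y_train), model.score(X_test, y_test))

for kernel, (train_acc, test_acc) in results.items():
    print(f"{kernel:8s} train={train_acc:.3f} test={test_acc:.3f}")

# Kernel with the highest test accuracy.
best_kernel = max(results, key=lambda k: results[k][1])
print("best kernel:", best_kernel)
```

Comparing training and test accuracy side by side also hints at overfitting when the gap is large. Choosing the kernel by test accuracy tends to give an optimistic estimate; cross-validation on the training set (for example with `GridSearchCV`) is a common alternative.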

Key takeaways:

What are the 3-5 most important things the reader should understand or be able to do after reading this article? Use this area to get your ideas down on the bulk of your article or tutorial.

After reading this article, the reader should be able to:

import the dataset and preprocess the data,
split the dataset into training and test sets,
deal with an imbalanced dataset,
build an SVM model using the linear, poly, RBF, and sigmoid kernels (hyperparameter optimization),
make a diagnosis for a new patient and assess the model's performance using accuracy, precision, recall, and F1-score.
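
The splitting, imbalance handling, and evaluation steps listed above might look something like the sketch below. It is only one possible approach: `class_weight="balanced"` is an assumed way of handling imbalance (the article may use resampling or another technique), and the data is synthetic with an assumed class ratio of roughly 65% negatives to 35% positives.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the diabetes data (assumed class ratio).
X, y = make_classification(
    n_samples=768, n_features=8, weights=[0.65, 0.35], random_state=0
)

# stratify=y keeps the class ratio the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" weights each class inversely to its frequency,
# so mistakes on the minority class cost more during training.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")

# "New patient": one held-out row used as a placeholder for unseen records.
new_patient = X_test[:1]
diagnosis = int(model.predict(new_patient)[0])  # 1 = diabetes, 0 = no diabetes
print("diagnosis:", diagnosis)
```

On imbalanced data, accuracy alone can look good even when the minority class is mostly missed, which is why precision, recall, and F1-score are reported alongside it.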

References:

Please list links to any published content/research that you intend to use to support/guide this article.

Coursera Guided Project https://www.coursera.org/projects/medical-diagnosis-support-vector-machines
Sklearn documentation https://scikit-learn.org/stable/modules/svm.html#svm-kernels


ninjaginja commented 3 years ago

Thank you for the topic suggestion @Inyrkz. Could you please describe how your content will differ from the coursera project that you referenced?

Inyrkz commented 3 years ago

Hi, I'm glad you asked. The article will also cover some vital exploratory data analysis, where readers will learn how to visualize the distribution of the target variable. Readers will also learn how to generate a profile report of the dataset using pandas profiling, which gives a thorough description of any dataset they are working with. They will be able to view samples of the dataset, visualize missing values, and visualize the correlations and interactions between features.
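
To give a rough idea of the exploratory checks mentioned above, here is a small sketch using plain pandas on placeholder data. The column names, the random values, and the injected missing values are all assumptions for illustration; the profile report itself is not shown, since it needs an additional package.

```python
import numpy as np
import pandas as pd

# Placeholder data; the column names and values are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Glucose": rng.normal(120, 30, 768),
    "BMI": rng.normal(32, 7, 768),
    "Age": rng.integers(21, 80, 768),
    "Outcome": rng.choice([0, 1], size=768, p=[0.65, 0.35]),
})
# Inject 20 missing BMI values so the missing-value check has something to find.
df.loc[rng.choice(768, size=20, replace=False), "BMI"] = np.nan

class_counts = df["Outcome"].value_counts()               # rows per class
class_share = df["Outcome"].value_counts(normalize=True)  # fraction per class
missing = df.isna().sum()                                 # missing values per column
correlations = df.corr()                                  # pairwise Pearson correlations

print(class_counts)
print(missing)
print(correlations.round(2))
```

The same quantities can be plotted, for example with `class_counts.plot(kind="bar")`, if matplotlib is installed.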

hectorkambow commented 3 years ago

👍 Sounds great - looking forward to reading it.

@Inyrkz

Topic approved
