zairza-cetb / HAM10000

This project aims to develop models for skin cancer classification and, further, an architecture for lossless segmentation of the cancerous region.

HAM10000: Skin Cancer MNIST

Python, NumPy, Pandas, scikit-learn, Keras, TensorFlow

About the Project

It aims to build a machine-learning-based model that predicts the type of skin cancer a person has and segments the part of the skin that is infected or cancerous.

Introduction

Skin problems are so common that they are easily ignored, but some become cancerous and go undiagnosed, leading to permanent disfigurement and even death. Skin cancer, the abnormal growth of skin cells, most often develops on skin exposed to the sun. But this common form of cancer can also occur on areas of your skin not ordinarily exposed to sunlight.

The various types of skin cancer in the realm of pigmented lesions are:

They are structurally very similar and cause simple irritation with mild to severe pain. We aim to develop models that different platforms can use: the model takes images of the skin, predicts whether the growth is cancerous or healthy, and classifies the type of cancer.

Progress so far

Data

The data used here is like the MNIST of skin cancer classification. It contains dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images which can serve as a training set for academic machine learning purposes.

Data Pre-processing

The data consists of images and CSV files; the CSV files hold the labels in one-hot encoded form alongside the matching image names. Because ImageDataGenerator's flow_from_directory cannot accept one-hot encoded labels, a label column holding the class name for each image was created.
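The conversion from one-hot columns to a single label column can be sketched with pandas as follows. This is a minimal illustration, not the project's actual script: the column names (`image` and the seven class abbreviations) and the toy rows are assumptions modelled on the class names used in this README.

```python
import pandas as pd

# Assumed column names, modelled on the class abbreviations in this README.
CLASSES = ["MEL", "NV", "BCC", "AKIEC", "BKL", "DF", "VASC"]


def add_label_column(df, classes=CLASSES):
    """Return a copy of df with a 'label' column naming the hot class per row."""
    out = df.copy()
    # idxmax(axis=1) gives, for each row, the column name holding the largest
    # value; for a one-hot row that is the single column set to 1.
    out["label"] = out[classes].idxmax(axis=1)
    return out


# Made-up rows standing in for the real CSV (hypothetical image names).
toy = pd.DataFrame({
    "image": ["img_a", "img_b", "img_c"],
    "MEL":   [0, 1, 0],
    "NV":    [1, 0, 0],
    "BCC":   [0, 0, 0],
    "AKIEC": [0, 0, 0],
    "BKL":   [0, 0, 0],
    "DF":    [0, 0, 1],
    "VASC":  [0, 0, 0],
})

labelled = add_label_column(toy)
print(labelled[["image", "label"]])
```

A frame in this shape (one filename column, one string label column) is the kind of input that generators reading from a dataframe typically expect.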

The labels are: NV, MEL, BKL, BCC, AKIEC, VASC, and DF.

The sample distribution is as follows:

NV       6384
MEL      1053
BKL      1035
BCC       488
AKIEC     309
VASC      138
DF        107

For simplicity, each class is undersampled to at most 300 samples.
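One way to cap every class at 300 samples is to shuffle the rows and keep the first 300 of each class. The sketch below assumes a dataframe with a `label` column; the function name, the seed, and the toy class counts are illustrative and not taken from the project.

```python
import pandas as pd


def undersample(df, label_col="label", max_per_class=300, seed=42):
    """Keep at most max_per_class randomly chosen rows for every class."""
    # Shuffle once so that the rows kept per class are a random subset.
    shuffled = df.sample(frac=1, random_state=seed)
    # head(n) on a groupby keeps the first n rows of each group; classes with
    # fewer than n rows are kept whole.
    return shuffled.groupby(label_col).head(max_per_class).reset_index(drop=True)


# Made-up class sizes: two large classes and one below the cap.
toy = pd.DataFrame({"label": ["NV"] * 1000 + ["MEL"] * 400 + ["DF"] * 107})

balanced = undersample(toy)
print(balanced["label"].value_counts())
```

Classes that start below the cap stay smaller than the rest, so some imbalance remains after this step.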

Building Model

Because two classes, VASC and DF, have fewer than 300 samples, we set class weights to compensate for the imbalance.

Class    Samples   Weight
AKIEC    300.0     1.00000
BCC      300.0     1.00000
BKL      300.0     1.00000
DF       115.0     2.60870
MEL      300.0     1.00000
NV       300.0     1.00000
VASC     142.0     2.11268
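The weights in the table are consistent with dividing the largest class size by each class's own size (for example 300 / 115 ≈ 2.60870). That formula is inferred from the numbers above rather than stated by the project, so treat the following as a sketch under that assumption.

```python
def class_weights(counts):
    """Weight each class by (largest class size) / (its own size)."""
    largest = max(counts.values())
    return {name: largest / n for name, n in counts.items()}


# Sample counts taken from the table above.
counts = {
    "AKIEC": 300, "BCC": 300, "BKL": 300, "DF": 115,
    "MEL": 300, "NV": 300, "VASC": 142,
}

weights = class_weights(counts)
for name, weight in weights.items():
    print(f"{name:<6} {weight:.5f}")
```

In Keras, weights like these are usually passed to `model.fit` through its `class_weight` argument, keyed by integer class index rather than by name, so a name-to-index mapping would be needed first.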

As a start, we used three methods to check the accuracy achievable with the undersampled images. They are:

Result Analysis

EfficientNet B0

Accuracy = 3%

VGG16

Accuracy = 57%

Future Work