Skewed datasets are common, and they are tough to handle. Standard classification models and techniques often fail badly when presented with such a problem. We start from the basics of what class imbalance means and move on to how we can overcome it, using various algorithms and some subtle techniques. We also discuss how to evaluate our efforts and some small but crucial details that must be taken care of.
Duration
30 min
Audience
The talk requires beginner to intermediate machine learning knowledge. However, the main takeaways should still be understandable to someone who has never explicitly practiced machine learning before.
Outline
The talk has the following sections:
What is Class Imbalance?
Here we use examples to define what a class-imbalanced dataset is and explain why it should be handled differently.
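A minimal sketch of why imbalance needs special handling, assuming a hypothetical dataset of 990 negatives and 10 positives (the numbers are made up for illustration and are not from the talk). A model that always predicts the majority class looks excellent on accuracy while finding none of the positives:

```python
# Hypothetical labels: 990 negatives (0) and 10 positives (1).
y_true = [0] * 990 + [1] * 10

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

# Fraction of all predictions that are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Fraction of the actual positives that the model found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(t == 1 for t in y_true)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Here accuracy comes out at 0.99 while recall on the minority class is 0, which is the kind of failure the rest of the talk addresses.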
Ways to overcome it
We go into detail on three ways to tackle the class imbalance problem.
a. Sampling
b. Setting hyperparameters to assign weights
c. Libraries like imblearn
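The first two approaches can be sketched in plain Python. This is an illustration under stated assumptions, not the talk's own code: `random_oversample` and `balanced_class_weights` are hypothetical helper names, and the toy data is made up. The weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count). In practice, imblearn's `RandomOverSampler` and scikit-learn's `class_weight="balanced"` option cover the same ground.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count), so that
    rarer classes contribute more to the loss."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical toy data: 90 negatives and 10 positives.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

X_res, y_res = random_oversample(X, y)
weights = balanced_class_weights(y)

print(Counter(y_res))  # both classes now have 90 rows
print(weights)         # the minority class gets the larger weight
```

Oversampling changes the data the model sees, while class weights leave the data untouched and change how much each mistake costs; which one works better depends on the model and dataset.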
Evaluation Methods
We discuss the evaluation methods that best help us judge how our model is performing on an imbalanced dataset.
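As one illustration, precision, recall and F1 on the minority class are commonly used for imbalanced problems; the proposal does not list its chosen metrics, so treating these three as representative is an assumption. The helper name `precision_recall_f1` and the toy predictions below are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical predictions on 95 negatives and 5 positives:
# 2 false positives, 3 true positives, 2 false negatives.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

print(f"accuracy={accuracy:.2f}")
print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```

Accuracy is 0.96 here, yet precision, recall and F1 on the positive class are all 0.60, which gives a much more honest picture of how the minority class is handled.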
Custom loss
We discuss a custom loss function that can considerably improve our deep learning model, and we explain why it does so.
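The proposal does not name the loss function, so the sketch below uses binary focal loss purely as an assumed example of a loss designed for imbalance; it may not be the one covered in the talk. It is written in plain Python to show the arithmetic, whereas a real deep learning model would express the same formula with its framework's tensor operations. The function name and the default values of `gamma` and `alpha` are illustrative choices.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_t is the probability the model assigns to the true class. The factor
    (1 - p_t)**gamma shrinks the loss of well-classified examples, so the
    hard (often minority-class) examples dominate training.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)      # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p       # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


# One confidently correct positive versus one badly misclassified positive.
easy = binary_focal_loss([1], [0.9])
hard = binary_focal_loss([1], [0.1])
print(f"easy={easy:.5f}, hard={hard:.5f}")
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check when implementing it.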
Misc
We go over some miscellaneous tricks and steps we can take to avoid common pitfalls.
a. Train-validation splits
b. Remove classes
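For the train-validation split, one common safeguard is to stratify by class so that the rare class appears in both splits at roughly its original rate. The sketch below assumes that this is the kind of pitfall meant here; `stratified_split` is a hypothetical helper, and scikit-learn's `train_test_split` offers the same behavior through its `stratify` argument.

```python
import random
from collections import defaultdict


def stratified_split(y, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx), splitting each class at val_fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one sample per class in validation. Note that a
        # class with a single sample therefore never reaches the train set.
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Hypothetical labels: 95 negatives and 5 positives.
y = [0] * 95 + [1] * 5
train_idx, val_idx = stratified_split(y)

print(len(train_idx), sum(y[i] for i in train_idx))  # train size, positives
print(len(val_idx), sum(y[i] for i in val_idx))      # val size, positives
```

A purely random 80/20 split of the same data could easily leave the validation set with no positives at all, which would make any minority-class metric meaningless.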
Additional notes
My name is Aditya Lahiri and I am currently a Machine Learning intern at American Express, Big Data Labs. I am a Computer Science undergraduate at BITS Pilani, Goa and will graduate in December 2019. I love solving problems through data and code. Besides that, I enjoy attending meetups and talks, and I try my best to contribute to them. I have previously given talks at my college at events such as Google Developers Group, Goa.
Here are the slides for this proposal: https://docs.google.com/presentation/d/1_hiJQsbXHhrzlXxCtPUSpt9-FvMWNlNw1m6cBVPyGCE/edit?usp=sharing