This PR adds a notebook called "imbalanced-dataset.ipynb".
It contains the following two experiments:
Classification (90:10 class ratio in the dataset):
Compares the accuracy and F1 score of sklearn's RandomForestClassifier and DummyClassifier (in 'stratified' mode).
The accuracies are comparable because accuracy is dominated by the majority class, but the F1 score of the DummyClassifier is much lower than that of the RandomForestClassifier.
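
For reviewers who want to try the classification comparison without opening the notebook, here is a minimal sketch of that kind of experiment. It is not the notebook's code: the synthetic data from `make_classification`, the sample count, `class_sep`, the split ratio, and the helper name `compare_classifiers` are all assumptions chosen for illustration, so the exact scores will differ from those in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def compare_classifiers(seed=0):
    """Hypothetical helper: score both classifiers on assumed synthetic data."""
    # Synthetic binary dataset with roughly a 90:10 class ratio (assumed stand-in data).
    X, y = make_classification(
        n_samples=2000,
        n_features=10,
        n_informative=5,
        weights=[0.9, 0.1],
        class_sep=1.5,
        random_state=seed,
    )
    # Stratify the split so the test set keeps the same class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "random_forest": RandomForestClassifier(random_state=seed),
        "dummy_stratified": DummyClassifier(strategy="stratified", random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        scores[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            # Binary F1 for label 1, which is the minority class here.
            "f1": f1_score(y_test, y_pred),
        }
    return scores


classification_scores = compare_classifiers()
for name, result in classification_scores.items():
    print(f"{name}: accuracy={result['accuracy']:.3f}, f1={result['f1']:.3f}")
```

A 'stratified' dummy draws labels at random from the training class distribution, so on a 90:10 split its expected accuracy is about 0.9 * 0.9 + 0.1 * 0.1 = 0.82 even though it rarely identifies the minority class. That gap between the two metrics is what the F1 score exposes.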
Regression (linear data + normal noise):
Compares the mean squared error (MSE) of sklearn's RandomForestRegressor and DummyRegressor (in 'mean' mode).
The MSE values are comparable even though the two models make vastly different predictions at the per-sample level.
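
The regression comparison can be sketched in the same spirit. Again, this is not the notebook's code: the slope, the noise scale, the uniform input range, and the helper name `compare_regressors` are assumptions. How close the two MSE values land depends heavily on the assumed ratio of signal to noise, so treat the printed numbers as illustrative only.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def compare_regressors(slope=0.5, noise_std=2.0, n_samples=1000, seed=0):
    """Hypothetical helper: compare both regressors on assumed linear data."""
    rng = np.random.default_rng(seed)
    # Linear signal plus normally distributed noise (assumed parameters).
    X = rng.uniform(-1.0, 1.0, size=(n_samples, 1))
    y = slope * X[:, 0] + rng.normal(0.0, noise_std, size=n_samples)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    forest = RandomForestRegressor(random_state=seed).fit(X_train, y_train)
    dummy = DummyRegressor(strategy="mean").fit(X_train, y_train)
    forest_pred = forest.predict(X_test)
    dummy_pred = dummy.predict(X_test)
    return {
        "forest_mse": float(mean_squared_error(y_test, forest_pred)),
        "dummy_mse": float(mean_squared_error(y_test, dummy_pred)),
        # Average per-sample distance between the two models' predictions.
        "mean_abs_prediction_gap": float(np.mean(np.abs(forest_pred - dummy_pred))),
        # The 'mean' dummy predicts one constant, so this spread is zero.
        "dummy_prediction_spread": float(dummy_pred.max() - dummy_pred.min()),
    }


regression_scores = compare_regressors()
for name, value in regression_scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting the per-sample prediction gap next to the MSE values makes the point of the experiment visible: an aggregate error metric can hide the fact that one model outputs a single constant for every sample.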