Closed · sama25100 closed this 2 years ago
Results of the Logistic Regression model trained on the whole labeled-2 dataset:
Accuracy: 0.9432484224332166
Precision: 0.5189030883919062
Recall: 0.08707501228610999
Confusion Matrix: [[367711, 1807], [20434, 1949]]
Script: https://colab.research.google.com/drive/1jYTO59Bw_DcSWlaztOOfJ0lrFEa_An16?usp=sharing
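For reference, a minimal sketch of how metrics like these are typically computed with scikit-learn. The data here is synthetic (the real run used the labeled-2 dataset in the linked Colab notebook), and all variable names are placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, confusion_matrix)

# Hypothetical synthetic stand-in: ~9% positive class, mimicking the
# imbalance visible in the reported confusion matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=2000) > 3.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))
print("Precision:", precision_score(y_test, pred, zero_division=0))
print("Recall:", recall_score(y_test, pred, zero_division=0))
print("Confusion Matrix:\n", confusion_matrix(y_test, pred))
```

With such a skewed class ratio, a high accuracy alongside a very low recall is exactly what you would expect from a model that mostly predicts the majority class.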
Results after resampling the dataset so that the training set contains an equal number of sinners and non-sinners:
Confusion Matrix: [[7012 2412] [142 434]]
Accuracy: 0.7446
Precision: 0.15249472944483486
Recall: 0.7534722222222222
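One common way to get such a balanced training set is random undersampling of the majority class. A sketch under assumed variable names (the actual resampling code lives in the linked notebook):

```python
import numpy as np

# Hypothetical imbalanced training data (~5% positives).
rng = np.random.default_rng(42)
y_train = rng.random(10000) < 0.05
X_train = rng.normal(size=(10000, 4))

# Keep all positives, draw an equally sized random subset of negatives.
pos_idx = np.flatnonzero(y_train)
neg_idx = np.flatnonzero(~y_train)
neg_sample = rng.choice(neg_idx, size=len(pos_idx), replace=False)

keep = np.concatenate([pos_idx, neg_sample])
rng.shuffle(keep)

# Balanced training set: both classes equally represented.
X_bal, y_bal = X_train[keep], y_train[keep]
```

Note that only the training set is resampled here; the test set keeps its natural class ratio, which is why precision stays low even as recall improves.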
Remark: when the number of sinners and non-sinners is also equal in the test set, precision jumps to over 70%. The model is probably overfitted.
Used PCA to visualize the data in its current form (one row contains one performance).
Explained Variance:
PC1: 0.29440868
PC2: 0.13864443
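The PCA step can be sketched as follows; the feature matrix here is a synthetic stand-in for the one-row-per-performance data, and the scaling step is an assumption (the notebook may or may not standardize first):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: one row per performance.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:, 1] = X[:, 0] * 2 + rng.normal(scale=0.5, size=500)  # correlated pair

# Project onto the first two principal components for a 2-D scatter plot.
pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(X))

print("Explained Variance:", pca.explained_variance_ratio_)
```

The `explained_variance_ratio_` values correspond to the PC1/PC2 figures above; together they cover well under half the variance, so a 2-D scatter of the scores should be read as a rough summary rather than a faithful picture of the data.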