mawa00006 / Doping-Detection-Based-on-Publicly-Available-Competition-Data-in-Professional-Road-Cycling


Create simple classification model #23

Closed: sama25100 closed this issue 2 years ago

sama25100 commented 2 years ago

The results of the logistic regression model when trained on the whole labeled-2 dataset:
Accuracy: 0.9432484224332166
Precision: 0.5189030883919062
Recall: 0.08707501228610999
Confusion matrix: [[367711, 1807], [20434, 1949]]
Script: https://colab.research.google.com/drive/1jYTO59Bw_DcSWlaztOOfJ0lrFEa_An16?usp=sharing
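The reported metrics can be reproduced from the confusion matrix, assuming scikit-learn's layout (rows = actual class, columns = predicted class, i.e. [[TN, FP], [FN, TP]] with "sinner" as the positive class). The numbers above are consistent with that layout. The sketch also computes the accuracy of a trivial model that always predicts "not a sinner"; the helper name `metrics_from_confusion` is hypothetical.

```python
# Recompute the metrics from a 2x2 confusion matrix.
# Assumed layout (scikit-learn convention): [[TN, FP], [FN, TP]].

def metrics_from_confusion(cm):
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # share of predicted sinners that really are sinners
        "recall": tp / (tp + fn),     # share of actual sinners the model catches
        # Accuracy of a model that always predicts "not a sinner"
        "majority_baseline": (tn + fp) / total,
    }

full = metrics_from_confusion([[367711, 1807], [20434, 1949]])
for name, value in full.items():
    print(f"{name}: {value:.4f}")
```

The majority-class baseline comes out at about 0.943, almost the same as the model's accuracy, so on this imbalanced dataset (about 5.7% sinners) accuracy says little and the low recall is the more telling number.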

sama25100 commented 2 years ago

The results after resampling the training set so that it contains an equal number of sinners and non-sinners:
Confusion matrix: [[7012, 2412], [142, 434]]
Accuracy: 0.7446
Precision: 0.15249472944483486
Recall: 0.7534722222222222
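One common way to balance the training set is random undersampling of the majority class. The sketch below is an assumption about how this could be done, not necessarily what the linked script does; `undersample_balanced` is a hypothetical helper. Only the training rows should be resampled; the evaluation set above (576 sinners in 10,000 rows, about 5.8%) looks like it kept roughly the natural class mix.

```python
import random

def undersample_balanced(rows, labels, seed=0):
    """Randomly drop rows of the larger class until both classes have the same count.

    rows: list of feature rows; labels: list of 0/1 (1 = sinner).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    # Sample n indices from each class, then shuffle so classes are interleaved
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Toy example: 3 sinners and 17 non-sinners
rows = [[float(i)] for i in range(20)]
labels = [1 if i < 3 else 0 for i in range(20)]
train_X, train_y = undersample_balanced(rows, labels)
print(len(train_y), sum(train_y))  # 6 3
```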

Remark: when the test set also contains equal numbers of sinners and non-sinners, precision jumps to over 70%. This is probably a sign of overfitting.
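Precision depends on the share of sinners in the test set, not only on the model. A worked check, assuming the classifier's recall and false-positive rate stay fixed when only the class mix of the test set changes: with the rates from the balanced-training confusion matrix, moving the test set from its natural mix to 50/50 lifts precision from about 0.152 to about 0.746. A jump past 70% is therefore expected from the class mix alone, which is worth ruling out before attributing it to overfitting. `precision_at_prevalence` is a hypothetical helper name.

```python
# Confusion matrix from the balanced-training run: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = 7012, 2412, 142, 434

recall = tp / (tp + fn)  # true-positive rate, 434 / 576
fpr = fp / (fp + tn)     # false-positive rate, 2412 / 9424

def precision_at_prevalence(recall, fpr, prevalence):
    """Expected precision when a fraction `prevalence` of the test rows are sinners."""
    true_pos = recall * prevalence
    false_pos = fpr * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

natural = (tp + fn) / (tp + fn + fp + tn)  # 576 / 10000
p_natural = precision_at_prevalence(recall, fpr, natural)
p_balanced = precision_at_prevalence(recall, fpr, 0.5)
print(f"precision at natural prevalence ({natural:.4f}): {p_natural:.4f}")
print(f"precision at 50/50 prevalence: {p_balanced:.4f}")
```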

sama25100 commented 2 years ago

Used PCA to visualize the data in its current form (one row contains one performance):
[Image: pca_bad]
Explained variance: PC1: 0.29440868, PC2: 0.13864443
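A minimal sketch of a two-component PCA projection with NumPy, assuming one performance per row and numeric feature columns. It centres the columns but does not scale them, which matches scikit-learn's `PCA` defaults; whether the original script standardised the features first is not stated in the thread. `pca_2d` is a hypothetical helper, and the random data stands in for the real features.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto the first two principal components.

    Returns (projection, explained_variance_ratio) for PC1 and PC2.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature column
    # SVD of the centred data: rows of Vt are the principal axes,
    # singular values come back sorted in descending order
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = S**2 / np.sum(S**2)  # share of total variance per component
    projection = Xc @ Vt[:2].T
    return projection, ratio[:2]

# Placeholder data: 200 "performances" with 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
proj, ratio = pca_2d(X)
print(proj.shape, ratio)
```

The two projected columns can then be drawn as a scatter plot coloured by the sinner label to produce a figure like the one above.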