Debug the bot detection pipeline

Since this issue was raised, the bot detection pipeline has been debugged and improved with new features and a more powerful model with tuned parameters, among other changes. We now reach acceptable performance, although there is still room for improvement: recall on the True class is 0.73 on both the train and validation sets. The latest notebook is uploaded to Neptune and lives in the cluster.

XGBoost: train classification report
              precision    recall  f1-score   support

       False       0.89      1.00      0.94    124117
        True       0.99      0.73      0.84     57258

    accuracy                           0.91    181375
   macro avg       0.94      0.86      0.89    181375
weighted avg       0.92      0.91      0.91    181375

XGBoost: validation classification report
              precision    recall  f1-score   support

       False       0.89      1.00      0.94     41373
        True       0.99      0.73      0.84     19086

    accuracy                           0.91     60459
   macro avg       0.94      0.86      0.89     60459
weighted avg       0.92      0.91      0.91     60459
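
For anyone reading the averages in the reports, here is a minimal sketch in plain Python that recomputes the F1 scores and the macro and weighted recall from the per-class rows of the validation report. It uses only the rounded numbers printed above, so it can match the report only to about two decimals. The names (`validation`, `f1`, `macro_avg`, `weighted_avg`) are made up for illustration and are not taken from the notebook.

```python
# Per-class rows copied from the validation report above:
# label -> (precision, recall, support). Values are already rounded to 2 decimals.
validation = {
    "False": (0.89, 1.00, 41373),
    "True": (0.99, 0.73, 19086),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(values):
    """Unweighted mean over classes: every class counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, supports):
    """Mean over classes weighted by the number of samples in each class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


recalls = [recall for _, recall, _ in validation.values()]
supports = [support for _, _, support in validation.values()]

f1_false = f1(*validation["False"][:2])
f1_true = f1(*validation["True"][:2])
macro_recall = macro_avg(recalls)
weighted_recall = weighted_avg(recalls, supports)

print(f"f1 False:        {f1_false:.3f}")
print(f"f1 True:         {f1_true:.3f}")
print(f"macro recall:    {macro_recall:.3f}")
print(f"weighted recall: {weighted_recall:.3f}")
```

The support-weighted recall is the same quantity as accuracy, which is why both show up as 0.91 in the report. The macro average is lower because it gives the smaller True class, where recall is weakest, the same weight as the False class.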

src-d / identity-matching

Debug the bot detection pipeline #65