Balanced scoring and separate selection files

Changes the scoring function to scikit-learn's balanced accuracy
Separates the selection methods into different files (baseline_selection.py and cleanlabs_selection.py)
Adds a simple random selection baseline
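For reference, a random selection baseline can be sketched in a few lines. This is only an illustration, not the code in baseline_selection.py: the function name `select_random`, its parameters, and the sample IDs are all hypothetical.

```python
import random


def select_random(candidate_ids, budget, seed=0):
    """Return up to `budget` distinct IDs drawn uniformly at random.

    Hypothetical helper for illustration; the real selection code may
    use a different signature and data format.
    """
    rng = random.Random(seed)  # local RNG so the selection is reproducible
    candidate_ids = list(candidate_ids)
    budget = min(budget, len(candidate_ids))  # never ask for more than exists
    return rng.sample(candidate_ids, budget)  # sampling without replacement


# Made-up candidate pool, purely for demonstration.
candidates = [f"sample_{i}" for i in range(100)]
selected = select_random(candidates, budget=10, seed=42)
print(len(selected))
```

Fixing the seed keeps the baseline comparable across runs, which matters when its score is used as a reference point for the other selection methods.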

New Baseline Scores:

Random: 87.8%
Baseline (cross-fold): 89.3%
CleanLabs: 63%*

* This poor score is because I didn't update the CleanLabs method in this change. It still returns a very unbalanced training set, which leads the model to predict "unknown" most of the time.
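To illustrate why a model that mostly predicts one class is penalized under the new metric, here is a minimal sketch. Balanced accuracy is the mean of per-class recall, which is what scikit-learn's `balanced_accuracy_score` computes; the pure-Python version below is written out only to keep the example dependency-free. The labels and counts are made up and are not the dataset or results from this PR.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


def plain_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical evaluation set: 2 "yes", 2 "no", 6 "unknown".
y_true = ["yes"] * 2 + ["no"] * 2 + ["unknown"] * 6
# A degenerate model that always predicts "unknown".
y_pred = ["unknown"] * 10

print(plain_accuracy(y_true, y_pred))     # 0.6
print(balanced_accuracy(y_true, y_pred))  # one class out of three recalled, about 0.33
```

Plain accuracy rewards the degenerate model for the majority class, while balanced accuracy weights every class equally, so a selection method that produces an unbalanced training set shows up clearly in the score.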

harvard-edge / dataperf-speech-example #9