This PR adds experiments for the following strategies.
1.1 Fixed train-test split
hyperparameters take their default values
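Strategy 1.1 can be sketched as follows. This is a minimal sketch assuming scikit-learn, with the Iris dataset and a decision tree as stand-ins; the PR does not name the actual dataset or estimator.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fixed split: random_state pins the partition so every run sees the same data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier()  # hyperparameters at their default values
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```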
1.2 Random train-test splits
70:30 ratio; average accuracy is slightly better than 1.1
hyperparameters take their default values
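A sketch of strategy 1.2, again assuming scikit-learn and Iris as stand-ins; the number of random splits is not stated in the PR, so 10 is used here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

accs = []
for seed in range(10):  # 10 random 70:30 splits (count is an assumption)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    clf = DecisionTreeClassifier().fit(X_tr, y_tr)  # default hyperparameters
    accs.append(clf.score(X_te, y_te))

mean_acc = np.mean(accs)  # accuracy averaged over the random splits
```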
1.3 K Fold Cross-Validation
k=5; the data is partitioned into k folds, giving k different Train-Test splits that together guarantee each point is tested exactly once
hyperparameters take their default values
micro- and macro-averaged accuracy values are slightly better than 1.1 and slightly lower than 1.2
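The k-fold procedure and the two averaging modes can be sketched as below (scikit-learn and Iris assumed as stand-ins): macro-averaged accuracy is the mean of the per-fold accuracies, while micro-averaged accuracy pools every out-of-fold prediction and scores once.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
clf = DecisionTreeClassifier()  # default hyperparameters

# Macro average: mean of the 5 per-fold accuracies.
fold_accs = cross_val_score(clf, X, y, cv=cv)
macro_acc = fold_accs.mean()

# Micro average: each point is predicted exactly once (by the model that did
# not train on it); pool all predictions and score in one pass.
y_pred = cross_val_predict(clf, X, y, cv=cv)
micro_acc = accuracy_score(y, y_pred)
```

With equal-sized folds the two averages are close; they diverge when fold sizes differ.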
2.1 Fixed Train-Test Split (hyperparameters tuned on Validation set)
2.1.1 Validation Set as fixed Subset of Training Set
Train-Validation-Test split in a 50:20:30 ratio
hyperparameters are tuned on one fixed validation set.
after tuning, the model is retrained with the optimal hyperparameters on Train+Val and tested on Test.
accuracy is better than 1.1, 1.2, 1.3.
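The 2.1.1 procedure can be sketched as follows, again assuming scikit-learn and Iris as stand-ins; `max_depth` is a hypothetical hyperparameter used for illustration, since the PR does not list the tuned hyperparameters.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 50:20:30 via two successive splits: carve off the 30% Test set first,
# then take 2/7 of the remaining 70% as Validation (0.70 * 2/7 = 0.20).
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=2 / 7, random_state=0)

# Tune a hypothetical hyperparameter on the one fixed validation set.
best_depth, best_val_acc = None, -1.0
for depth in [1, 2, 3, 5, None]:
    acc = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(
        X_train, y_train).score(X_val, y_val)
    if acc > best_val_acc:
        best_depth, best_val_acc = depth, acc

# Retrain on Train+Val with the chosen value, then evaluate once on Test.
final = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
final.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
test_acc = final.score(X_test, y_test)
```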
2.1.2 Multiple random subsets of Training Set used as Validation Set
num_subsets = 5
same process as 2.1.1 is repeated for each subset.
{Hyperparams' values, Subset Number, Validation Accuracy} values are stored in a dataframe.
Rows are grouped by the hyperparameter values. The set of hyperparameter values with the highest average validation accuracy is chosen as the optimal hyperparameters for this Train-Validation-Test split.
accuracy is better than 1.1, 1.2, 1.3, 2.1.1.
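The record-and-group step of 2.1.2 can be sketched with pandas as below (scikit-learn, Iris, and a `max_depth` grid assumed as stand-ins; the PR does not specify the hyperparameter grid).

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

records = []
for subset in range(5):  # num_subsets = 5
    # Each iteration carves a different random validation subset out of Train.
    X_t, X_v, y_t, y_v = train_test_split(
        X_tr, y_tr, test_size=0.2, random_state=subset)
    for depth in [1, 2, 3, 5]:  # hypothetical hyperparameter grid
        acc = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(
            X_t, y_t).score(X_v, y_v)
        records.append({"max_depth": depth, "subset": subset, "val_acc": acc})

# {Hyperparam value, Subset number, Validation accuracy} rows in a dataframe.
df = pd.DataFrame(records)

# Group by hyperparameter value; pick the one with the best mean val accuracy.
best_depth = df.groupby("max_depth")["val_acc"].mean().idxmax()
```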
2.2 Nested Cross-Validation
5 outer folds, 5 inner folds.
For each outer fold, we iterate through the inner folds (Train-Validation splits) and, on each one, sweep a grid of hyperparameter values, recording the validation accuracies. We average the validation accuracies of each hyperparameter-value set across the inner folds and pick the set with the highest average. We then train a model with these hyperparameter values on the outer fold's Train and report accuracy on the outer fold's Test.
Repeat this for each outer fold.
We get similar test accuracies for each outer fold: better than 1.1, 1.2, 1.3; comparable to 2.1.1 and 2.1.2
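The nested procedure above maps directly onto scikit-learn's `GridSearchCV` inside `cross_val_score`; this sketch again assumes Iris, a decision tree, and a hypothetical `max_depth` grid as stand-ins.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
inner = KFold(n_splits=5, shuffle=True, random_state=1)
outer = KFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: GridSearchCV averages validation accuracy over the 5 inner
# folds, picks the best setting, and refits on the outer fold's full Train.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 5]},  # hypothetical grid
    cv=inner)

# Outer loop: one test accuracy per outer fold.
outer_accs = cross_val_score(search, X, y, cv=outer)
```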