Set up a system/program to partition the training data (see #1 and #10) into k partitions so we can perform k-fold validation runs.
Inputs
k - the number of partitions to generate
n - the total size of the generated evaluation set
a directory of files containing positive examples
a directory of files containing negative examples
the output directory to write results.
Outputs
k files in the output directory, each with roughly n/k file IDs (exactly n/k when k divides n evenly)
Notes
If n > |training set|, print a warning and generate the k partitions using the entire training set.
Otherwise, select n documents at random from the entire training set.
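A minimal sketch of the partitioning step, for discussion only. It assumes the warning case is the one where n exceeds the size of the training set, that a file's name (prefixed with a hypothetical "pos/" or "neg/" label) serves as its ID, and that output files are named fold_0.txt, fold_1.txt, and so on. The names collect_ids, make_partitions, and write_partitions are illustrative, not part of any existing code.

```python
import random
import sys
from pathlib import Path


def collect_ids(pos_dir, neg_dir):
    """List file IDs from both directories.

    Assumption: the ID is the file name, prefixed with "pos/" or "neg/"
    so that identical names in the two directories do not collide.
    """
    ids = []
    for label, directory in (("pos", pos_dir), ("neg", neg_dir)):
        for path in sorted(Path(directory).iterdir()):
            if path.is_file():
                ids.append(f"{label}/{path.name}")
    return ids


def make_partitions(ids, k, n, seed=None):
    """Split up to n randomly chosen IDs into k partitions."""
    rng = random.Random(seed)
    if n > len(ids):
        # Assumed reading of the note: warn when n cannot be satisfied
        # and fall back to the entire training set.
        print(
            f"warning: n={n} exceeds training set size {len(ids)}; "
            "using the entire training set",
            file=sys.stderr,
        )
        chosen = list(ids)
    else:
        chosen = rng.sample(ids, n)
    rng.shuffle(chosen)
    # Deal IDs round-robin so partition sizes differ by at most one.
    return [chosen[i::k] for i in range(k)]


def write_partitions(partitions, output_dir):
    """Write one file per partition, one ID per line (assumed format)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, part in enumerate(partitions):
        (out / f"fold_{i}.txt").write_text("".join(f"{x}\n" for x in part))
```

Dealing the shuffled IDs round-robin keeps the k files balanced even when k does not divide n. This sketch does not stratify by class, so the ratio of positive to negative examples may vary from one partition to the next.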