Closed kevinemoore closed 4 years ago
Great! It could also be useful to specify k-fold cross-validation splits.
This diagram shows a fairly standard training-validation-test split:
and this diagram shows k-fold cross-validation splits:
If k-fold cross-validation is used, typically the "training" and "validation" splits from the first diagram are combined into just the "training" split (leaving only the "training" and "test" splits), and the "training" split is then divided into the k-fold cross-validation splits.
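For illustration, here is a minimal sketch of that arrangement in plain Python. It is not Quilt functionality; the function name `train_test_kfold` and its parameters are hypothetical. It holds out a test split first, then divides the remaining "training" split into k folds, each of which serves once as the validation set.

```python
import random


def train_test_kfold(items, test_fraction=0.2, k=5, seed=0):
    """Hold out a test split, then divide the remainder into k folds.

    Hypothetical helper for illustration only; not part of Quilt.
    Returns (test, splits), where splits is a list of k
    (train, validation) pairs drawn from the non-test items.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    # Shuffle a copy with a fixed seed so the splits are reproducible.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    # The test split is set aside once and never used in any fold.
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    training = shuffled[n_test:]
    if len(training) < k:
        raise ValueError("not enough training items for k folds")

    # Fold i takes every k-th training item starting at offset i,
    # so fold sizes differ by at most one.
    folds = [training[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, validation))
    return test, splits


test, splits = train_test_kfold(range(100), test_fraction=0.2, k=5)
```

With 100 items, a test fraction of 0.2, and k = 5, this leaves 20 items in the test split and gives five (train, validation) pairs of 64 and 16 items each.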
Quilt 3, combined with the PyTorch or TF APIs, now allows direct interaction with the file system and provides primitives for arbitrary file organization.
For data packages used in machine learning, it would be useful for Quilt build to support splitting inputs into fixed sets for model training, testing, and validation. For structured data, the various sets (training, test, and validation) could be children of a common parent so that the entire dataset remains available (by calling _data on the parent). Thanks to @rhiever for the suggestion.
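One way to picture "fixed sets as children of a common parent" is the sketch below, written in plain Python rather than against any Quilt API. It assumes each record has a stable string key and assigns the record to a set by hashing that key, so the assignment stays the same across rebuilds. The names `assign_split`, `build_splits`, and `all_data` are hypothetical, and the 80/10/10 fractions are only an example.

```python
import hashlib

# Example fractions only; any values that sum to 1 would work.
DEFAULT_FRACTIONS = (("training", 0.8), ("validation", 0.1), ("test", 0.1))


def assign_split(key, fractions=DEFAULT_FRACTIONS):
    """Map a record key to a split name, deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a position in [0, 1].
    position = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, fraction in fractions:
        cumulative += fraction
        if position < cumulative:
            return name
    # Fallback for floating-point rounding at the upper boundary.
    return fractions[-1][0]


def build_splits(keys, fractions=DEFAULT_FRACTIONS):
    """Return a parent dict whose children are the named splits."""
    parent = {name: [] for name, _ in fractions}
    for key in keys:
        parent[assign_split(key, fractions)].append(key)
    return parent


def all_data(parent):
    """Combine every child split, mimicking access to the whole dataset."""
    return [key for child in parent.values() for key in child]


keys = [f"row-{i}" for i in range(1000)]
parent = build_splits(keys)
```

Because the assignment depends only on the key, adding new records later does not move existing records between sets. The actual set sizes only approximate the requested fractions.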