ViktorKaz closed this issue 6 years ago
A splitter/subsetter might even be nicer than an iterator, since many algorithms make use of array/matrix operations. Have a look at how this is currently done in the model_selection module of sklearn, for example with KFold. There's really no reason to re-invent the wheel, unless of course the wheel is not circular but polygonal, say.
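For reference, a minimal sketch of the index-based approach with sklearn's KFold. The toy arrays `X` and `y` are made up for illustration and are not from this project. `KFold.split` yields integer index arrays only, so the dataset itself is never duplicated up front.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data, purely illustrative: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# KFold.split yields (train_idx, test_idx) pairs of integer index arrays.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
splits = list(kf.split(X))

# Materialise only the fold that is actually needed, using array indexing.
train_idx, test_idx = splits[0]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```

Only the index arrays need to be kept around; the subsets can be rebuilt from `X` and `y` whenever a fold is used.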
This is now implemented. The train/test indices are stored in the output hdf5 in the default group '/split_dts_idx'.
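A rough sketch of what storing and reading back such indices could look like with h5py. Only the group name `/split_dts_idx` comes from the comment above. The file name `output.h5`, the dataset names `train` and `test`, and the toy split are assumptions for illustration and may not match the actual layout.

```python
import os
import tempfile

import h5py
import numpy as np

# Toy split, purely illustrative: 10 samples, 3 of them held out for testing.
rng = np.random.default_rng(0)
perm = rng.permutation(10)
test_idx = np.sort(perm[:3])
train_idx = np.sort(perm[3:])

# Hypothetical output file; the real path is whatever the project uses.
path = os.path.join(tempfile.mkdtemp(), "output.h5")

# Store only the indices, not copies of the data.
with h5py.File(path, "w") as f:
    grp = f.create_group("split_dts_idx")
    grp.create_dataset("train", data=train_idx)  # dataset name is an assumption
    grp.create_dataset("test", data=test_idx)    # dataset name is an assumption

# Read the indices back; they can then be used to subset X and y.
with h5py.File(path, "r") as f:
    loaded_train = f["/split_dts_idx/train"][:]
    loaded_test = f["/split_dts_idx/test"][:]
```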
Split data: this should not duplicate the entire dataset. Create an index to the elements of X_train, y_train, X_test, y_test, or create an iterator object.