Closed karthikkunala closed 4 years ago
The split routine I used splits the data randomly. I haven't checked whether each split ends up with the same distribution of samples, and that is something we need to be careful about.
Is this not something that happens with cross-validation?
Yep. Right now we are just doing a simple hold out and that is what @karthikkunala was referring to.
We should probably do a stratified k-fold CV, we just haven't got there yet :)
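For reference, a minimal sketch of what stratified k-fold could look like with scikit-learn's `StratifiedKFold`. The data here is synthetic (a 90/10 class imbalance chosen only for illustration) and is not from our dataset; the point is just that each test fold keeps roughly the overall class ratio.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced labels (assumption: 90 negatives, 10 positives).
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# shuffle=True shuffles within each class before assigning folds.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_positive_rates = []
for train_idx, test_idx in skf.split(X, y):
    # Fraction of positives in this held-out fold.
    fold_positive_rates.append(float(y[test_idx].mean()))

print(fold_positive_rates)
```

If the per-fold rates stay close to the overall positive rate, the folds are preserving the class distribution, which a plain random hold-out does not guarantee.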
I came across sklearn.model_selection.StratifiedShuffleSplit. Do we need to shuffle the records before splitting into train, test, or validation sets? I have also seen some examples that compare the mean of the target (if it is continuous) across the splits to check how much they differ.
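A small sketch that may help answer the shuffling question, assuming a binary target and synthetic data (the 80/20 labels and random features below are made up for illustration). `StratifiedShuffleSplit` shuffles internally, so the records should not need to be shuffled beforehand, and comparing the target mean across the splits is a quick check on the result.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

rng = np.random.default_rng(0)

# Labels are deliberately left sorted to show that no pre-shuffling is needed.
y = np.array([0] * 80 + [1] * 20)
X = rng.normal(size=(len(y), 3))  # placeholder features

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(sss.split(X, y))

# For a 0/1 target, the mean is the positive-class rate.
train_rate = float(y[train_idx].mean())
test_rate = float(y[test_idx].mean())
print(f"positive rate: train={train_rate:.3f}, test={test_rate:.3f}")
```

Note that stratification works on class labels. For a continuous target, one common workaround (an assumption here, not something we have tried) is to bin the target into quantiles and stratify on the bin labels, then compare the target mean in each split.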