munhouiani / Deep-Packet

PyTorch implementation of Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning
MIT License

under sampling #2

Closed zambery closed 4 years ago

zambery commented 4 years ago

Hi, I noticed that in your code, under-sampling is performed after the train/test split, and only on the training set. This can make the test set larger than the training set. Is this a mistake, or is there a reason for doing it this way?

munhouiani commented 4 years ago

No, I did it on purpose. In machine learning, we assume the data collected so far were generated by an unknown underlying distribution. To assess the model's generalisation capability, one common way to split the train and test sets is to preserve the label distribution in both, i.e. stratified sampling.

Under-sampling is just a model-training trick, and should only be applied to the training set.

Therefore, yes, the training set can end up smaller than the test set after under-sampling.
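The procedure described above can be sketched as follows. This is a minimal stdlib-only illustration (the helper names `stratified_split` and `undersample` are hypothetical, not taken from this repo): split while preserving label proportions, then balance classes only within the training indices, leaving the test set's distribution untouched.

```python
import random
from collections import Counter

def stratified_split(y, test_frac=0.2, seed=0):
    """Split indices while preserving label proportions in both sets."""
    rng = random.Random(seed)
    by_label = {}
    for i, label in enumerate(y):
        by_label.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        cut = int(len(idxs) * test_frac)
        test_idx.extend(idxs[:cut])
        train_idx.extend(idxs[cut:])
    return train_idx, test_idx

def undersample(indices, y, seed=0):
    """Downsample every class to the size of the rarest class."""
    rng = random.Random(seed)
    by_label = {}
    for i in indices:
        by_label.setdefault(y[i], []).append(i)
    n_min = min(len(v) for v in by_label.values())
    kept = []
    for idxs in by_label.values():
        kept.extend(rng.sample(idxs, n_min))
    return kept

# Imbalanced toy labels: 90 samples of class "a", 10 of class "b".
y = ["a"] * 90 + ["b"] * 10

train_idx, test_idx = stratified_split(y, test_frac=0.2)
# Under-sampling touches ONLY the training indices; the test set
# keeps the original 9:1 label ratio.
train_idx = undersample(train_idx, y)

print(Counter(y[i] for i in train_idx))  # balanced: 8 "a", 8 "b"
print(Counter(y[i] for i in test_idx))   # original ratio: 18 "a", 2 "b"
```

With this toy data the balanced training set (16 samples) is indeed smaller than the test set (20 samples), which is exactly the situation the question asks about.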