gohjunyi closed this issue 2 years ago
Part 1 of the issue is invalid: I ran the code again and it generated the split without performing any oversampling.
But how would you choose the number of items for train / val / test? I guess it would make sense to go over all classes and choose the ratio from the class with the most items. But this would get tricky / complicated. I am not sure if this is really helpful.
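To make the question concrete, here is a minimal sketch of one possible approach. It is not how split_folders works internally, and the helper names `split_counts` and `oversample_train` are hypothetical. It assumes each class is split by the ratio first, and that only the training part is then padded with random repeats until every class matches the largest one.

```python
import random


def split_counts(n_items, ratio=(0.8, 0.1, 0.1)):
    """Return (train, val, test) counts for one class with n_items files."""
    n_train = int(n_items * ratio[0])
    n_val = int(n_items * ratio[1])
    n_test = n_items - n_train - n_val  # the remainder goes to test
    return n_train, n_val, n_test


def oversample_train(train_by_class, seed=1337):
    """Pad every class with random repeats up to the size of the largest class.

    train_by_class maps a class label to its list of training file names.
    This is an assumed strategy for illustration, not a library feature.
    """
    rng = random.Random(seed)
    target = max(len(files) for files in train_by_class.values())
    balanced = {}
    for label, files in train_by_class.items():
        if not files:
            # Nothing to repeat for an empty class.
            balanced[label] = []
            continue
        extra = [rng.choice(files) for _ in range(target - len(files))]
        balanced[label] = list(files) + extra
    return balanced
```

Oversampling only the training part keeps repeated files out of val and test, which is one reason the per-class counts have to be decided before any copying happens.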
If anybody wants to work on this, please open a new issue.
I wanted to split the images into train, val and test.
split_folders.ratio('data/images', output="data/images_new", seed=1337, ratio=(.8, .1, .1))
1) I noticed that the classes with few samples contain many copies. Does the ratio method automatically balance the data set?
2) If yes, would it be possible to add an "oversample" parameter to the ratio method?
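One way to check whether copies were actually created is to compare file counts per class before and after the split. The sketch below is only an illustration with hypothetical helper names. It assumes the input folder has one sub-folder per class and that the output folder contains train, val and test folders, each again with one sub-folder per class.

```python
from pathlib import Path


def count_files_per_class(root):
    """Map each class sub-folder of root to the number of files it holds."""
    return {
        d.name: sum(1 for f in d.iterdir() if f.is_file())
        for d in sorted(Path(root).iterdir())
        if d.is_dir()
    }


def compare_counts(input_dir, output_dir, splits=("train", "val", "test")):
    """Return {class: (files_in_input, files_in_output)} summed over all splits."""
    before = count_files_per_class(input_dir)
    after = {label: 0 for label in before}
    for split in splits:
        split_dir = Path(output_dir) / split
        if not split_dir.is_dir():
            continue  # this split was not generated
        for label, n in count_files_per_class(split_dir).items():
            after[label] = after.get(label, 0) + n
    return {label: (before.get(label, 0), after[label]) for label in after}
```

If the second number is larger than the first for a class, files of that class were duplicated somewhere in the output. For the call above, that would be `compare_counts('data/images', 'data/images_new')`.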