jfilter / split-folders

🗂 Split folders with files (i.e. images) into training, validation and test (dataset) folders
MIT License

split_folders.ratio() to have "oversample" parameter. #12

Closed gohjunyi closed 2 years ago

gohjunyi commented 5 years ago

I want to split my images into train, val and test sets:

split_folders.ratio('data/images', output="data/images_new", seed=1337, ratio=(.8, .1, .1))

1) I noticed that the output contains many copies of files for the classes with few samples. Does the ratio method automatically balance the data set?

2) If yes, would it also be possible to add an "oversample" parameter to the ratio method?

gohjunyi commented 5 years ago

Part 1 of the issue is invalid: I ran the code again and it generated the output without performing any oversampling.
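
For anyone who wants to check this on their own output, here is a minimal sketch that counts the files per split and class. It assumes the output folder is laid out as <output>/<split>/<class>/<file>, and count_split_files is a hypothetical helper written for this comment, not part of split_folders.

```python
from collections import Counter
from pathlib import Path


def count_split_files(output_dir):
    """Count files per (split, class) pair.

    Assumes the layout <output_dir>/<split>/<class>/<file>; this layout is
    an assumption of the sketch, so adjust the glob if yours differs.
    """
    counts = Counter()
    for path in Path(output_dir).glob("*/*/*"):
        if path.is_file():
            # The last three path components are: split, class, file name.
            split, class_name = path.parts[-3], path.parts[-2]
            counts[(split, class_name)] += 1
    return dict(counts)
```

If the counts for a small class roughly follow the requested ratio of that class's original size, no oversampling took place.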

jfilter commented 4 years ago

But how would you choose the number of items for train / val / test? I guess it would make sense to go over all classes and choose the ratio from the class with the most items. But this would get tricky / complicated. I am not sure if this is really helpful.
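
To make the question concrete, here is a rough sketch of one possible answer; it is not how split_folders is implemented. It splits each class by the ratio first, then oversamples only the training part of the smaller classes (sampling with replacement) until it matches the training size of the largest class. The function name split_with_oversample and the files_by_class input are hypothetical, and leaving val and test unbalanced is a design assumption of the sketch.

```python
import random


def split_with_oversample(files_by_class, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Split each class by ratio, then oversample the training sets.

    files_by_class maps a class name to a list of file names (hypothetical
    input format). Only "train" is oversampled; "val" and "test" keep their
    natural sizes so evaluation data contains no duplicates.
    """
    rng = random.Random(seed)
    splits = {}
    for class_name, files in files_by_class.items():
        items = sorted(files)
        rng.shuffle(items)
        n_train = int(len(items) * ratio[0])
        n_val = int(len(items) * ratio[1])
        splits[class_name] = {
            "train": items[:n_train],
            "val": items[n_train:n_train + n_val],
            "test": items[n_train + n_val:],
        }

    # The target is the training size of the class with the most items.
    target = max(len(s["train"]) for s in splits.values())
    for s in splits.values():
        train = s["train"]
        if train:
            # Draw duplicates from this class's own training files only,
            # so nothing leaks from val or test into train.
            extra = [rng.choice(train) for _ in range(target - len(train))]
            s["train"] = train + extra
    return splits
```

With 100 files in one class and 10 in another, both classes end up with 80 training entries, and the smaller class reuses its 8 original training files. Whether that much duplication is useful is exactly the open question.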

jfilter commented 2 years ago

If anybody wants to work on this, please open a new issue.