jfilter / split-folders

🗂 Split folders with files (i.e. images) into training, validation and test (dataset) folders

MIT License

414 stars, 72 forks

split_folders.ratio() to have "oversample" parameter. #12

Closed gohjunyi closed 2 years ago

gohjunyi commented 5 years ago

I wanted to split the images into train, val and test:

`split_folders.ratio('data/images', output="data/images_new", seed=1337, ratio=(.8, .1, .1))`

~~1) I noticed that there are many copies for the classes with few samples. Does the ratio method automatically balance the data set?~~

2) If yes, would it also be possible to add an "oversample" parameter to the ratio method?

gohjunyi commented 5 years ago

Part 1 of the issue is invalid: I ran the code again and it generated the output without performing any oversampling.

jfilter commented 4 years ago

But how would you choose the number of items for train / val / test? I guess it would make sense to go over all classes and choose the ratio from the class with the most items. But this would get tricky / complicated. I am not sure if this is really helpful.
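One possible reading of this idea, as a minimal sketch rather than anything split-folders ships: split every class by the ratio first, then duplicate random training files in the smaller classes until each matches the training count of the largest class. The function name `split_with_oversample`, the choice to oversample only the training split, and the file names are all assumptions made for illustration.

```python
import random


def split_with_oversample(files_by_class, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Split each class by `ratio`, then oversample train up to the largest class.

    Hypothetical helper, not part of split-folders. Works on file names only.
    """
    rng = random.Random(seed)
    splits = {"train": {}, "val": {}, "test": {}}

    for label, files in files_by_class.items():
        items = sorted(files)  # sort first so the shuffle is reproducible
        rng.shuffle(items)
        n_train = int(len(items) * ratio[0])
        n_val = int(len(items) * ratio[1])
        splits["train"][label] = items[:n_train]
        splits["val"][label] = items[n_train:n_train + n_val]
        splits["test"][label] = items[n_train + n_val:]

    # Target size: the training count of the class with the most items.
    target = max(len(train) for train in splits["train"].values())
    for label, train in splits["train"].items():
        if train:
            # Sample with replacement from this class's own training files,
            # so val and test files never leak into train.
            extra = rng.choices(train, k=target - len(train))
            splits["train"][label] = train + extra

    return splits


# Hypothetical, imbalanced input: 100 "cat" files and 20 "dog" files.
files_by_class = {
    "cat": [f"cat_{i}.jpg" for i in range(100)],
    "dog": [f"dog_{i}.jpg" for i in range(20)],
}
splits = split_with_oversample(files_by_class)
```

Validation and test stay at their natural sizes here, which avoids copies of one file landing in more than one split. Whether that is the right trade-off is exactly the open question raised above.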

jfilter commented 2 years ago

If anybody wants to work on this, please open a new issue.