jfilter / split-folders

🗂 Split folders with files (i.e. images) into training, validation and test (dataset) folders
MIT License

Assertion Error #5

Closed v-prgmr closed 5 years ago

v-prgmr commented 5 years ago

I get this error when I use the fixed function. Error:

AssertionError Traceback (most recent call last)

in
----> 1 split_folders.fixed('C:/Users/TEXVNQA/Downloads/TL_DATA/', output="C:/Users/TEXVNQA/Downloads/GTS_Torch Format/", seed=1337, fixed=(135,135), oversample=False) # default values

c:\torch\lib\site-packages\split_folders\split.py in fixed(input, output, seed, fixed, oversample)
     69     lens = []
     70     for class_dir in dirs:
---> 71         lens.append(split_class_dir_fixed(class_dir, output, fixed, seed))
     72
     73     if not oversample:

c:\torch\lib\site-packages\split_folders\split.py in split_class_dir_fixed(class_dir, output, fixed, seed)
    105     files = setup_files(class_dir, seed)
    106
--> 107     assert len(files) > sum(fixed)
    108
    109     split_train = len(files) - sum(fixed)

AssertionError:

jfilter commented 5 years ago

How many items do you have as input?

v-prgmr commented 5 years ago

I have 43 classes, and the classes are unbalanced. The total number of images would be 1200.

jfilter commented 5 years ago

So I assume for some classes, there are fewer than 135 samples, right? Can you try to turn oversampling on?
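
The assertion quoted in the traceback is `len(files) > sum(fixed)`, so with `fixed=(135,135)` a class would need more than 270 files to pass. To see which classes fall short, something like the following could help. It is a minimal sketch and not part of split_folders: the helper name `classes_too_small` is hypothetical, and it assumes every regular file directly inside a class folder counts as a sample.

```python
from pathlib import Path


def classes_too_small(input_dir, fixed):
    """Return {class_name: file_count} for class folders that would fail
    the check `len(files) > sum(fixed)` shown in the traceback.

    Hypothetical helper for diagnosis only; not part of split_folders.
    """
    needed = sum(fixed)
    too_small = {}
    for class_dir in sorted(Path(input_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        # Assumption: every regular file directly in the class folder is a sample.
        count = sum(1 for f in class_dir.iterdir() if f.is_file())
        if count <= needed:
            too_small[class_dir.name] = count
    return too_small
```

Calling `classes_too_small("path/to/data", (135, 135))` returns the classes that hold 270 files or fewer, together with their file counts.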

jfilter commented 5 years ago

Just let me know if this was the cause for your problem. I agree that the error message was not useful and I could improve it.

doursand commented 5 years ago

Hello, first of all thanks for the great job, the tool is awesome :-) I am experiencing the same issue with my unbalanced dataset. If I set the fixed param to a value above the maximum number of files for one of the classes, I get the assertion error, even if I enable the oversample parameter.

jmtzt commented 5 years ago

I am having the same issue when I try to use:

import split_folders

split_folders.fixed('/content/Data', output = "output", seed = 1337, fixed = (275, 275), oversample = True)

AssertionError Traceback (most recent call last)

in ()
      1 import split_folders
      2
----> 3 split_folders.fixed('/content/Data', output = "output", seed = 1337, fixed = (275, 275), oversample = True)

1 frames
/usr/local/lib/python3.6/dist-packages/split_folders/split.py in split_class_dir_fixed(class_dir, output, fixed, seed)
    105     files = setup_files(class_dir, seed)
    106
--> 107     assert len(files) > sum(fixed)
    108
    109     split_train = len(files) - sum(fixed)

AssertionError:

I have 28 folders which are my classes and the max amount of images in one class is 275, that's why the fixed of (275, 275). What should I do?

jfilter commented 5 years ago

Okay people, I improved the error message. Use a ratio instead of fixed if the number of samples is too small.

The problem is that I don't want to encourage over-sampling for the validation and test sets. So if only 10 files exist and you want to use fixed with, e.g., (8, 8), it's impossible, so I raise an error. The error message was not very helpful, though.
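
To make the arithmetic concrete, here is a rough sketch of why (8, 8) cannot work with 10 files while a ratio can. `fixed_split_sizes` mirrors only the two lines quoted in the traceback (the assert and `split_train = len(files) - sum(fixed)`). `ratio_split_sizes` is purely illustrative: both function names and the rounding rule are assumptions and may differ from what split_folders actually does.

```python
def fixed_split_sizes(n_files, fixed):
    """Sizes (train, val, test) for a fixed split of one class.

    Mirrors the check from the traceback: the class must hold more files
    than the validation and test sets together, otherwise nothing is left
    for training.
    """
    if not n_files > sum(fixed):
        raise ValueError(
            f"need more than {sum(fixed)} files for fixed={fixed}, got {n_files}"
        )
    return (n_files - sum(fixed), *fixed)


def ratio_split_sizes(n_files, ratio=(0.8, 0.1, 0.1)):
    """Sizes (train, val, test) for a ratio split of one class.

    Assumption: train and val sizes are truncated to integers and the
    remainder goes to test. This is not taken from the library's code.
    """
    n_train = int(n_files * ratio[0])
    n_val = int(n_files * ratio[1])
    return (n_train, n_val, n_files - n_train - n_val)
```

With 10 files, `fixed_split_sizes(10, (8, 8))` raises because 16 files would be needed for validation and test alone, whereas `ratio_split_sizes(10)` still yields a usable split. Because a ratio scales with each class, it never asks a small class for more files than it has.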