mdbloice / Augmentor

Image augmentation library in Python for machine learning.
https://augmentor.readthedocs.io/en/stable
MIT License
5.08k stars · 866 forks

Create n augmented images for each image #160

Open paulfauthmayer opened 5 years ago

paulfauthmayer commented 5 years ago

With a given dataset, I would like to produce n augmented images for each image in the specified folder and save them in the output folder.

If I do something like this:

import os

dataset_size = len(os.listdir('/path/to/dataset'))    # dataset_size == 100
n = 3
p.sample(n * dataset_size)

It picks 300 random images from the dataset and creates the augmented images. However, this results in a disproportionate dataset, with some images being processed more than n times and some not being processed at all. I would prefer to do any random picking later, during training, and not while generating/augmenting the dataset.

I guess I could also do this with the following code, but it's not the prettiest way to do it.

import itertools

n = 3
for _ in itertools.repeat(None, n):
    p.process()    # or p.sample(0) for that matter

Am I missing something, is there a nicer way to do this? If not, I think a way to tell the sample function to not pick an image at random would be appreciated.
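The difference between the two approaches can be sketched with plain Python, independent of Augmentor (the filenames and counts below are illustrative, not part of the library):

```python
import random
from collections import Counter

random.seed(0)
images = [f"img_{i:03d}.png" for i in range(100)]
n = 3

# sample(n * dataset_size): each of the 300 draws picks an image at
# random, so per-image counts are generally uneven.
sampled = Counter(random.choice(images) for _ in range(n * len(images)))

# Calling process() n times instead: every image is augmented exactly
# n times, giving a perfectly balanced output set.
balanced = Counter(img for _ in range(n) for img in images)

print("distinct images hit by random sampling:", len(sampled))
print("all per-image counts equal n:", all(c == n for c in balanced.values()))
```

The second Counter is the behaviour the workaround above achieves by repeating `p.process()`.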

Thanks!

mdbloice commented 5 years ago

Hi @paulfauthmayer, yeah, that's a nice idea; I hadn't thought of that. I will add it to the process() function in the next update. For now, though, I think what you're doing is as good a workaround as there is, even if it's not very pretty, as you say :-)

Zhang-O commented 5 years ago

@mdbloice, have you added this functionality to the process() function? I can't wait to try it.

Zhang-O commented 5 years ago

@paulfauthmayer, thanks for your code !
