mdbloice / Augmentor

Image augmentation library in Python for machine learning.
https://augmentor.readthedocs.io/en/stable
MIT License

Maybe mistake in documentation and little problem in DataPipeline #147

Open masyagin1998 opened 5 years ago

masyagin1998 commented 5 years ago

While using your library (it is a great library!), I found two strange things. Sorry for my English.

1) In your documentation you write:

from Augmentor.Operations import Operation

# Create your new operation by inheriting from the Operation superclass:
class FoldImage(Operation):
    # Here you can accept as many custom parameters as required:
    def __init__(self, probability, num_of_folds):
        # Call the superclass's constructor (meaning you must
        # supply a probability value):
        Operation.__init__(self, probability)
        # Set your custom operation's member variables here as required:
        self.num_of_folds = num_of_folds

    # Your class must implement the perform_operation method:
    def perform_operation(self, image):
        # Start of code to perform custom image operation.
        for fold in range(self.num_of_folds):
            pass
        # End of code to perform custom image operation.

        # Return the image so that it can be further processed in the pipeline:
        return image

Perhaps the parameter of perform_operation should be images, not image, because the method receives a list of PIL images. I only realized this after my script failed.
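A minimal sketch of how the example could look once perform_operation takes and returns a list, as the maintainer confirms below. The per-image helper _fold_one is hypothetical, and the fallback Operation class is only a stand-in so the sketch runs without Augmentor installed; the folding itself stays a placeholder, as in the documentation.

```python
try:
    from Augmentor.Operations import Operation
except ImportError:
    # Hypothetical stand-in used only when Augmentor is not installed;
    # it mimics the single-argument constructor used below.
    class Operation:
        def __init__(self, probability):
            self.probability = probability


class FoldImage(Operation):
    def __init__(self, probability, num_of_folds):
        # The superclass stores the probability of applying this operation.
        Operation.__init__(self, probability)
        self.num_of_folds = num_of_folds

    def _fold_one(self, image):
        # Hypothetical per-image helper; the fold is still a placeholder,
        # exactly as in the documentation example.
        for _ in range(self.num_of_folds):
            pass
        return image

    def perform_operation(self, images):
        # The pipeline passes a list of images, so transform each one
        # and return a list of the same length.
        return [self._fold_one(image) for image in images]


fold = FoldImage(probability=0.5, num_of_folds=2)
print(fold.perform_operation(["img_a", "img_b"]))
```

With Augmentor installed, such an operation would be attached to a pipeline with p.add_operation(fold). The key point is only the signature: a list in, a list of the same length out.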

2) Your DataPipeline.sample method picks source images at random. So if I pass in, for example, 5 images and ask for 5, I may get 5 augmented versions of a single image rather than one augmented version of each of the 5 originals. Maybe it would be better to use non-random indexes?
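To illustrate point 2, here is a small sketch that assumes sample draws each source index independently and uniformly (with replacement), as described above; the function names are hypothetical and augmentation is omitted. Under that assumption, 5 draws from 5 images cover every image only with probability 5!/5^5 = 120/3125, about 3.8%.

```python
import random
from math import factorial


def sample_random(images, n, rng):
    # Assumed model of DataPipeline.sample: each output picks a source
    # index uniformly at random, independently, so repeats are possible.
    return [images[rng.randrange(len(images))] for _ in range(n)]


def sample_each_once(images):
    # Non-random alternative: visit every source index exactly once, in order.
    return [images[i] for i in range(len(images))]


def prob_all_distinct(k):
    # Chance that k uniform draws (with replacement) from k items hit every
    # item: k! favourable sequences out of k**k equally likely ones.
    return factorial(k) / k ** k


originals = ["img0", "img1", "img2", "img3", "img4"]
print(sample_random(originals, 5, random.Random(0)))  # may contain repeats
print(sample_each_once(originals))                    # each original once
print(prob_all_distinct(5))                           # 120 / 3125 = 0.0384
```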

mdbloice commented 5 years ago

Hi @masyagin1998, glad you like the library :-) Yes, that's right: it accepts a list of images. This is an artifact from a previous version of Augmentor in which all operations accepted only one image at a time, and it seems I didn't catch every place in the documentation when I made the change. Thanks for pointing it out, I'll fix it. As for point two: in the normal Pipeline class you can call process() (instead of sample()), which does exactly what you ask, but this method is not yet implemented for DataPipeline. I will add it in an upcoming version.
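Until DataPipeline gets a process() method, a process-style pass can be sketched in plain Python. This is not Augmentor code: process_once, the toy Tag operation and the probability gating are all hypothetical. It assumes each entry is a list of images transformed together with one label, and that operations follow the list-in/list-out perform_operation contract discussed above.

```python
import random


def process_once(image_groups, labels, operations, rng):
    # Hypothetical process()-style pass: every group is augmented exactly
    # once, so the output has one entry per original, labels kept aligned.
    if len(image_groups) != len(labels):
        raise ValueError("image_groups and labels must have the same length")
    outputs = []
    for group, label in zip(image_groups, labels):
        images = list(group)
        for op in operations:
            # Assumed gating: apply each operation with its own probability.
            if rng.random() < op.probability:
                images = op.perform_operation(images)
        outputs.append((images, label))
    return outputs


class Tag:
    # Toy operation following the list-in/list-out contract.
    def __init__(self, probability, suffix):
        self.probability = probability
        self.suffix = suffix

    def perform_operation(self, images):
        return [image + self.suffix for image in images]


groups = [["a"], ["b"], ["c"]]
result = process_once(groups, [0, 1, 2], [Tag(1.0, "!")], random.Random(0))
print(result)  # [(['a!'], 0), (['b!'], 1), (['c!'], 2)]
```

Unlike random sampling, the number of outputs always equals the number of inputs, and each original appears exactly once.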