jmaggio14 / imagepypelines

data pipeline and convenience library targeted at accelerating the development of imaging projects and research
https://www.imagepypelines.org/
MIT License

let pipelines run other pipelines as blocks #49

Closed jmaggio14 closed 4 years ago

jmaggio14 commented 5 years ago

as per @natedileas' idea

jmaggio14 commented 5 years ago

special block that runs pipelines

(edit with details)

natedileas commented 5 years ago

Proposal

I propose a "node"-like syntax that combines the best features of the two classes. In this model, a Pipeline is a special subclass of Block whose process_strategy / train_strategy simply validates and calls a sequence of blocks.
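To make the idea concrete, here is a minimal, hypothetical sketch of the proposal. The class and method names (`Block.process()`, `Pipeline`, the `Add` example block) mirror the terminology in this thread but are not the actual imagepypelines API:

```python
class Block:
    """Smallest unit of work: transforms a batch of data."""
    def __init__(self, name):
        self.name = name

    def process(self, data):
        raise NotImplementedError


class Pipeline(Block):
    """A Block whose process() just runs its sub-blocks in order.
    Because Pipeline *is* a Block, pipelines nest for free."""
    def __init__(self, blocks, name="pipeline"):
        super().__init__(name)
        self.blocks = list(blocks)

    def process(self, data):
        for block in self.blocks:
            data = block.process(data)
        return data


class Add(Block):
    """Toy example block: adds a constant to every datum."""
    def __init__(self, n):
        super().__init__(f"add{n}")
        self.n = n

    def process(self, data):
        return [d + self.n for d in data]


inner = Pipeline([Add(1), Add(2)])
outer = Pipeline([inner, Add(10)])   # a pipeline used as a block
print(outer.process([0, 5]))         # -> [13, 18]
```

The key move is that `Pipeline` overrides `process()` rather than exposing a separate `run()` interface, so any code that accepts a Block also accepts a Pipeline.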

Current Definition

These definitions show the primary methods/attributes that a block and a pipeline currently have. There is quite a bit of overlap, with notable exceptions in bold.

Block:
    name

    *IO Mapping* - shape, type

    train() - if implemented    

    processing methods / hooks
        before()
        process()
        label()
        after()

Pipeline:
    name

    *list methods - add, del, copy, etc.*

    *validate*

    *_step*

    training methods / hooks
        before()
        process()
        label()
        after()

    processing methods / hooks
        before()
        process()
        label()
        after()

User Syntax Example

""" Loads an image, resizes it to 512x512, and converts to grayscale. """
import imagepypelines as ip

test_image = ip.lenna()   # :(

resizer = ip.blocks.Resizer(512,512)
color2gray = ip.blocks.Color2Gray()

grayscale_resize_pipeline = ip.Pipeline(blocks=[resizer,color2gray])

output = grayscale_resize_pipeline.run([test_image])

Now, let's use this pipeline as the first "block" in another pipeline.

edgemap_by_freq = ip.blocks.Highpass(cutoff=100)
viewer = ip.blocks.BlockViewer()

nested_pipeline = ip.Pipeline(blocks=[grayscale_resize_pipeline, edgemap_by_freq, viewer])
nested_pipeline.run([test_image])

Discussion - Pros / Cons

Fundamentally, this is equivalent to constructing a linear pipeline with all five example blocks. However, it allows more flexible pipeline creation (with unlimited nesting!) while retaining the same power as before (different training methods, etc.). Additionally, it simplifies the syntax the user has to memorize, because a Pipeline will behave much like any Block.
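The equivalence claim above can be illustrated with a small, hypothetical helper that expands nested pipelines into a flat block list (here nested pipelines are modeled simply as nested Python lists of block names; this is not an imagepypelines function):

```python
def flatten(blocks):
    """Recursively expand nested pipelines (modeled as lists)
    into a single flat block list."""
    flat = []
    for b in blocks:
        if isinstance(b, list):        # treat a list as a nested pipeline
            flat.extend(flatten(b))    # recurse into it
        else:
            flat.append(b)
    return flat


# the nested pipeline from the syntax example above...
nested = [["resizer", "color2gray"], "highpass", "viewer"]

# ...flattens to the same linear pipeline of all the blocks
print(flatten(nested))   # -> ['resizer', 'color2gray', 'highpass', 'viewer']
```

Running the nested structure and the flattened structure block-by-block would produce identical results; nesting only changes how the pipeline is composed, not what it computes.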

The major downsides of this approach are debugging and validating the block IO types. I think both can realistically be mitigated if we assume blocks will always be used inside a pipeline. In other words, we assume a user will never want (or get) those features on an individual block, and implement them only at the pipeline level. That way nothing really changes for the user, but we get nesting for free. This does require that a pipeline container always exist for serialization, debugging, and validation.
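A rough sketch of what pipeline-level validation might look like. Everything here is hypothetical: blocks are modeled as plain dicts declaring an input and output type, and the pipeline (not any individual block) checks that adjacent blocks are compatible:

```python
def validate(blocks):
    """Check that each block's declared output type matches the
    next block's declared input type. Raises TypeError on mismatch."""
    for upstream, downstream in zip(blocks, blocks[1:]):
        if upstream["out"] != downstream["in"]:
            raise TypeError(
                f"{upstream['name']} outputs {upstream['out']}, "
                f"but {downstream['name']} expects {downstream['in']}")


blocks = [
    {"name": "resizer",    "in": "image", "out": "image"},
    {"name": "color2gray", "in": "image", "out": "gray"},
    {"name": "highpass",   "in": "gray",  "out": "gray"},
]
validate(blocks)   # passes: adjacent IO types line up
```

Because validation walks the flat sequence of blocks, a nested pipeline could delegate to its children's validation and the user-facing behavior stays the same.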

Implementation Side Changes