mila-iqia / fuel

A data pipeline framework for machine learning
MIT License

Shell-style piping for transformers #358

Open dmitriy-serdyuk opened 8 years ago

dmitriy-serdyuk commented 8 years ago

I propose adding some syntactic sugar to make streams easier to manipulate. We could use an infix operator to combine transformers (like | or >). With |, it would look like Linux piping:

stream = Mapping(Flatten(DataStream.default_stream(dataset)), my_func)

becomes

stream = dataset | DataStream.default_stream() | Flatten() | Mapping(my_func)

Fewer parentheses, and the transformers are applied in the order they are written.

Implementation looks very straightforward.

What do you guys think?

nouiz commented 8 years ago

Does Python support that syntax? I have never heard of it.

(Sent by email in reply to dmitriy-serdyuk's post of Wed, Jul 27, 2016, 3:44 PM.)

dwf commented 8 years ago

I like it a lot. Current syntax gets quite hairy.

Is it going to be easy to do this without breaking literally everyone's code?

I imagine we could make the first argument optional via a decorator like @lazy and make self.stream or whatever a property that raises an informative error if no stream has been attached yet. Not sure how to get Mapping(func) to work without passing func as a kwarg, though.
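A minimal sketch of that idea, written without the decorator to keep it short. The classes below are simplified stand-ins, not Fuel's actual Transformer or Mapping, and the attribute name data_stream and the use of __ror__ to attach the upstream stream are assumptions made for illustration. It also shows the kwarg problem: once the stream argument is optional, the function has to be passed by keyword.

```python
class Transformer:
    """Hypothetical stand-in; not Fuel's real Transformer class."""

    def __init__(self, data_stream=None):
        self._data_stream = data_stream

    @property
    def data_stream(self):
        # Fail loudly if the transformer is used before a stream is attached.
        if self._data_stream is None:
            raise ValueError(
                f"{type(self).__name__} has no input stream yet; pass one "
                "to the constructor or attach one with `|`")
        return self._data_stream

    def __ror__(self, upstream):
        # Called for `upstream | self` when `upstream` does not handle `|`.
        self._data_stream = upstream
        return self


class Mapping(Transformer):
    """Hypothetical stand-in that applies a function to every item."""

    def __init__(self, data_stream=None, mapping=None):
        super().__init__(data_stream)
        self.mapping = mapping

    def __iter__(self):
        return (self.mapping(item) for item in self.data_stream)


# The existing positional style keeps working...
legacy = Mapping([1, 2, 3], lambda x: x + 1)
# ...while the piped style needs the function as a keyword argument.
stream = [1, 2, 3] | Mapping(mapping=lambda x: x + 1)
```

Here both list(legacy) and list(stream) yield the same items, and reading data_stream on an unattached Mapping(mapping=...) raises the ValueError.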


dmitriy-serdyuk commented 8 years ago

@nouiz, it is the bitwise OR operator, which can be overloaded by defining __or__.
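For readers who have not seen this before, here is a small self-contained illustration of overloading |. The Piped class is purely hypothetical and has nothing to do with Fuel; it only shows that Python evaluates `a | b` as `a.__or__(b)` when the left operand defines that method.

```python
class Piped:
    """Hypothetical wrapper whose `|` feeds its value into the right operand."""

    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        # `piped | func` is evaluated as `piped.__or__(func)`.
        return Piped(func(self.value))


# `|` is left-associative, so this sorts the list first and then sums it.
result = (Piped([3, 1, 2]) | sorted | sum).value
```

If the left operand does not define __or__ for the given right operand, Python falls back to the right operand's __ror__, which is what lets a plain dataset object sit on the left of the pipe.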

I am thinking about implementing a proxy object for the case where a transformer is created without the data stream argument. The proxy would store a reference to the transformer class and its arguments, and create the "real" transformer as soon as the pipe is constructed.

dmitriy-serdyuk commented 8 years ago

Why not make it always optional? It shouldn't break others' code.

When an outside user decides to use piping syntax, she just rewrites her custom transformers.

dmitriy-serdyuk commented 8 years ago

What if we add a class-level factory method that acts as a lazy constructor? Something like Mapping.lazy(func) or Mapping.pipe(func).
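A rough sketch of how such a factory could be combined with the proxy object mentioned earlier. All classes here are simplified stand-ins rather than Fuel's real ones, and LazyTransformer is a hypothetical name. The existing constructors are left untouched, so nested calls keep working.

```python
class LazyTransformer:
    """Hypothetical proxy: remembers a transformer class and its arguments."""

    def __init__(self, cls, args, kwargs):
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, upstream):
        # `upstream | proxy`: build the real transformer now that the
        # upstream stream is known.
        return self.cls(upstream, *self.args, **self.kwargs)


class Transformer:
    """Stand-in base class; the stream stays a required first argument."""

    def __init__(self, data_stream):
        self.data_stream = data_stream

    @classmethod
    def lazy(cls, *args, **kwargs):
        # Same arguments as the constructor, minus the stream.
        return LazyTransformer(cls, args, kwargs)


class Filter(Transformer):
    def __init__(self, data_stream, predicate):
        super().__init__(data_stream)
        self.predicate = predicate

    def __iter__(self):
        return (x for x in self.data_stream if self.predicate(x))


class Mapping(Transformer):
    def __init__(self, data_stream, mapping):
        super().__init__(data_stream)
        self.mapping = mapping

    def __iter__(self):
        return (self.mapping(x) for x in self.data_stream)


def is_even(x):
    return x % 2 == 0


def times_ten(x):
    return x * 10


data = [1, 2, 3, 4]
nested = Mapping(Filter(data, is_even), times_ten)             # current style
piped = data | Filter.lazy(is_even) | Mapping.lazy(times_ten)  # proposed style
```

Both nested and piped end up as ordinary Mapping instances wrapping a Filter, so downstream code should not be able to tell which style built them.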