pescadores / pescador

Stochastic multi-stream sampling for iterative learning
https://pescador.readthedocs.io
ISC License
76 stars 12 forks source link

Functional Transformations on Streamers #125

Open jongwook opened 6 years ago

jongwook commented 6 years ago

I'd love to have a monadic/functional methods on Streamer and its subclasses. Those include:

This programming style and nomenclature are from the functional programming idioms, but they are becoming increasingly popular in both server-side asynchronous/event-based programming and data processing. Examples include ReactiveX in many languages, CompletableFuture in Java 8+, every collection type in Scala, RDD in Apache Spark, and most recently tf.data from TensorFlow.

Personally, I think this is the right API to deal with the flow of the large datasets we work on - It makes it simple to compose short and easy-to-debug operations to build a wide range of complex behaviors, and the data is treated immutable making the whole process less error-prone.

I can imagine the following kinds of use cases:

The proof of concept implementation is in Python 3 (will break miserably in Python 2 for now) and has an implementation of Streamer class with desired methods, along with example usages. The part 4 shows how this API can perform multiplexing on its own, and part 5 shows how it can be incorporated to the current Class API without breaking it. It relies heavily on generator comprehensions and lambdas to enable lazy evaluation (which makes the examples like the sieve of Eratosthenes possible), but I hope this can be incorporated in a less hacky-looking way.

The naming and arguments for each method and whether or not to include them are up to discussion. Please let me know how you'd think!

bmcfee commented 5 years ago

Sorry to take so long in responding to this! (What's 18 months between friends, anyway...)

I like the general idea here, but I'm a little wary of scope. Much of what you describe applies to any kind of iterable sequence, and could already be achieved using functionality in itertools. I'm not sure we need to do anything here.

The one big exception would be group, which really needs to understand the shape of iterates, and which we already implement as buffer. (shuffle is a bit of a weird case, but I think you can reasonably implement that as a generic iterator.)