spring-attic / reactive-streams-commons

A joint research effort for building highly optimized Reactive-Streams compliant operators.
Apache License 2.0
357 stars 51 forks source link

Fusing subscribeOn with upstream #25

Closed LalitMaganti closed 8 years ago

LalitMaganti commented 8 years ago

@akarnokd has written in several places that the classic case of:

publisher.map(x -> someExpensiveComputation(x))
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(x -> someFn(x))

means that map and observeOn cannot be fused together. I understand this point well. However, I'm interested in a variant of this pipeline:

publisher.map(x -> someExpensiveComputation(x))
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(x -> someFn(x))

I might be misunderstanding the matrix, but it seems to suggest that subscribeOn cannot be fused with the map which as far as I can tell is possible? Moreover, the matrix suggests that subscribeOn and observeOn can be fused together which I am failing to understand.

Also a related question, is the async fusion in subscribeOn currently implemented? From the code it doesn't look like it (I'm looking at https://github.com/reactor/reactive-streams-commons/blob/master/src/main/java/rsc/publisher/PublisherSubscribeOn.java) but I'm not sure if it's been worked on locally.

Finally is there a place where @smaldini and @akarnokd discuss what is happening in this repo? I would be very interested in following along!

akarnokd commented 8 years ago

However, I'm interested in a variant of this pipeline:

subscribeOn can only macro-fuse a scalar or callable source which have the output fusion capability of ASYNC. Sometimes, operators have special paths such as this which leads to inconsistencies in the matrix. I probably should remove the output annotation from subscribeOn.

Generally, subscribeOn can't fuse because it can't negotiate the fusion mode between its downstream and upstream: the onSubscribe gets called in one thread but the actual source Publisher is called later on another thread. The fusion protocol mandates that when the onSubscribe returns, the pair has to be agreeing on the mode.

subscribeOn and observeOn

Generally, they can't.

Also a related question, is the async fusion in subscribeOn currently implemented?

No and won't be.

Finally is there a place where @smaldini and @akarnokd discuss what is happening in this repo?

Since there are only the two of us, we chat in hangouts quite frequently. However, the fusion-related discussions happened mostly half a year ago; today, we don't discuss the theory behind it anymore and only do maintenance-related work based on feedback from Reactor-Core users.

I would be very interested in following along!

Unless you are already a master of Reactive programming and Reactive Streams, you'd have found it confusing and under-explained. I believe the best distilled version of what we accomplished is in my two blog post about it: part 1, part 2.

LalitMaganti commented 8 years ago

Ah the part I was missing was the requirement to negotiate synchronously in onSubscribe. Makes sense that subscribeOn cannot as it is switching threads.

I've read the blog posts you linked (as well as much of the posts in your blog - thank you for creating such a useful resource BTW). I'm certainly no master of Reactive streams although I'm working on it :P (my current work in this area is playing around with zero-heap allocation - i.e. only stack allocated - reactive streams in Rust).

In any case, I'll be following closely with interest both your work on RxJava 2.0 as well as the maintenance work here :)

LalitMaganti commented 8 years ago

Sorry for reopening again - didn't think it was worth opening a new issue for this. But I was looking through the history of subscribeOn and I noticed that at one point (https://github.com/reactor/reactive-streams-commons/commit/90e6655ed5961f2a32a11155553ec881a4792402) there was a non-eager variant of subscribeOn which delayed onSubscribe until it received it from upstream.

What was the rationale for removing this variant? From what I could tell this would solve the problem you described above (requiring to negotiate on the same threead) and lend itself to fusion. I also notice that RxJava also has a TODO to move to calling onSubscribe to subscribe time. I guess that was determined based on your work here?

Thanks again for being so helpful to date :)

akarnokd commented 8 years ago

From what I could tell this would solve the problem you described above (requiring to negotiate on the same threead) and lend itself to fusion.

It prevented cancelling a subscription in time, which was an undesired behavior in general so we dropped it and stayed with the classical subscribeOn.

akarnokd commented 8 years ago

and lend itself to fusion

What would you like to fuse together?

RxJava also has a TODO to move to calling onSubscribe to subscribe time

I just posted a PR that fixes that.

LalitMaganti commented 8 years ago

I'm not interested in fusing an particular operator together TBH. The most interesting operators for fusion (to me at least) seem to be the thread switching ones (observeOn and subscribeOn).

Thanks for the info on cancellation. I did think that might be the reason. Good to get confirmation.

akarnokd commented 8 years ago

subscribeOn and observeOn are async-boundary operators and have to limit fusion to prevent user computation to escape. Sometimes you can replace subscribeOn with observeOn and get fusion (such as with direct sources as range, fromArray, fromIterable, just, fromCallable) but we don't automatically do that.

LalitMaganti commented 8 years ago

Yes precisely because they are required to limit fusion is why I find them interesting - most of the other operators have quite a straightforward reasons for supporting/not supporting fusion. Concurrent processes have always interested me to see how far they can be pushed. :)