vibbix opened 11 months ago
Hi, @vibbix!
Thanks for sharing your interesting use case! Am I understanding correctly that you need a zip variant that keeps zipping until the longest source is done, while the sources that have already completed emit a fallback or null value?
Cheers, Oleh
Wondering whether the existing API ideas can be used to achieve this result with the combinator variants, e.g.

```java
zip(Function<? super Object[], ? extends O> combinator, Publisher<? extends I>... sources)
```

when made configurable to wait for the last active source instead of terminating upon the first one to finish. The combinator would probably need to return a structure that contains both the output tuple and a decision object that says, for each source, whether the value was consumed or should be reused for another zipping round.
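A minimal plain-Java sketch of what such a result structure could look like (all names here, e.g. `ZipResult` and `SlotDecision`, are hypothetical illustrations, not an actual Reactor API):

```java
import java.util.List;

// Hypothetical sketch, not an actual Reactor API: the combinator returns both
// the combined output and, per source slot, whether that slot's element was
// consumed or should be replayed in the next zipping round.
enum SlotDecision { CONSUME, RETAIN }

record ZipResult<O>(O output, List<SlotDecision> decisions) {}

class ZipResultDemo {
    public static void main(String[] args) {
        // Slot 0's value was consumed; slot 1's should be reused next round.
        ZipResult<String> r =
            new ZipResult<>("a+b", List.of(SlotDecision.CONSUME, SlotDecision.RETAIN));
        System.out.println(r.output() + " " + r.decisions());
    }
}
```

The per-slot decision list is what would let the operator keep an unconsumed element in its queue rather than dropping it.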
The notion of zipping on a key is potentially quite limiting.
@OlegDokuka Having a Flux.zipDelayComplete that can zip data sources with default fallback values for mismatched lengths would be very useful. I tend to hack this together along the lines of Flux.concat(source1, Flux.create(...)), where the inner source produces Map.Entry<Integer,Optional<T>>(Integer.MAX_VALUE, Optional.empty()) until the parent subscription cancels. It involves a lot of unboxing, but I built some static helpers to make it easier.
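The padding trick described above can be sketched in plain Java, using lists in place of publishers and ignoring backpressure (all names are illustrative, not the author's actual helpers):

```java
import java.util.*;

// Pad each source to the longest length with Optional.empty(), then zip row-wise.
class PadZip {
    static <T> List<List<Optional<T>>> zipLongest(List<List<T>> sources) {
        int max = sources.stream().mapToInt(List::size).max().orElse(0);
        List<List<Optional<T>>> rows = new ArrayList<>();
        for (int i = 0; i < max; i++) {
            List<Optional<T>> row = new ArrayList<>();
            for (List<T> src : sources) {
                // Sources that have run out contribute the fallback value.
                row.add(i < src.size() ? Optional.of(src.get(i)) : Optional.empty());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two sources of mismatched length; the shorter one is padded.
        System.out.println(zipLongest(List.of(List.of("a", "b"), List.of("x"))));
    }
}
```

In the Reactor version, the sentinel `Integer.MAX_VALUE` key plays the role of the padding rows here, which is what forces the unboxing the comment mentions.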
@chemicL
> The combinator would probably need to return a structure that contains both the output tuple and a decision object that says, for each source, whether the value was consumed or should be reused for another zipping round.

This would be great, and if there were something like this where I could request that the producer slot I consumed be "replenished", I would build these sorts of functions on top of it.
As mentioned in Considered alternatives, I built out this functionality today using Flux.mergeOrderComparing + Flux.windowUntilChanged with generic Map.Entrys and marker interfaces, but we lose some type safety and the notion of which "slot"/source publisher a result comes from. In my case it's more about keeping a row-level/horizontal data structure intact. Making this a built-in Reactor operator would ensure the entire workflow is type-safe, that each source publisher has its backpressure handled correctly, and that buffer bloat is minimized.
> The notion of zipping on a key is potentially quite limiting.

It could be a great shortcut for this common use case. I tend to have different types in each zip'd source publisher, and even cases where I read different keys from the same object (although this is certainly a more unusual case). I have been workshopping different method signatures for a couple of months, and this is the closest I have gotten to a clean signature for an external implementation. Otherwise, each source flux would need a corresponding Function<? super T1, ? extends K> key selector.
@vibbix would you be so kind as to provide some more test cases covering corner cases, to help us better understand and consider a possible design? For one, I'm wondering if the keys can appear more than once.
If you'd be willing to provide the code for the implementation that works so far, that would also be beneficial.
@chemicL I created an example of what I tend to use now here: vibbix/rx-experiments. The attached README.md has a description of my thought process behind the design as well. I have been working on some examples of what a coordinator structure could look like to handle the incoming values.
> For one, I'm wondering if the keys can appear more than once.

In my design, I assume that any publisher emitting multiple values with the same key has to group them beforehand.
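That pre-grouping assumption can be sketched in plain Java: entries sharing a key are collapsed into one grouped entry before any zipping happens (names are illustrative, not the author's actual code):

```java
import java.util.*;

// Collapse entries that share a key into one grouped entry, preserving
// encounter order via LinkedHashMap. For key-ordered sources this is
// equivalent to grouping consecutive duplicates.
class GroupByKey {
    static <K, V> LinkedHashMap<K, List<V>> groupOrdered(List<Map.Entry<K, V>> in) {
        LinkedHashMap<K, List<V>> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : in) {
            out.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Key 1 appears twice; its values are grouped into one slot.
        System.out.println(groupOrdered(List.of(
            Map.entry(1, "a"), Map.entry(1, "b"), Map.entry(2, "c"))));
    }
}
```

After this step, each key appears at most once per source, so the zip-on-key operator never has to decide between two candidates from the same publisher.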
@vibbix thank you. We appreciate your input. We are in the planning process currently and will get back when we have some priorities. Just to get a sense of work involved - are you interested in contributing something once we settle on design or would you expect the team or community to provide an implementation?
Combine a Flux.zip-akin operator with a key-selecting variant of Flux.mergeComparing, for publishers that should be merged based on keys, for both finite and unbounded sources of any combination of lengths.

Motivation
I use Reactor every day in my data pipeline work, to pretty great success. The lazy operators are amazing at handling complex merge operations across many distinct sources. One of the cases I run into, however, is fanning in multiple sources of data that have different lengths and mismatched (but ordered) keys.
Example use-case
An example would be merging 4 different JSON arrays, where a "match-key" is missing from some of the sets, or where some of the sets have totally different lengths and would short-circuit early.
I have used Flux.groupBy in the past, but that doesn't work in the unbounded Flux case. I tend to create a custom interleave for these situations, but a generic solution would be incredibly helpful.

Desired solution
An example signature for this kind of operator that I have experimented with:
Desired output
Test Case
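As a hedged illustration of the desired behavior (not the signature or test case from the linked repo): a plain-Java merge-join over two key-sorted lists, which stand in for ordered publishers, emitting `Optional.empty()` on whichever side lacks the current key. All names, including `zipOnKey`, are hypothetical.

```java
import java.util.*;
import java.util.function.Function;

// Sketch of zip-on-key semantics over two key-sorted sources: advance
// whichever side has the smaller key, emitting Optional.empty() for the
// slot whose key is missing. Backpressure is ignored in this sketch.
class ZipOnKeyDemo {
    static <T1, T2, K extends Comparable<K>> List<String> zipOnKey(
            List<T1> left, List<T2> right,
            Function<T1, K> leftKey, Function<T2, K> rightKey) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() || j < right.size()) {
            K lk = i < left.size() ? leftKey.apply(left.get(i)) : null;
            K rk = j < right.size() ? rightKey.apply(right.get(j)) : null;
            // Exhausted sides always "lose" the comparison.
            int cmp = lk == null ? 1 : rk == null ? -1 : lk.compareTo(rk);
            Optional<T1> l = cmp <= 0 ? Optional.of(left.get(i++)) : Optional.empty();
            Optional<T2> r = cmp >= 0 ? Optional.of(right.get(j++)) : Optional.empty();
            out.add(l + "/" + r);
        }
        return out;
    }

    public static void main(String[] args) {
        // Keys: left = [a, b], right = [b, c]; only "b" matches on both sides.
        System.out.println(zipOnKey(
            List.of("a1", "b1"), List.of("b2", "c2"),
            s -> s.substring(0, 1), s -> s.substring(0, 1)));
    }
}
```

A per-source key selector, as discussed above, is what makes this work when the two sides have different element types.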
Considered alternatives

- Flux.groupBy doesn't work in unbounded / infinite publisher situations.
- Mono.zip
- Flux.mergeComparingDelayError(...) + Flux.windowUntilChanged to group the entities, flatMap with a reduceWith accumulator to build the tuple out, and Optional.empty() for unmatched fields.
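The accumulator step of that last alternative can be sketched in plain Java: each key-window yields (slot, value) pairs, and the fold fills a fixed-size row, leaving `Optional.empty()` for slots with no match (names are illustrative, not the author's actual code):

```java
import java.util.*;

class RowReduce {
    // Fold one key-window of (slotIndex, value) pairs into a row of n slots,
    // with Optional.empty() for slots that had no matching element.
    static List<Optional<String>> buildRow(int n, List<Map.Entry<Integer, String>> window) {
        List<Optional<String>> row =
            new ArrayList<>(Collections.nCopies(n, Optional.<String>empty()));
        for (Map.Entry<Integer, String> e : window) {
            row.set(e.getKey(), Optional.of(e.getValue()));
        }
        return row;
    }

    public static void main(String[] args) {
        // A window where only slots 0 and 2 (of 3) had a matching element.
        System.out.println(buildRow(3, List.of(Map.entry(0, "a"), Map.entry(2, "c"))));
    }
}
```

This is also where the type-safety loss the comments mention shows up: every slot collapses to the same erased value type instead of each source keeping its own.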