sopel39 closed this issue 6 years ago
Commit that introduces ContinuousWork<T>, per the concept described in this issue: https://github.com/prestodb/presto/pull/9854/commits/5d395edccb8c77b3a7f268063299e30690d46f97
How does ContinuousWork introduced in the other PR relate to "NonBlockingWork"? Isn't it just the same concept?
Yes, this is the same concept, albeit with a different name. I think the name used in the PR is more adequate: ContinuousWork.
@sopel39 Karol, is this what became com.facebook.presto.operator.WorkProcessor? Is there anything else missing, or can this issue be closed now?
Extends the concept of https://github.com/prestodb/presto/issues/8697 and Work. One of the goals is to be able to use storage-backed lazy pages outside of SFP in the long term.

The idea of https://github.com/prestodb/presto/issues/8697 was to extend Iterator so that it could be used in a non-blocking way. The observation was that various Presto components follow a pattern of building a pipeline from smaller processing components (e.g. PageProcessor, MergeHashSort, the new distributed merge sort, etc.). However, each component implements the concept a little differently. It would be great to have an abstraction over such pipelines that allows for:

Therefore I propose to introduce NonBlockingWork, which would have an interface similar to:

Such a NonBlockingWork work instance would be used in the following steps:

1. Wait until the work instance is not blocked (isBlocked/getBlockedFuture methods).
2. Call the process method until it returns true.
3. Check whether the work instance is finished (isFinished method). If it's not finished, obtain the result using the getResult method.

Additionally, there would be a utility class, WorkUtils, that would simplify creating Work/NonBlockingWork pipelines. It would contain methods/interfaces like:

Then there would be more complex transformation functions that would support blocking futures:
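The interface and the usage steps described above can be sketched roughly as follows. This is a minimal Java sketch under assumptions, not the code from the linked PR: the method names (getBlockedFuture, process, isFinished, getResult) come from the issue text, but their exact signatures are guessed, and CountingWork, WorkDriver and drain are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Assumed shape of the proposed interface; only the method names come from the issue.
interface NonBlockingWork<T> {
    // Completes when the work is able to make progress again (assumed signature).
    CompletableFuture<?> getBlockedFuture();

    // Advances the work; returns true once a result is ready or the work has finished.
    boolean process();

    boolean isFinished();

    // Meaningful only after process() returned true and isFinished() is false.
    T getResult();
}

// Hypothetical example work: produces 0..limit-1 and returns false from every
// other process() call to mimic a work that yields before producing a result.
final class CountingWork implements NonBlockingWork<Integer> {
    private final int limit;
    private int next;
    private boolean yieldNext = true;
    private Integer result;
    private boolean finished;

    CountingWork(int limit) {
        this.limit = limit;
    }

    public CompletableFuture<?> getBlockedFuture() {
        return CompletableFuture.completedFuture(null); // never blocked in this toy example
    }

    public boolean process() {
        if (yieldNext) {
            yieldNext = false;
            return false; // no result yet, caller should call process() again
        }
        yieldNext = true;
        if (next >= limit) {
            finished = true;
        } else {
            result = next++;
        }
        return true;
    }

    public boolean isFinished() {
        return finished;
    }

    public Integer getResult() {
        return result;
    }
}

// Hypothetical driver that follows the three usage steps.
final class WorkDriver {
    static <T> List<T> drain(NonBlockingWork<T> work) {
        List<T> results = new ArrayList<>();
        while (true) {
            // Step 1: wait until the work is not blocked. join() blocks the calling
            // thread; a real driver would yield and resume when the future completes.
            work.getBlockedFuture().join();
            // Step 2: call process() until it returns true.
            if (!work.process()) {
                continue;
            }
            // Step 3: stop if finished, otherwise collect the result.
            if (work.isFinished()) {
                return results;
            }
            results.add(work.getResult());
        }
    }
}
```

With these definitions, WorkDriver.drain(new CountingWork(3)) returns the list 0, 1, 2. Re-checking the blocked future after every process() call that returns false lets a work report that it became blocked partway through a step.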
With such a model it should be really simple to build complex pipelines like:
- ConnectorPageSource -> PageProcessor -> MergingPageOutput
- MergeHashSort class, distributed merge sort

A great benefit is that in the long term we could use storage-based lazy pages outside of SFP. Pipeline elements decide when they want to delegate a call to the upstream work. This way we can use lazy pages outside of SFP because the reader won't be asked for the next page until the previous lazy page is consumed. Additionally, state transitions are hidden from pipeline elements by utility classes. Pipeline elements are pull-based only, which also simplifies the computation model.

Initially we could start using such a model within a single operator (e.g. ScanFilterProject, distributed sort). Later on we might merge some operators together (e.g. SFP, TopN, MarkDistinct, DynamicFilter, JoinProbeSide) under some "umbrella wholestage" operator. I would call it a kind of mild "wholestage compilation".

FYI: @martint @dain @haozhun @findepi @kbajda
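To illustrate the pull-based behaviour described above, where an upstream element is only asked for its next page when a downstream element pulls, here is a small Java sketch. It is an assumption-laden illustration rather than the proposed API: PullWork is a simplified, hypothetical pull-only variant of the interface (blocking is omitted for brevity), CountingSource stands in for a page reader, and Pipelines.transform is a hypothetical helper in the spirit of the WorkUtils idea.

```java
import java.util.function.Function;

// Simplified, hypothetical pull-only interface (no blocking support).
interface PullWork<T> {
    boolean process();   // true when a result is ready or the work is finished
    boolean isFinished();
    T getResult();
}

// Hypothetical source standing in for a page reader; counts how often it is pulled.
final class CountingSource implements PullWork<String> {
    private final String[] pages;
    private int index;
    int pulls; // number of process() calls received from downstream
    private String current;
    private boolean finished;

    CountingSource(String... pages) {
        this.pages = pages;
    }

    public boolean process() {
        pulls++;
        if (index < pages.length) {
            current = pages[index++];
        } else {
            finished = true;
        }
        return true;
    }

    public boolean isFinished() {
        return finished;
    }

    public String getResult() {
        return current;
    }
}

final class Pipelines {
    // Wraps an upstream work with a mapping step. The upstream is touched only
    // when the returned work's process() is called, so nothing is read eagerly.
    static <T, R> PullWork<R> transform(PullWork<T> upstream, Function<T, R> fn) {
        return new PullWork<R>() {
            private R result;

            public boolean process() {
                if (!upstream.process()) {
                    return false; // upstream needs more calls before it has a result
                }
                if (!upstream.isFinished()) {
                    result = fn.apply(upstream.getResult());
                }
                return true;
            }

            public boolean isFinished() {
                return upstream.isFinished();
            }

            public R getResult() {
                return result;
            }
        };
    }
}
```

Building a two-stage pipeline over a CountingSource and pulling one result from the last stage leaves the source's pulls counter at 1. The second page is not requested until the downstream element asks for it, which is the property that would make lazy pages safe to pass along such a pipeline.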