Add support for different join types in HashJoin. A right join is performed by swapping the two inputs and running a left join.
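The swap trick can be sketched as follows. This is a minimal illustration and not the HashJoin code from this PR: rows are plain dicts standing in for dataframe rows, and `left_hash_join` and `right_hash_join` are hypothetical names. It assumes the two inputs share only the key column.

```python
from collections import defaultdict


def left_hash_join(left, right, key):
    """Left join two lists of dict rows on `key` (hypothetical helper)."""
    # Build phase: hash the right side by join key.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Non-key columns of the right side, used to pad unmatched left rows.
    right_cols = [c for row in right[:1] for c in row if c != key]

    # Probe phase: every left row is kept, matched or not.
    out = []
    for lrow in left:
        matches = index.get(lrow[key])
        if matches:
            for rrow in matches:
                out.append({**lrow, **rrow})
        else:
            out.append({**lrow, **{c: None for c in right_cols}})
    return out


def right_hash_join(left, right, key):
    """Right join expressed as a left join with the inputs swapped."""
    return left_hash_join(right, left, key)


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]
result = right_hash_join(left, right, "id")
```

Here every row of `right` survives, and the row with `id` 3 gets `None` for column `a`. One consequence of swapping is that the right-hand columns come first in each output row, so a real implementation may need to reorder columns afterwards.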
Add mode options to MapperDfMergerNode:
BothOnline: both channels can be online. Each merge takes one dataframe from each channel.
LeftOnline: only the left channel is online. The left channel is consumed every time merge is called; the right dataframe is read only once and reused.
RightOnline: same as LeftOnline, with the roles of left and right swapped.
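The three modes can be sketched as below. This is an assumption-laden illustration, not the MapperDfMergerNode implementation: `MergeMode`, `MergerSketch`, and `next_pair` are hypothetical names, and plain iterables of strings stand in for the channels and their dataframes.

```python
from enum import Enum, auto


class MergeMode(Enum):
    BothOnline = auto()
    LeftOnline = auto()
    RightOnline = auto()


class MergerSketch:
    """Picks the (left, right) inputs for each merge call (hypothetical)."""

    def __init__(self, mode, left_source, right_source):
        self.mode = mode
        self.left = iter(left_source)
        self.right = iter(right_source)
        self._cached = None
        self._has_cached = False

    def _offline(self, source):
        # Read the offline side once, then reuse it on every later call.
        if not self._has_cached:
            self._cached = next(source)
            self._has_cached = True
        return self._cached

    def next_pair(self):
        if self.mode is MergeMode.BothOnline:
            # Take one item from each channel.
            return next(self.left), next(self.right)
        if self.mode is MergeMode.LeftOnline:
            return next(self.left), self._offline(self.right)
        # RightOnline: mirror image of LeftOnline.
        return self._offline(self.left), next(self.right)


merger = MergerSketch(MergeMode.LeftOnline, ["l1", "l2"], ["r1", "r2"])
pairs = [merger.next_pair(), merger.next_pair()]
```

With `LeftOnline`, both calls pair a fresh left item with the same cached right item, so `"r2"` is never read.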
Add post_process_msg to MessageProcessor to support processors that block, for example one that accumulates the dataframes of different partitions into a single dataframe. In that case each process_msg call can return None, and post_process_msg returns the merged dataframe according to the merge strategy (currently VStack or KeepLast). By default, post_process_msg returns None, so other nodes implementing MessageProcessor are unaffected.
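A blocking processor of this kind might look like the sketch below. The method names `process_msg` and `post_process_msg` and the strategy names come from the description above; everything else is assumed for illustration, including the `MergeStrategy` enum, the `AccumulatingProcessor` class, the use of lists of dict rows in place of dataframes, and the idea that the caller invokes `post_process_msg` once after all partitions have been processed.

```python
from enum import Enum, auto


class MergeStrategy(Enum):
    VStack = auto()    # concatenate all partitions vertically
    KeepLast = auto()  # keep only the most recent partition


class MessageProcessor:
    def process_msg(self, msg):
        raise NotImplementedError

    def post_process_msg(self):
        # Default: nothing to emit, so existing processors are unaffected.
        return None


class AccumulatingProcessor(MessageProcessor):
    """Buffers partitions and emits one merged result (hypothetical)."""

    def __init__(self, strategy):
        self.strategy = strategy
        self._rows = []

    def process_msg(self, msg):
        if self.strategy is MergeStrategy.VStack:
            self._rows.extend(msg)
        else:
            self._rows = list(msg)
        # Block: nothing is emitted per message.
        return None

    def post_process_msg(self):
        return self._rows if self._rows else None


partitions = [[{"v": 1}], [{"v": 2}, {"v": 3}]]

stacker = AccumulatingProcessor(MergeStrategy.VStack)
per_msg = [stacker.process_msg(p) for p in partitions]
stacked = stacker.post_process_msg()

latest = AccumulatingProcessor(MergeStrategy.KeepLast)
for p in partitions:
    latest.process_msg(p)
kept = latest.post_process_msg()
```

With `VStack` the final result holds all three rows; with `KeepLast` it holds only the rows of the second partition.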