Closed MingweiSamuel closed 9 months ago
Suggestion: next_tick(EAGER) vs next_tick(LAZY) forces the user to choose whether to eagerly schedule or wait for an external event.
Cleaner: one operator is about scheduling (tick()), another is about deferring dataflow (defer()).
On scheduling, we have tick() (internally data-driven) and two sources: spin() (runtime-completion-driven) and source_interval() (wall-clock-time-driven). We should think holistically about this category of ops. Regular sources (source_stream(), etc.) are externally data-driven.
tick() is kind of a sink/source combo ("boomerang" data to yourself across a tick boundary). E.g.:
source_stream() -> tick() -> map()
could be
source_stream(bar) -> dest_local(foo)
source_stream(foo) -> map()
which arguably makes it easier to see the ticking and its results in the middle of a big chain.
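The "boomerang" reading can be sketched outside the dataflow syntax with a plain channel. This is only an illustration under assumptions: `run_tick`, `forward_to_local`, and `boomerang_demo` are hypothetical names, the `x * 10` closure stands in for the `map()` stage, and nothing here is the actual runtime implementation. Data sent to `foo` during one tick is only observed by the `map()` stage in the following tick.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for `source_stream(foo) -> map()`:
/// drains whatever is queued on `foo` right now and maps it.
fn run_tick(foo_rx: &Receiver<i32>) -> Vec<i32> {
    foo_rx.try_iter().map(|x| x * 10).collect()
}

/// Hypothetical stand-in for `source_stream(bar) -> dest_local(foo)`:
/// forwards this tick's input to the local channel `foo`.
fn forward_to_local(bar: &[i32], foo_tx: &Sender<i32>) {
    for &x in bar {
        foo_tx.send(x).expect("receiver is still alive");
    }
}

/// Returns what the `map()` stage observed in tick 0 and in tick 1.
fn boomerang_demo() -> (Vec<i32>, Vec<i32>) {
    let (foo_tx, foo_rx) = channel();

    // Tick 0: `foo` is empty when the tick starts, so `map()` sees nothing;
    // the data arriving on `bar` is sent to `foo` for later.
    let tick0 = run_tick(&foo_rx);
    forward_to_local(&[1, 2, 3], &foo_tx);

    // Tick 1: the data has crossed the tick boundary and is now mapped.
    let tick1 = run_tick(&foo_rx);

    (tick0, tick1)
}
```

Calling `boomerang_demo()` returns an empty vector for tick 0 and the mapped values for tick 1, which is the one-tick delay the two-line form makes explicit.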
Right now a cycle through ticks (next_tick()) will cause the scheduler to spin as fast as possible as data cycles. We should try only starting the next tick if an actual external event happens, and see how much that helps and hurts.
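To make the trade-off concrete, here is a toy simulation of the two policies. It is a sketch under assumptions, not the real scheduler: `Policy`, `simulate`, the notion of a time "slot", and `spin_budget` (a cap on how many ticks may run per slot, so the eager case terminates) are all hypothetical. Each tick consumes the deferred data plus any fresh external item and re-emits everything into the next tick, so a cycle always has pending data.

```rust
/// Hypothetical scheduling policies for starting the next tick.
#[derive(Clone, Copy)]
enum Policy {
    /// Start another tick whenever deferred data is pending.
    Eager,
    /// Start a tick only when a fresh external event has arrived.
    Lazy,
}

/// Counts how many ticks run. `slots[i]` is the external item (if any)
/// arriving in time slot `i`; `spin_budget` caps the ticks per slot.
fn simulate(policy: Policy, slots: &[Option<u64>], spin_budget: usize) -> usize {
    let mut deferred: Vec<u64> = Vec::new();
    let mut ticks = 0;

    for &external in slots {
        let mut spins = 0;
        while spins < spin_budget {
            // The external item is only "fresh" for the first tick of the slot.
            let fresh = if spins == 0 { external } else { None };
            let should_run = match policy {
                Policy::Eager => fresh.is_some() || !deferred.is_empty(),
                Policy::Lazy => fresh.is_some(),
            };
            if !should_run {
                break;
            }

            // One tick: take the deferred data, add fresh input, and send
            // every item around the cycle into the next tick.
            let mut input = std::mem::take(&mut deferred);
            input.extend(fresh);
            deferred = input.into_iter().map(|x| x + 1).collect();

            ticks += 1;
            spins += 1;
        }
    }
    ticks
}
```

With two external events spread over four slots and a budget of three spins per slot, the eager policy uses the whole budget in every slot once the cycle is seeded, while the lazy policy runs exactly one tick per external event. The cost of the lazy policy is visible too: cycled data sits in `deferred` until the next external event arrives.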