Open syodage opened 3 years ago
@syodage Did you figure out a workaround for this? I am facing the exact same problem.
Hey! Bumping this issue, since I've stumbled upon the same thing
Thanks for your thorough report @syodage
@syodage any update on this issue?
When using side inputs in streaming pipelines, most use cases require the side inputs to be refreshed (recalculated) over time. Scio doesn't have a nice way to do this. Apache Beam documents refreshable side input patterns, defined for both global and non-global windowing, that are meant to address this easily.
However, the exact example code snippets work with neither DirectRunner nor DataflowRunner. With DirectRunner, the side input doesn't output any data to the main pipeline code, and with DataflowRunner it throws this error [4].
This issue was raised on the Apache Beam user mailing list [1][2][3] a few years ago, and the discussion concluded by suggesting that the use case be handled with a Guava LoadingCache that periodically updates a local cache. That, of course, is not the Beam way of doing it.
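For anyone landing here, this is roughly what the cache-based workaround from the mailing list boils down to. It is only a minimal sketch using the plain JDK instead of Guava's LoadingCache, so that it has no dependencies. `RefreshingSnapshot`, `loader`, `maxAge` and `clock` are hypothetical names made up for illustration; none of them are Scio, Beam or Guava APIs.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Scio, Beam or Guava): keeps one snapshot of
 * the lookup data and reloads it lazily once it is older than maxAge.
 */
final class RefreshingSnapshot<T> {
    private final Supplier<T> loader;   // assumption: re-reads the side data from its source
    private final long maxAgeNanos;     // how long a snapshot may be served before a reload
    private final LongSupplier clock;   // nanosecond clock, injectable so it can be tested

    private T value;
    private long loadedAt;
    private boolean loaded;

    RefreshingSnapshot(Supplier<T> loader, Duration maxAge, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeNanos = maxAge.toNanos();
        this.clock = clock;
    }

    RefreshingSnapshot(Supplier<T> loader, Duration maxAge) {
        this(loader, maxAge, System::nanoTime);
    }

    /** Returns the current snapshot, reloading it first if it is missing or stale. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= maxAgeNanos) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

Inside a pipeline, one such object would presumably be created per DoFn instance (for example in a `@Setup` method) and `get()` called while processing each element. Every worker then refreshes on its own schedule, which is exactly why this sidesteps side inputs rather than fixing them.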
Related Issues: https://github.com/spotify/scio/issues/3521 , https://github.com/spotify/scio/issues/3201 , https://github.com/spotify/scio/issues/1190 , https://github.com/spotify/scio/issues/2525
[1] https://lists.apache.org/thread.html/%3CB1660EAB-AEC8-4635-8386-8353685DB19A@gameduell.de
[2] https://lists.apache.org/thread.html/a5d804685a5810594a7860709fbcd6d3a22ead6e871fc3073a65ef1e@%3Cuser.beam.apache.org%3E
[3] https://lists.apache.org/thread.html/681de1ae372951988a00b9affa7480f3117d3cae6dae9ee2c69baba4@%3Cuser.beam.apache.org%3E
[4]