robinhood / faust

Python Stream Processing
Other
6.7k stars 538 forks source link

Is it possible to use table_route to join windowed results across workers #735

Closed fonty422 closed 2 years ago

fonty422 commented 2 years ago

Question I noted that with the word_count example, that the count table can be recalculated on a re-balance (using the @app.table_route option with the @app.page option). But is it possible to do this without the page option? Or can we call that internally on each worker to get the 'real' live count and then add to it?

Reason If we have possibly a few thousand messages per second and we need to have live counts for the last 10, 30, 60, and 300s for up to 50k keys, it seems like

  1. It makes sense that this would be split across multiple workers
  2. It seems necessary for fault tolerance - one worker dies, then the others pick up the task.

Point 2 is especially valid as this is a critical step in the processing pipeline for the end use so if it stops, then the whole thing stops.

fonty422 commented 2 years ago

Argh! Wrong repo again! Sorry.