terascope / teraslice

Scalable data processing pipelines in JavaScript
https://terascope.github.io/teraslice/
Apache License 2.0

Stateful jobs should be triggered automatically by the use of a stateful asset #3782

Closed: godber closed this issue 1 month ago

godber commented 1 month ago

We have this notion of "Stateful Workers" as described here:

https://terascope.github.io/teraslice/docs/configuration/clustering-k8s/#stateful-workers

In a Teraslice job you can set the "stateful": true property, and the workers will be configured slightly differently (see the docs linked above). I've realized that this is really a property of the asset in use, and it should probably be set automatically whenever a job uses a "stateful" asset.
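For context, here is a minimal sketch of a job that sets the flag by hand today. The JobSpec type is only an illustration written for this example (it is not Teraslice's own type), and the job name and both operation names are hypothetical placeholders. The only part taken from the description above is the "stateful": true property.

```typescript
// Illustrative shape of a job spec; not Teraslice's actual type definition.
interface JobSpec {
    name: string;
    lifecycle: 'once' | 'persistent';
    workers: number;
    assets: string[];
    stateful?: boolean;
    operations: { _op: string; [key: string]: unknown }[];
}

const job: JobSpec = {
    name: 'example-stateful-job', // hypothetical job name
    lifecycle: 'persistent',
    workers: 1,
    assets: ['elasticsearch'],
    // Today this has to be set manually, even though it really
    // follows from the processor the job uses.
    stateful: true,
    operations: [
        { _op: 'example_reader' }, // hypothetical placeholder
        { _op: 'example_stateful_processor' }, // hypothetical placeholder
    ],
};

console.log(JSON.stringify(job, null, 2));
```

The point of the issue is that the author of this job should not have to remember the stateful line at all.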

In general, this is the stateful processor in use:

https://github.com/terascope/elasticsearch-assets/tree/master/asset/src/elasticsearch_state_storage

This probably needs an internal API added to Teraslice that the processor can use to indicate that it is stateful, perhaps a property on the processor that Teraslice reads.
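A rough sketch of what such a property could look like, purely as an assumption: nothing here exists in Teraslice. OperationModule, its stateful field, and jobIsStateful are all hypothetical names made up for illustration.

```typescript
// Hypothetical: a loaded operation that can declare itself stateful.
interface OperationModule {
    name: string;
    stateful?: boolean; // hypothetical flag set by the processor author
}

// Hypothetical helper: a job counts as stateful if any of its
// operations declares the flag.
function jobIsStateful(operations: OperationModule[]): boolean {
    return operations.some((op) => op.stateful === true);
}

const operations: OperationModule[] = [
    { name: 'example_reader' },
    { name: 'example_stateful_processor', stateful: true },
];

console.log(jobIsStateful(operations)); // true
```

With something like this, Teraslice could derive the job-level setting when the job is validated instead of relying on the job author to set it.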

godber commented 1 month ago

After further discussion, I don't think this is what we need. What we need is to annotate jobs that have upstream dependencies with the job_id of the upstream job so that it can be paused. It's possible there are cases where this label would be useful, but it would be a low-impact change, so I am just going to close this.
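To illustrate the annotation idea, here is a minimal sketch, assuming a hypothetical upstream_job_id field on each job record. The field name, the AnnotatedJob type, and findDownstreamJobs are assumptions for illustration only, and the job IDs are made-up sample values.

```typescript
// Hypothetical job record carrying the proposed annotation.
interface AnnotatedJob {
    job_id: string;
    name: string;
    upstream_job_id?: string; // hypothetical annotation field
}

// Look up every job that names the given job as its upstream dependency.
function findDownstreamJobs(jobs: AnnotatedJob[], upstreamJobId: string): AnnotatedJob[] {
    return jobs.filter((job) => job.upstream_job_id === upstreamJobId);
}

const jobs: AnnotatedJob[] = [
    { job_id: 'job-a', name: 'upstream-example' },
    { job_id: 'job-b', name: 'downstream-example', upstream_job_id: 'job-a' },
    { job_id: 'job-c', name: 'unrelated-example' },
];

console.log(findDownstreamJobs(jobs, 'job-a').map((job) => job.job_id)); // [ 'job-b' ]
```

The annotation makes the relationship between jobs queryable in either direction, which is what a pause workflow would need to look up.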