Open Gozala opened 6 months ago
@Gozala - as with other tickets, need your help on articulating priority and/or aligning this with implementation of some feature where we can better assign user value.
Also - what's your thought on cost here? (we can use the scale on the "Size" field on the project). I'm assuming we'd pull some off the shelf library here vs. implementing anything from scratch?
@Gozala - as with other tickets, need your help on articulating priority and/or aligning this with implementation of some feature where we can better assign user value.
I did my best to cover that in the updated description.
Also - what's your thought on cost here? (we can use the scale on the "Size" field on the project).
Sorry, I don't understand the question: the cost of not implementing it, or the cost of implementing it?
I'm assuming we'd pull some off the shelf library here vs. implementing anything from scratch?
I highly doubt we can do that. Mostly this is about building something that deals with platform limitations (of CF, AWS, or serverless in general) and implements what a language runtime usually provides, using only the tools these constrained runtimes offer.
Maybe someone has done something like this, but I have not looked into it, and I still think this is one of those cases where something tailored to our exact needs is probably going to be the simplest solution.
What
We need a proper scheduler to coordinate long running tasks and enable progress tracking.
Why
We keep working around the fact that we do not have a proper task scheduler in our system. Each workaround takes significant effort and introduces a lot of context to be aware of. Task coordination implies dealing with concurrency and race conditions, which are typically very difficult to get right, and bugs caused by race conditions are among the hardest to reproduce and debug. Getting it right once reduces the risk of getting it wrong in one of many places and then spending a lot of effort on fixes. There is also a high cost in knowledge sharing, as the current workarounds aren't obvious and depend on a lot of contextual detail.
It would be a really good idea to actually take the time to implement one so we can stop working around the lack of it.
Cost
It is hard to estimate without trying to build one; if I had to guess, I'd say weeks.
Design Sketch
From prior thought on the subject, I imagine it would roughly involve something along these lines:
`await`-ed tasks
`dependency/task` in some store with set semantics (on per awaited receipt)
`dependency/` and move tasks back into queue & delete those records
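
To make the sketch above a bit more concrete, here is a minimal in-memory illustration in TypeScript. Everything in it is an assumption made for illustration: the names `Task`, `Scheduler`, `schedule` and `onReceipt` are hypothetical, and the `Map`/`Set` fields stand in for whatever queue and store the real system would use. It only shows the bookkeeping (suspend awaiting tasks, keep one set of waiting tasks per awaited receipt, requeue and delete the records when the receipt arrives). It does not address the hard part described above, namely doing this atomically on CF or AWS.

```typescript
// Hypothetical sketch: none of these names come from an existing codebase.
type ReceiptId = string;

interface Task {
  id: string;
  awaits: ReceiptId[]; // receipts this task is still waiting on
}

class Scheduler {
  // Tasks that are ready to run.
  readonly queue: Task[] = [];
  // Suspended tasks, keyed by task id.
  private suspended = new Map<string, Task>();
  // Dependency records: one set of waiting task ids per awaited receipt.
  private dependencies = new Map<ReceiptId, Set<string>>();

  // Queue the task, or suspend it if it awaits any receipts.
  schedule(task: Task): void {
    if (task.awaits.length === 0) {
      this.queue.push(task);
      return;
    }
    this.suspended.set(task.id, task);
    for (const receipt of task.awaits) {
      const waiting = this.dependencies.get(receipt) ?? new Set<string>();
      waiting.add(task.id); // set semantics: adding the same task twice is a no-op
      this.dependencies.set(receipt, waiting);
    }
  }

  // When a receipt arrives, delete its dependency records and move every
  // task that no longer awaits anything back into the queue.
  onReceipt(receipt: ReceiptId): void {
    const waiting = this.dependencies.get(receipt);
    if (!waiting) return;
    this.dependencies.delete(receipt);
    waiting.forEach((taskId) => {
      const task = this.suspended.get(taskId);
      if (!task) return;
      task.awaits = task.awaits.filter((r) => r !== receipt);
      if (task.awaits.length === 0) {
        this.suspended.delete(taskId);
        this.queue.push(task);
      }
    });
  }
}

// Usage: "b" is suspended until both of its receipts have arrived.
const scheduler = new Scheduler();
scheduler.schedule({ id: "a", awaits: [] });
scheduler.schedule({ id: "b", awaits: ["r1", "r2"] });
scheduler.onReceipt("r1"); // "b" still waits on "r2"
scheduler.onReceipt("r2"); // "b" moves back into the queue
```

In a real deployment the lookup, requeue and delete in `onReceipt` would presumably have to be a single atomic or idempotent operation against a durable store, which is exactly where the race conditions mentioned above come in.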