Implement task scheduler

Gozala commented 6 months ago

What

We need a proper scheduler to coordinate long-running tasks and enable progress tracking.

Why

We keep working around the lack of a proper task scheduler in our system. Each workaround takes significant effort and introduces a lot of context to keep in mind. Task coordination means dealing with concurrency and race conditions, which are typically very difficult to get right, and bugs caused by race conditions are among the hardest to reproduce and debug. Getting it right once reduces the risk of getting it wrong in one of many places and spending a lot of effort on fixes. There is also a high cost in knowledge sharing, as the current workarounds aren't obvious and depend on a lot of contextual details.

It would be a really good idea to actually take the time to implement one so we can stop working around the lack of it.

Cost

It is hard to estimate without trying to make one. If I had to guess, I'd say weeks.

Design Sketch

From prior thought on the subject, I imagine it would roughly involve something along these lines:

A received invocation is stored in the bucket and a pointer to it is added to the Active Queue (probably SQS).
The queue consumer processes invocations in FIFO order.
- It attempts to resolve receipts for all awaited tasks
  - If some awaited receipt is not found
    - Store a record for dependency/task in some store with set semantics (one per awaited receipt)
  - If all awaited receipts are available
    - Derive the unique task identifier and try to resolve its receipt
      - If the receipt is found, do nothing (the task was processed concurrently)
      - If the receipt is not found
        - Execute the task handler
        - Write the receipt into the bucket (unless a receipt is already present, in which case there was concurrency)
        - Add all effects to the Active Queue
        - Look up records for dependency/ and move those tasks back into the queue, then delete those records
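
To make the flow above easier to discuss, here is a minimal in-memory sketch in TypeScript. It is only an illustration under assumptions: the bucket, the Active Queue, and the set-semantics store are replaced by plain `Map`s and an array, and every name (`Scheduler`, `Invocation`, `Receipt`, `submit`, `step`, `drain`) is hypothetical rather than an existing API. It also assumes the unique task identifier is already available as `id` instead of being derived.

```typescript
type TaskID = string

// Hypothetical shape of an invocation; `id` stands in for the derived task identifier.
interface Invocation {
  id: TaskID
  awaits: TaskID[] // tasks whose receipts must exist before this one can run
  run: (inputs: unknown[]) => { out: unknown; effects: Invocation[] }
}

interface Receipt {
  task: TaskID
  out: unknown
}

class Scheduler {
  // Stand-in for receipts stored in the bucket.
  private receipts = new Map<TaskID, Receipt>()
  // Stand-in for the Active Queue (FIFO).
  private queue: Invocation[] = []
  // Stand-in for dependency/ records: awaited task -> set of blocked tasks.
  private waiting = new Map<TaskID, Map<TaskID, Invocation>>()

  submit(invocation: Invocation): void {
    this.queue.push(invocation)
  }

  // Processes one invocation; returns false once the queue is empty.
  step(): boolean {
    const invocation = this.queue.shift()
    if (!invocation) return false

    // Try to resolve receipts for all awaited tasks.
    const missing = invocation.awaits.filter(id => !this.receipts.has(id))
    if (missing.length > 0) {
      // Record one dependency entry per missing receipt (set semantics).
      for (const id of missing) {
        const blocked = this.waiting.get(id) ?? new Map<TaskID, Invocation>()
        blocked.set(invocation.id, invocation)
        this.waiting.set(id, blocked)
      }
      return true
    }

    // Receipt already present: the task was processed elsewhere, do nothing.
    if (this.receipts.has(invocation.id)) return true

    const inputs = invocation.awaits.map(id => this.receipts.get(id)!.out)
    const { out, effects } = invocation.run(inputs)

    // Write the receipt unless one appeared in the meantime.
    if (!this.receipts.has(invocation.id)) {
      this.receipts.set(invocation.id, { task: invocation.id, out })
    }

    // Enqueue all effects.
    for (const effect of effects) this.queue.push(effect)

    // Move tasks blocked on this receipt back into the queue and drop the records.
    const blocked = this.waiting.get(invocation.id)
    if (blocked) {
      this.waiting.delete(invocation.id)
      for (const task of blocked.values()) this.queue.push(task)
    }
    return true
  }

  drain(): void {
    while (this.step()) {}
  }

  receipt(id: TaskID): Receipt | undefined {
    return this.receipts.get(id)
  }
}
```

In this sketch a task blocked on two receipts can be re-queued twice, once per dependency. The second copy finds the receipt already written and does nothing, which is the same "processed concurrently" branch from the list above. A real implementation would still need atomic writes in the bucket and the store, which this single-threaded version does not model.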

reidlw commented 6 months ago

@Gozala - as with other tickets, need your help on articulating priority and/or aligning this with implementation of some feature where we can better assign user value.

Also - what's your thought on cost here? (we can use the scale on the "Size" field on the project). I'm assuming we'd pull some off the shelf library here vs. implementing anything from scratch?

Gozala commented 6 months ago

> @Gozala - as with other tickets, need your help on articulating priority and/or aligning this with implementation of some feature where we can better assign user value.

Did my best to cover that in the updated description

> Also - what's your thought on cost here? (we can use the scale on the "Size" field on the project).

Sorry, I don't understand the question: the cost of not implementing this, or the cost of implementing it?

> I'm assuming we'd pull some off the shelf library here vs. implementing anything from scratch?

I highly doubt we can do that. Mostly this is about creating something that works around platform limitations (of CF, AWS, or serverless in general) and implements what a language runtime usually provides, using the tools these constrained runtimes offer.

Maybe someone has done something like this, but I have not looked into it, and I still think this is one of those cases where something tailored to our exact needs is probably going to be the simplest solution.
