Investigate scheduler replication strategy for kubernetes

jpbruinsslot commented 6 months ago

Investigate whether the scheduler can be optimized to run inside a kubernetes cluster and leverage its functionality.

jpbruinsslot commented 5 months ago

Proposal

The most straightforward solution, one that avoids too much friction and reuses what has already been built within the scheduler, is to extend the scheduler so that it can run in a 'leader mode' and a 'stand-alone mode'. The scheduler running in leader mode (the Director) will be tasked with monitoring organisations from a rabbitmq queue (currently monitor_organization in app.py queries the katalogus for new organizations; we can leverage a rabbitmq queue here instead). By using the official Python client library for kubernetes we should be able to create pods, each with an individual scheduler, when organizations are added to the system. When the scheduler is run in stand-alone mode it will accept arguments to run a single scheduler for a single organization (in a container).
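A rough sketch of what the Director side could look like with the kubernetes Python client. The manifest layout, the label names, the image, and the `--mode` / `--organisation` / `--type` flags are all assumptions for illustration; the scheduler does not have such a CLI today. Only `config.load_incluster_config()` and `CoreV1Api.create_namespaced_pod()` are actual client calls.

```python
from typing import Any


def build_scheduler_pod(org_id: str, scheduler_type: str, image: str) -> dict[str, Any]:
    """Build a Pod manifest for one stand-alone scheduler (hypothetical layout)."""
    # Pod names must be lowercase, so "ORG-1" becomes "org-1" in the name.
    name = f"{scheduler_type}-scheduler-{org_id}".lower()
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {"app": "scheduler", "organisation": org_id, "type": scheduler_type},
        },
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": "scheduler",
                    "image": image,
                    # Hypothetical stand-alone flags, not an existing CLI.
                    "args": [
                        "--mode", "standalone",
                        "--organisation", org_id,
                        "--type", scheduler_type,
                    ],
                }
            ],
        },
    }


def create_scheduler_pod(manifest: dict[str, Any], namespace: str = "default") -> None:
    """Submit the manifest to the cluster the Director itself runs in."""
    # Imported lazily so the manifest builder works without the package installed.
    from kubernetes import client, config

    config.load_incluster_config()  # authenticates with the pod's service account
    client.CoreV1Api().create_namespaced_pod(namespace=namespace, body=manifest)
```

The Director would call `create_scheduler_pod(build_scheduler_pod("ORG-1", "boefje", image))` whenever a new organisation shows up on the queue, and a matching delete call when one is removed.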

flowchart LR
    R[RabbitMQ]-->|monitor_organisations|D
    D[Director]
    D --> S1[Container: BoefjeScheduler ORG-1]
    D --> S2[Container: NormalizerScheduler ORG-1]
    D --> S3[Container: BoefjeScheduler ORG-n]
    D --> S4[Container: NormalizerScheduler ORG-n]

Currently, the scheduler runs each individual scheduler (BoefjeScheduler, NormalizerScheduler, and in the future ReportingScheduler) for an organization in a thread. This can then be extended to support threads, processes, and pods with the same code. When not running in a kubernetes / container set-up the schedulers can then be run in either threads or processes (if we wish to support running schedulers in processes).

flowchart TB
    RabbitMQ-->a1
    subgraph Scheduler
    a1[monitor_organisations]
    a1 --> S1[Thread/Process: BoefjeScheduler ORG-1]
    a1 --> S2[Thread/Process: NormalizerScheduler ORG-1]
    a1 --> S3[Thread/Process: BoefjeScheduler ORG-n]
    a1 --> S4[Thread/Process: NormalizerScheduler ORG-n]
    end
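To make the "same code for threads and processes" idea concrete, here is a minimal sketch. `run_scheduler` and `start_scheduler` are hypothetical names, and `run_scheduler` is only a placeholder for the real scheduler loop. It relies on `threading.Thread` and `multiprocessing.Process` sharing the same constructor arguments.

```python
import multiprocessing
import threading
from typing import Union

Worker = Union[threading.Thread, multiprocessing.Process]


def run_scheduler(org_id: str, scheduler_type: str) -> None:
    """Placeholder for the real scheduler loop of one organisation."""
    print(f"running {scheduler_type} scheduler for {org_id}")


def start_scheduler(org_id: str, scheduler_type: str, backend: str = "thread") -> Worker:
    """Start one scheduler in a thread or a process, using the same target."""
    factories = {"thread": threading.Thread, "process": multiprocessing.Process}
    if backend not in factories:
        raise ValueError(f"unknown backend: {backend}")
    worker = factories[backend](
        target=run_scheduler, args=(org_id, scheduler_type), daemon=True
    )
    worker.start()
    return worker


if __name__ == "__main__":
    # One worker per (organisation, scheduler type) pair, as in the diagram.
    workers = [
        start_scheduler(org, kind, backend="thread")
        for org in ("ORG-1", "ORG-2")
        for kind in ("boefje", "normalizer")
    ]
    for worker in workers:
        worker.join()
```

A pod backend could be added to the same `factories` mapping later, as long as it exposes a comparable start/stop interface.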

Changes

Make changes to app.py such that the scheduler can run in leader mode and in stand-alone mode. For stand-alone mode this can be achieved by extending the scheduler to accept command-line arguments.
In addition to running the schedulers in threads, extend this to run the schedulers in processes.
Use the kubernetes Python package to programmatically start and destroy pods.
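For the command-line part, something along these lines might be enough. The flag names and choices are assumptions for illustration and not an existing interface of the scheduler.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the (hypothetical) scheduler command-line arguments."""
    parser = argparse.ArgumentParser(prog="scheduler")
    parser.add_argument("--mode", choices=["leader", "standalone"], default="leader")
    parser.add_argument("--organisation", help="organisation id (stand-alone mode only)")
    parser.add_argument(
        "--type",
        choices=["boefje", "normalizer"],
        help="which scheduler to run (stand-alone mode only)",
    )
    args = parser.parse_args(argv)
    # Stand-alone mode runs exactly one scheduler for exactly one organisation.
    if args.mode == "standalone" and not (args.organisation and args.type):
        parser.error("--organisation and --type are required in standalone mode")
    return args


if __name__ == "__main__":
    print(parse_args())
```

Defaulting to leader mode keeps a start-up without arguments working, while a pod created by the Director would pass all three flags.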

Considerations

Would the Director container implementation of the scheduler have enough permissions to create pods? Does it need extra configuration in order to make this work in kubernetes installations?
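On clusters with RBAC enabled, the Director's service account would most likely need explicit permission to manage pods. As a sketch of the kind of extra configuration this implies, here is a namespaced Role expressed as a Python dict. The role name `scheduler-director` and the exact verb list are assumptions, and a RoleBinding to the Director's service account would be needed as well.

```python
from typing import Any


def build_director_role(namespace: str) -> dict[str, Any]:
    """Build a Role manifest that lets its subject manage pods in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        # Hypothetical role name, chosen for illustration only.
        "metadata": {"name": "scheduler-director", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group, where pods live
                "resources": ["pods"],
                "verbs": ["create", "delete", "get", "list", "watch"],
            }
        ],
    }
```

Whether installations are willing to grant this to an application pod is part of the question above; this only shows what would have to be granted.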

underdarknl commented 5 months ago

As you note, there are a few questions that we want to answer in addition to 'scalability', I think:

Can we access the k8s control plane from within OpenKAT, or should we rely on another mechanism to create these new pods?
Where is the state for these pods kept? In the general postgres store? And does that mean we can easily restart or recreate these scheduling pods?
Do we want to expose the scheduler's job-pop APIs for each queue separately, and how do we expose / collect those endpoints so that our job-runners can find them?
Is there a way to separate the logic of optimizing the queue and ingesting/creating new jobs from the logic of popping jobs from the queue, and which of these can be run in parallel within a single queue? Keep in mind that we don't need perfect schedules, we just need reasonable ones.

I like the concept of a thin wrapper which forwards the requests either to a thread / process or to a separate service / pod. This would allow us to develop meaningful and fitting systems for both small and large setups.

jpbruinsslot commented 4 months ago

The next step is to try and see if we can still leverage the kubernetes replication strategy.

Check with docker-compose: replicate the scheduler container and see what works and what doesn't.
Report on findings and potential next steps.
