minvws / nl-kat-coordination

European Union Public License 1.2

Investigate scheduler replication strategy for kubernetes #2665

Open jpbruinsslot opened 6 months ago

jpbruinsslot commented 6 months ago

Investigate whether the scheduler can be optimized to run inside a Kubernetes cluster, leveraging Kubernetes functionality.

jpbruinsslot commented 5 months ago

Proposal

The most straightforward solution, one that avoids too much friction and reuses what has already been built within the scheduler, is to extend the scheduler so that it can run in a 'leader mode' and a 'standalone mode'. A scheduler running in leader mode (the Director) is tasked with monitoring organisations from a RabbitMQ queue. (Currently, monitor_organization in app.py queries the katalogus for new organisations; we can use a RabbitMQ queue here instead.) Using the official Python client library for Kubernetes, the Director should be able to create pods, each with an individual scheduler, when organisations are added to the system. When the scheduler is run in standalone mode, it accepts arguments to run a single scheduler for a single organisation (in a container).

flowchart LR
    R[RabbitMQ]-->|monitor_organisations|D
    D[Director]
    D --> S1[Container: BoefjeScheduler ORG-1]
    D --> S2[Container: NormalizerScheduler ORG-1]
    D --> S3[Container: BoefjeScheduler ORG-n]
    D --> S4[Container: NormalizerScheduler ORG-n]
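
A minimal sketch of how the two modes could fit together, under stated assumptions: the flag names (`--organisation`, `--scheduler-type`), the image name, the labels, and all helper names below are hypothetical and not taken from the scheduler code base. The Director side only builds plain pod manifests as dicts. With the official Python client, such a dict could be passed as `body` to `CoreV1Api().create_namespaced_pod(...)`; that call is left out here so the sketch runs without a cluster.

```python
import argparse
import re

# Hypothetical identifiers for the scheduler types named in the proposal.
SCHEDULER_TYPES = ("boefje", "normalizer")


def parse_standalone_args(argv):
    """Parse the (assumed) flags a scheduler would accept in standalone mode."""
    parser = argparse.ArgumentParser(prog="scheduler")
    parser.add_argument("--organisation", required=True)
    parser.add_argument("--scheduler-type", required=True, choices=SCHEDULER_TYPES)
    return parser.parse_args(argv)


def slugify(org_id):
    """Reduce an organisation id to lowercase alphanumerics and dashes."""
    return re.sub(r"[^a-z0-9-]+", "-", org_id.lower()).strip("-")


def pod_name(org_id, scheduler_type):
    """Build a pod name that fits Kubernetes' 63-character DNS label limit."""
    return f"scheduler-{scheduler_type}-{slugify(org_id)}"[:63].rstrip("-")


def build_pod_manifest(org_id, scheduler_type, image="example/scheduler:latest"):
    """Return a pod manifest that runs one scheduler for one organisation.

    The image name is a placeholder. The container args mirror the flags
    understood by parse_standalone_args above.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name(org_id, scheduler_type),
            "labels": {
                "app": "scheduler",
                "organisation": slugify(org_id),
                "scheduler-type": scheduler_type,
            },
        },
        "spec": {
            "containers": [
                {
                    "name": "scheduler",
                    "image": image,
                    "args": [
                        "--organisation", org_id,
                        "--scheduler-type", scheduler_type,
                    ],
                }
            ]
        },
    }


def manifests_for_organisation(org_id):
    """What the Director might create when a new organisation is announced."""
    return [build_pod_manifest(org_id, t) for t in SCHEDULER_TYPES]
```

For example, `manifests_for_organisation("ORG-1")` yields one manifest per scheduler type, and feeding a manifest's container `args` into `parse_standalone_args` recovers the organisation and scheduler type on the standalone side.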

Currently, the scheduler runs each individual scheduler (BoefjeScheduler, NormalizerScheduler, and in the future ReportingScheduler) for an organisation in a thread. This can be extended so that the same code supports threads, processes, and pods. Outside a Kubernetes or container set-up, the schedulers can then run in either threads or processes (if we wish to support running schedulers in processes).

flowchart TB
    RabbitMQ-->a1
    subgraph Scheduler
    a1[monitor_organisations]
    a1 --> S1[Thread/Process: BoefjeScheduler ORG-1]
    a1 --> S2[Thread/Process: NormalizerScheduler ORG-1]
    a1 --> S3[Thread/Process: BoefjeScheduler ORG-n]
    a1 --> S4[Thread/Process: NormalizerScheduler ORG-n]
    end
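
The thread/process variant could be sketched roughly as follows. This is an illustration only: `run_scheduler`, `start`, `demo_threads`, and the `mode` values are hypothetical names, and the scheduler body is a stand-in that merely records that it ran. The point is that `threading.Thread` and `multiprocessing.Process` share the same `start()` / `join()` interface, so the calling code does not need to change.

```python
import multiprocessing
import queue
import threading

# Hypothetical identifiers for the scheduler types named in the proposal.
SCHEDULER_TYPES = ("boefje", "normalizer")


def run_scheduler(scheduler_type, org_id, results):
    """Stand-in for a real scheduler loop: report which scheduler ran."""
    results.put((scheduler_type, org_id))


def start(mode, scheduler_type, org_id, results):
    """Start one scheduler for one organisation in a thread or a process."""
    args = (scheduler_type, org_id, results)
    if mode == "thread":
        worker = threading.Thread(target=run_scheduler, args=args)
    elif mode == "process":
        # For processes, `results` would need to be a multiprocessing.Queue.
        worker = multiprocessing.Process(target=run_scheduler, args=args)
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    worker.start()
    return worker


def demo_threads(org_ids):
    """Run every scheduler type per organisation in threads, collect results."""
    results = queue.Queue()
    workers = [
        start("thread", scheduler_type, org_id, results)
        for org_id in org_ids
        for scheduler_type in SCHEDULER_TYPES
    ]
    for worker in workers:
        worker.join()
    return sorted(results.get() for _ in workers)
```

Calling `demo_threads(["ORG-1"])` runs a BoefjeScheduler and a NormalizerScheduler stand-in for that organisation. When using the `"process"` mode on platforms that spawn rather than fork, the calling code needs to sit under an `if __name__ == "__main__":` guard.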

Changes

Considerations

underdarknl commented 5 months ago

As you note, there are a few questions that we want to answer alongside 'scalability', I think:

I like the concept of a thin wrapper which forwards requests either to a thread / process or to a separate service / pod. This would allow us to develop meaningful and fitting systems for both small and large setups.

jpbruinsslot commented 4 months ago

The next step is to see whether we can still leverage the Kubernetes replication strategy.