jpbruinsslot opened 6 months ago
The most straightforward solution, without creating too much friction and utilizing what has already been built within the scheduler, is to supplement the scheduler such that it can run in a 'leader mode' and a 'stand-alone mode'. The scheduler running in leader mode (`Director`) will be tasked with monitoring organisations from a rabbitmq queue (currently `monitor_organization` in `app.py` queries the katalogus for new organisations; we can leverage a rabbitmq queue here). By using the official Python client library for kubernetes we should be able to create pods with an individual scheduler when organisations are added to the system. When the scheduler is run in stand-alone mode it will accept arguments to run a single scheduler for a single organisation (in a container).
```mermaid
flowchart LR
    R[RabbitMQ] -->|monitor_organisations| D
    D[Director]
    D --> S1[Container: BoefjeScheduler ORG-1]
    D --> S2[Container: NormalizerScheduler ORG-1]
    D --> S3[Container: BoefjeScheduler ORG-n]
    D --> S4[Container: NormalizerScheduler ORG-n]
```
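A minimal sketch of what the `Director` side could look like with the official kubernetes Python client, creating one stand-alone scheduler pod per new organisation. The image name, the `--mode`/`--org` flags, and the manifest layout are assumptions for illustration, not the actual implementation:

```python
# Sketch: the Director builds a pod manifest for a single-organisation
# scheduler and submits it via the kubernetes Python client.

SCHEDULER_IMAGE = "openkat/mula:latest"  # hypothetical image name

def build_scheduler_pod(org_id: str) -> dict:
    """Build a pod manifest that runs one scheduler in stand-alone mode."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"scheduler-{org_id}"},
        "spec": {
            "restartPolicy": "Always",
            "containers": [
                {
                    "name": "scheduler",
                    "image": SCHEDULER_IMAGE,
                    # hypothetical CLI flags for stand-alone mode
                    "args": ["--mode", "standalone", "--org", org_id],
                }
            ],
        },
    }

def create_scheduler_pod(org_id: str, namespace: str = "default") -> None:
    # Imported lazily so the manifest builder stays dependency-free.
    from kubernetes import client, config

    config.load_incluster_config()  # the Director runs inside the cluster
    client.CoreV1Api().create_namespaced_pod(
        namespace=namespace, body=build_scheduler_pod(org_id)
    )
```

The manifest is a plain dict, which `create_namespaced_pod` accepts, so the build step can be unit-tested without a cluster.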
Currently, the scheduler runs each individual scheduler (`BoefjeScheduler`, `NormalizerScheduler`, and in the future `ReportingScheduler`) for an organisation in a thread. This can then be extended to support threads, processes, and pods with the same code. When not running in a kubernetes / container set-up, the schedulers can then be run in either threads or processes (if we wish to support running schedulers in processes).
```mermaid
flowchart TB
    RabbitMQ --> a1
    subgraph Scheduler
        a1[monitor_organisations]
        a1 --> S1[Thread/Process: BoefjeScheduler ORG-1]
        a1 --> S2[Thread/Process: NormalizerScheduler ORG-1]
        a1 --> S3[Thread/Process: BoefjeScheduler ORG-n]
        a1 --> S4[Thread/Process: NormalizerScheduler ORG-n]
    end
```
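Running the same scheduler code behind a selectable thread or process backend could be sketched as below; the scheduler body here is a stand-in for the real `BoefjeScheduler`/`NormalizerScheduler` classes, and the `start` helper is hypothetical:

```python
# Sketch: one launcher that starts a per-organisation scheduler in either
# a thread or a process, so the same code covers both execution backends.
import multiprocessing
import threading

def run_scheduler(org_id: str) -> None:
    # Stand-in for e.g. BoefjeScheduler(org_id).run()
    print(f"scheduler running for {org_id}")

def start(org_id: str, backend: str = "thread"):
    if backend == "thread":
        worker = threading.Thread(
            target=run_scheduler, args=(org_id,), daemon=True
        )
    elif backend == "process":
        worker = multiprocessing.Process(target=run_scheduler, args=(org_id,))
    else:
        raise ValueError(f"unknown backend: {backend}")
    worker.start()
    return worker
```

Both `threading.Thread` and `multiprocessing.Process` share the `start`/`join`/`is_alive` interface, which is what makes a single launcher workable.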
Update `app.py` such that the scheduler can run in leader mode and in stand-alone mode. For stand-alone mode this can be achieved by supplementing the scheduler to allow for command-line arguments.
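A hedged sketch of what those command-line arguments might look like with `argparse`; the flag names (`--mode`, `--org`) are assumptions, not the actual interface:

```python
# Sketch: CLI for selecting leader mode vs. single-organisation
# stand-alone mode.
import argparse

def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="scheduler")
    parser.add_argument(
        "--mode",
        choices=["leader", "standalone"],
        default="leader",
        help="run as the Director (leader) or as a single-organisation scheduler",
    )
    parser.add_argument("--org", help="organisation id; required in standalone mode")
    args = parser.parse_args(argv)
    if args.mode == "standalone" and not args.org:
        parser.error("--org is required in standalone mode")
    return args
```

Defaulting to leader mode keeps the current single-binary behaviour unchanged for existing installations.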
Does the `Director` container implementation of the scheduler have enough permissions to create pods? Does it need extra configuration in order to make this work in kubernetes installations?

As you note, there are a few questions that we want to answer next to the 'scalability', I think:
I like the concept of a thin wrapper which either forwards the requests to a thread / process or to a separate service / pod. This would allow us to develop meaningful and fitting systems for both small and large set-ups.
The next step is to try and see if we can still leverage the kubernetes replication strategy.
Investigate whether the scheduler can be optimized to run inside a kubernetes cluster, leveraging its functionality.