Open underdarknl opened 2 months ago
The reason to run all boefjes in a container is to run each boefje in a sandbox. In the future it will also be possible to run boefjes created by others, not only boefjes created by KAT. Running those in a sandbox decreases the risk of doing so.
Starting a container for every boefje task results in a lot of overhead, so we want to support running multiple tasks in a single container.
For the boefje containers we need to support two ways of deploying KAT:
KAT has access to the container system control plane to start/stop containers. In this case KAT can automatically start new containers when necessary, but there needs to be a runner that can talk to the control plane and start them.
KAT has no access to the control plane and the system administrator configures all the necessary containers themselves beforehand.
This means we need to support long-running boefje containers. Such a container can either pull tasks from the runner, or the runner can push tasks to the boefje container if the boefje container has a service.
When there isn't any task available, the boefje can either wait on the boefjes runner for a new task to be available using long-polling or just do a new request after some timeout.
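The two waiting strategies can be sketched in a few lines of Python. This is a minimal illustration, not KAT's implementation: `poll_for_tasks`, `long_poll_fetcher`, and the dict-shaped task are hypothetical names, and an in-memory `queue.Queue` stands in for the HTTP call to the boefjes runner.

```python
import queue
import time
from typing import Callable, Optional

Task = dict  # hypothetical task shape, for illustration only


def long_poll_fetcher(tasks: "queue.Queue[Task]", wait: float) -> Callable[[], Optional[Task]]:
    """Return a fetch function that blocks up to `wait` seconds for a task.

    This mimics long-polling: the call only returns empty-handed after the
    wait has elapsed. A real boefje would do an HTTP request instead.
    """
    def fetch() -> Optional[Task]:
        try:
            return tasks.get(timeout=wait)
        except queue.Empty:
            return None
    return fetch


def poll_for_tasks(
    fetch_task: Callable[[], Optional[Task]],
    execute: Callable[[Task], None],
    retry_delay: float = 1.0,
    max_empty_polls: Optional[int] = None,
) -> int:
    """Pull and execute tasks; return how many tasks were executed.

    With a long-polling fetcher, `retry_delay` can be 0. With a fetcher that
    returns immediately, `retry_delay` is the timeout between requests.
    `max_empty_polls` stops the loop after that many consecutive empty polls
    (None means run forever, as a long-running container would).
    """
    executed = 0
    empty_polls = 0
    while max_empty_polls is None or empty_polls < max_empty_polls:
        task = fetch_task()
        if task is None:
            empty_polls += 1
            time.sleep(retry_delay)
            continue
        empty_polls = 0
        execute(task)
        executed += 1
    return executed
```

The same loop covers both options: only the fetch function and `retry_delay` change, so the choice between long-polling and periodic requests can be deferred.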
The pull-based design is how task queues are usually implemented: a process that executes tasks pulls them from the queue.
Pushing tasks gives more complications if you want to scale to multiple boefje containers that execute tasks. How will the boefje runner know to which container to push the task? Some boefje tasks might take a very long time to execute, while other tasks might be short. If you want to use things like autoscaling and use a load balancer for the boefje HTTP service, the question is how the load balancing should work with those very long-running tasks. HTTP load balancers usually balance a high number of short-duration HTTP requests, not long-running tasks.
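To illustrate why pulling avoids the routing question, here is a small sketch in which threads stand in for boefje containers and a shared `queue.Queue` stands in for the runner's task queue. The function name `run_pull_workers` and the use of sleep durations as tasks are assumptions made for the example.

```python
import queue
import threading
import time
from typing import Dict, List


def run_pull_workers(durations: List[float], worker_count: int) -> Dict[str, List[int]]:
    """Run tasks of the given durations (seconds) on workers pulling from one queue.

    Returns a mapping of worker name to the indices of the tasks it handled.
    No component decides which worker gets which task: a worker takes the
    next task whenever it becomes free.
    """
    tasks: "queue.Queue[tuple]" = queue.Queue()
    for index, duration in enumerate(durations):
        tasks.put((index, duration))

    handled: Dict[str, List[int]] = {f"worker-{n}": [] for n in range(worker_count)}

    def worker(name: str) -> None:
        while True:
            try:
                index, duration = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            time.sleep(duration)  # stand-in for executing a boefje task
            handled[name].append(index)  # each worker only writes its own list

    threads = [threading.Thread(target=worker, args=(name,)) for name in handled]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled
```

Calling `run_pull_workers([0.2, 0.01, 0.01, 0.01], worker_count=2)` will typically leave one worker busy with the long task while the other drains the short ones, without any load-balancing logic in the runner.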
Updated by @Donnype with the comment from @noamblitz.
About this feature
Detailed description
Currently, most boefjes are run in the boefje container as Python code. This has a few disadvantages:
Considerations:
We propose an iterative approach to package all Python boefjes into a single container.
kubernetes.py and docker.py so OPsers are able to choose how the boefjes will be run.
UPDATE from @Donnype: we decided to postpone this, as the kubernetes runner would only be needed for installs where we can talk to control planes and need to dynamically start newly created (containerized) boefjes.
Single container
Several decisions on the first step of the implementation:
In boefje.json we will specify the path to the main.py so the code around boefje resolving does not have to be added to the new container. This can be done later if needed.
boefje_id or the path to the boefje.json.
Feature benefit/User story
As an expert user, I want to be able to reproduce raw files of the current Python boefjes. To do this, KAT should communicate the run command of the container.
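One way KAT could communicate a reproducible run command is to build it as an argument list and render it with shell quoting. The sketch below is only an assumption about the shape of such a command: `build_run_command`, the image name, the environment variable, and the container arguments are all hypothetical; only `docker run`, `--rm`, and `--env` are real Docker CLI options.

```python
import shlex
from typing import Dict, Optional, Sequence


def build_run_command(
    image: str,
    boefje_args: Sequence[str],
    env: Optional[Dict[str, str]] = None,
) -> str:
    """Render a copy-pasteable `docker run` command for one boefje task.

    `image`, `boefje_args`, and `env` are placeholders; the real values
    would come from the task that KAT executed.
    """
    command = ["docker", "run", "--rm"]
    # Sort the environment so the same task always renders the same command.
    for key, value in sorted((env or {}).items()):
        command += ["--env", f"{key}={value}"]
    command.append(image)
    command += list(boefje_args)
    # shlex.join quotes each argument so the string is safe to paste in a shell.
    return shlex.join(command)


# Hypothetical usage: image name and arguments are made up for the example.
print(build_run_command("example/boefjes:latest", ["--task", "task-id"], {"LOG_LEVEL": "debug"}))
```

Rendering from an argument list rather than concatenating strings keeps values with spaces or special characters intact when an expert user pastes the command into a shell.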