minvws / nl-kat-coordination

OpenKAT scans networks, finds vulnerabilities and creates accessible reports. It integrates the most widely used network tools and scanning software into a modular framework, accesses external databases such as shodan, and combines the information from all these sources into clear reports. It also includes lots of cat hair.
https://openkat.nl
European Union Public License 1.2

[EPIC] Package all local Python boefjes in a container (could be a single container, targeting the specific modules by its arguments) #3593

Open underdarknl opened 2 months ago

underdarknl commented 2 months ago

Updated by @Donnype with the comment from @noamblitz.



noamblitz commented 1 month ago

About this feature

Detailed description

Currently, most boefjes are run in the boefje container as Python code. This has a few disadvantages:

Considerations:

We propose an iterative approach to package all Python boefjes into a single container.

  1. We create a single container with a Dockerfile in which we specify which boefjes should be copied. This container will be started and stopped on each boefje run.
  2. We create an HTTP API around this container so it does not have to be stopped every time.
  3. We create several runner files like kubernetes.py and docker.py so operators are able to choose how the boefjes will be run.
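
As a rough illustration of step 1 (one image, with the boefje selected by its arguments), the container entrypoint could look something like the sketch below. The `BOEFJES` registry, the two toy boefjes, and the `main` function are hypothetical names made up for this example; they are not taken from the KAT codebase.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Hypothetical registry. In a real image this would map names to the
# boefje modules that the Dockerfile copied in.
BOEFJES: Dict[str, Callable[[str], bytes]] = {
    "echo": lambda input_ooi: input_ooi.encode(),
    "reverse": lambda input_ooi: input_ooi[::-1].encode(),
}


def main(argv: Optional[List[str]] = None) -> bytes:
    """Run the boefje named on the command line once and return its raw output."""
    parser = argparse.ArgumentParser(description="Run one boefje and exit.")
    parser.add_argument("boefje", choices=sorted(BOEFJES))
    parser.add_argument("input_ooi")
    args = parser.parse_args(argv)
    return BOEFJES[args.boefje](args.input_ooi)
```

Calling `main(["reverse", "example.com"])` runs only the `reverse` entry, which is the behaviour the container would need when it is started and stopped per boefje run.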

Single container

Several decisions on the first step of the implementation:

Feature benefit/User story

As an expert user, I want to be able to reproduce raw files of the current Python boefjes. To do this, KAT should communicate the run command of the container.
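
One possible way to surface the run command to an expert user is to build it as an argument list and render it shell-safely. This is only a sketch: the image name, the boefje name, and the argument order are assumptions for illustration, and the real command will depend on decisions not yet made in this issue.

```python
import shlex
from typing import List


def build_run_command(image: str, boefje: str, input_ooi: str) -> List[str]:
    """Assemble a `docker run` invocation as an argument list (hypothetical layout)."""
    return ["docker", "run", "--rm", image, boefje, input_ooi]


def format_for_display(command: List[str]) -> str:
    """Quote each argument so the command can be pasted into a shell as-is."""
    return shlex.join(command)


# Placeholder values, not real KAT image or boefje names.
command = build_run_command("example/boefjes:latest", "dns-records", "example.com")
display = format_for_display(command)
```

Keeping the command as a list until display time avoids quoting mistakes when an input contains spaces or shell metacharacters.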


dekkers commented 6 days ago

Containerizing boefjes

The reason to run all boefjes in a container is to run each boefje in a sandbox. In the future it will also be possible to run boefjes created by others, not only boefjes created by KAT. Running those in a sandbox decreases the risk of doing so.

Starting a container for every boefje task results in a lot of overhead, so we want to support running multiple tasks in a single container.

For the boefje containers we need to support two ways of deploying KAT:

This means we need to support long-running boefje containers. Such a container can either pull tasks from the runner, or the runner can push tasks to the boefje container if the container exposes a service.

Pull-based design

When no task is available, the boefje container can either wait on the boefjes runner until a new task becomes available (long-polling) or simply send a new request after some timeout.
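
A minimal sketch of such a pull loop is shown below. It uses an in-process `queue.Queue` as a stand-in for the boefjes runner, so the blocking `get(timeout=...)` plays the role of a long-poll request. The function names, the `max_idle_polls` stop condition, and the timeout values are assumptions for the example; a real container would talk to the runner over the network and would normally loop forever.

```python
import queue
from typing import Callable, List, Optional


def fetch_task(runner_queue: "queue.Queue[str]", timeout: float) -> Optional[str]:
    """Wait up to `timeout` seconds for a task, like a long-poll request would."""
    try:
        return runner_queue.get(timeout=timeout)
    except queue.Empty:
        return None  # nothing arrived in time; the caller decides whether to retry


def worker_loop(
    runner_queue: "queue.Queue[str]",
    handle: Callable[[str], str],
    poll_timeout: float = 0.05,
    max_idle_polls: int = 2,
) -> List[str]:
    """Pull and execute tasks until `max_idle_polls` polls in a row return nothing."""
    results: List[str] = []
    idle_polls = 0
    while idle_polls < max_idle_polls:
        task = fetch_task(runner_queue, poll_timeout)
        if task is None:
            idle_polls += 1
            continue
        idle_polls = 0
        results.append(handle(task))
    return results
```

The idle-poll limit exists only so the example terminates; it is not part of the proposed design.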

Push-based design

The pull-based design is how task queues are usually implemented: a process that executes tasks pulls them from the queue.

Pushing tasks introduces complications if you want to scale to multiple boefje containers that execute tasks. How will the boefje runner know which container to push a task to? Some boefje tasks might take a very long time to execute, while others might be short. If you want to use things like autoscaling and put a load balancer in front of the boefje HTTP service, the question is how load balancing should work with those very long-running tasks. HTTP load balancers usually balance a high number of short-lived HTTP requests, not long-running tasks.
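
To make the concern concrete, here is a toy simulation comparing a naive round-robin push with a pull model, using made-up task durations. It is a deliberate simplification: real load balancers offer other strategies, and the numbers are illustrative only, not measurements.

```python
import heapq
from typing import List


def makespan_round_robin(durations: List[int], workers: int) -> int:
    """Push tasks to workers in turn, ignoring how busy each worker already is."""
    busy_until = [0] * workers
    for index, duration in enumerate(durations):
        busy_until[index % workers] += duration
    return max(busy_until)


def makespan_pull(durations: List[int], workers: int) -> int:
    """Each worker pulls the next task from the queue as soon as it is free."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    for duration in durations:
        earliest = heapq.heappop(free_at)  # the worker that becomes free first
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)


# Two long tasks mixed in with short ones, in arbitrary time units.
durations = [100, 1, 1, 1, 100, 1, 1, 1]
pushed = makespan_round_robin(durations, workers=2)
pulled = makespan_pull(durations, workers=2)
```

With these durations, round-robin sends both long tasks to the same worker while the other sits mostly idle, whereas in the pull model the free worker keeps draining the queue.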