minvws / nl-kat-coordination

European Union Public License 1.2

Design new Boefjes runner #1522

Open praseodym opened 1 year ago

praseodym commented 1 year ago

Design new Boefjes runner. As discussed, we will implement this before implementing the new Normalisers runner (#1136).

Relates to https://github.com/minvws/nl-kat-coordination/issues/81, https://github.com/minvws/nl-kat-coordination/discussions/82

dekkers commented 1 year ago

https://apptainer.org might be an option. They focus on reproducibility for scientific computing which matches our requirement for having a proper audit trail for findings. They also support signing images which is also nice to have.

praseodym commented 1 year ago

Apptainer looks interesting but their major differentiating features, such as image signing and verification, only work with their SIF image format. We explicitly also want to support orchestrators such as Kubernetes (#1102) and Nomad, which means we want to have OCI images as the primary image format. Apptainer has OCI image support but doesn't provide a lot of benefits over other runtimes in that case.

r3boot commented 1 year ago

Some input on this change.

I think that applying a worker pattern to the boefjes (and maybe other parts of KAT) would help with scaling. This idea leverages the RabbitMQ broker that is already in use within KAT.

So each boefje would conceptually run like this:

function run_boefje(job_metadata) {
  return do_very_important_stuff(job_metadata);
}

for (;;) {
  job = read_job_from_queue();
  response = run_boefje(job);
  submit_to_queue(response);
}

The scheduler should submit jobs onto the queue, and the normalizer should pick up from the queue. Each job should have a unique identifier allowing all parts of the system to correlate jobs.

This would allow us to easily run as many boefjes in parallel as the hardware can support, giving us both horizontal and vertical scaling (assuming single-threaded boefjes).

praseodym commented 1 year ago

KAT already uses a worker pattern for boefjes and normalisers. The scheduler maintains queues of boefje and normaliser tasks that need to be executed, and the boefje runner and normaliser runner pop items off those queues. The number of workers can be increased using the BOEFJES_POOL_SIZE setting (it defaults to 2 workers per queue).
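That pool-based runner can be sketched roughly like this (a simplified illustration, not the actual KAT code: `handle_boefje_task` and the queue wiring are hypothetical stand-ins; only the `BOEFJES_POOL_SIZE` setting comes from the description above):

```python
import multiprocessing
import os
import queue

# Use the POSIX fork start method so this sketch also works when the
# workers are defined in the same script that runs them.
ctx = multiprocessing.get_context("fork")


def handle_boefje_task(task):
    # Hypothetical stand-in for executing a single boefje task.
    return {"task_id": task["task_id"], "status": "completed"}


def worker(task_queue, result_queue):
    # Each worker pops tasks off the shared queue, one at a time.
    while True:
        try:
            task = task_queue.get(timeout=1)
        except queue.Empty:
            return  # queue drained, worker exits
        result_queue.put(handle_boefje_task(task))


def run_pool(tasks, pool_size=None):
    # Pool size configurable, mirroring the BOEFJES_POOL_SIZE setting.
    pool_size = pool_size or int(os.environ.get("BOEFJES_POOL_SIZE", "2"))
    task_queue, result_queue = ctx.Queue(), ctx.Queue()
    for task in tasks:
        task_queue.put(task)
    workers = [ctx.Process(target=worker, args=(task_queue, result_queue))
               for _ in range(pool_size)]
    for w in workers:
        w.start()
    # Collect results before joining so queue buffers can flush.
    results = [result_queue.get() for _ in tasks]
    for w in workers:
        w.join()
    return results
```

Scaling up then just means raising the pool size, up to the limits of one machine.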

This issue and the design in #1620 concern how we run containerised boefjes. Instead of spawning Docker containers from the boefje code, the runner will start a boefje container using Docker/Kubernetes/Nomad, with the expectation that all boefje code runs in that container.

r3boot commented 1 year ago

That is not what I mean by the worker pattern. What I mean is to start a fixed number of boefjes (depending on the amount of resources available) and to run the while loop inside the boefje, thereby making them long-lived processes that act as workers on a queue.

Using this pattern brings a number of benefits.

Or, to put it differently: I think container/job orchestration is a job best left to products that are built for it (like k8s and Nomad). KAT, being a scanner, should not have the ambition to get into orchestration imho, because it will increase the complexity of the project. And then there is the security aspect of a cluster job submitting another cluster job (or another container) outside the control of the cluster's operators (I know that we have an opinion about this one).

By keeping the architecture simple and clean, operators can implement the architecture that matches their own performance and security requirements, using the building blocks that KAT provides, while at the same time allowing a more rapid release cycle (because complexity is both removed and broken down into more manageable chunks).

praseodym commented 1 year ago

> That is not what I mean by the worker pattern. What I mean is to start a fixed number of boefjes (depending on the amount of resources available) and to run the while loop inside the boefje, thereby making them long-lived processes that act as workers on a queue.

This is how the current boefje runner works, but not using containers. All boefjes are written in Python, and the boefje runner uses Python multiprocessing to start multiple worker processes. Each worker can then run boefjes from the queue, one at a time. This can scale up with more worker processes, up to the limits of a single machine.

However, many boefjes require the use of external tools that are more than just a Python library. Examples of this are Nmap, WPScan, MASSCAN and Playwright. Because many of these tools also have their own dependencies, we use Docker containers to distribute them. Currently this means that the boefje Python code will include code to run containers directly using the Docker Python library, which requires a Docker socket to work.

This is not ideal, because organisations that run a container orchestrator want to run all containers through their orchestrator, so that system resources are properly allocated and security measures can be taken. Currently, we run containers directly on the system, bypassing the orchestrator's scheduling. Also, giving KAT access to the Docker socket is equivalent to handing out root permissions on the system.

Another issue is that boefjes are currently all packaged in the KAT codebase and distributed as part of our releases. Also, all Python dependencies need to be packaged with KAT, and with a growing number of boefjes we have also had the occasional conflicting dependency. Writing a boefje in another programming language than Python is currently not possible at all. All this means that the current boefje ecosystem has little room for growth.

So the plan we came up with in #1620 means that we will add the possibility for boefjes to completely run in a container. The boefje runner will be responsible for starting a container for every task, either using Docker (like currently) or through an orchestrator like Kubernetes or Nomad. The container itself will read input, execute tasks as needed, post its output, and exit. Because every boefje is self-contained in a container image, each boefje can use whatever programming language with whatever dependencies or tools it needs. Container images can be distributed outside of KAT, although we will of course have a set of first-party boefjes.
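The read-input / execute / post-output / exit lifecycle could look roughly like this inside the container (a hedged sketch: the `input_url`/`output_url` interface, and the idea that the runner serves the input over HTTP, are assumptions for illustration, not the actual API from #1620):

```python
import json
import urllib.request


def fetch_input(input_url):
    # Read the task description that the runner exposes to this container
    # (hypothetical interface for this sketch).
    with urllib.request.urlopen(input_url) as response:
        return json.load(response)


def post_output(output_url, raw_output):
    # Post the raw tool output back to the runner.
    request = urllib.request.Request(
        output_url,
        data=raw_output,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    urllib.request.urlopen(request).close()


def main(input_url, output_url, run_tool):
    # One task per container: read input, run the bundled tool on it,
    # post the output, and let the process exit.
    task = fetch_input(input_url)
    post_output(output_url, run_tool(task))
```

Because the contract is just "read input, post output", the container's internals (language, tools, dependencies) are entirely up to the boefje author.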

There is of course overhead involved in this, which is why we're keeping the Python runner for simple tasks for now, and will only migrate existing boefjes to the new runner if they are already using containers. In a second iteration of this design, we will consider how to run a larger number of tasks efficiently without the overhead of starting a new container for each task. This should allow us to also migrate the remaining boefjes for short tasks to the new runner.

r3boot commented 1 year ago

> That is not what I mean by the worker pattern. What I mean is to start a fixed number of boefjes (depending on the amount of resources available) and to run the while loop inside the boefje, thereby making them long-lived processes that act as workers on a queue.

> This is how the current boefje runner works, but not using containers. All boefjes are written in Python, and the boefje runner uses Python multiprocessing to start multiple worker processes. Each worker can then run boefjes from the queue, one at a time. This can scale up with more worker processes, up to the limits of a single machine.

Yes, this approach makes a lot of sense for a bare-metal setup, where you are free to use all resources available in a system. The problem is that within an orchestration platform, you should limit the resources your jobs can use to prevent a single container from dominating a node, and hence a boefje can't assume it has access to all available resources. This is why, for a containerized platform, it's more useful to have a single-worker-per-container approach. If you combine these workers with metrics, you can easily build a system where the orchestration platform will automatically scale up or down based on the resources available. By using this architecture, you give the operators full control over how scaling should be done, since they know their workloads best.
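To illustrate that autoscaling idea (a hypothetical sketch, not something KAT ships): with one worker per container, a Kubernetes HorizontalPodAutoscaler could scale a boefje Deployment on queue depth, assuming an external-metrics adapter (e.g. prometheus-adapter) exposes the RabbitMQ `rabbitmq_queue_messages_ready` metric. The `nmap-boefje` names are made up for the example.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nmap-boefje
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nmap-boefje        # one single-threaded worker per pod
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages_ready
        target:
          type: AverageValue
          averageValue: "5"  # target ~5 queued jobs per worker
```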

> However, many boefjes require the use of external tools that are more than just a Python library. Examples of this are Nmap, WPScan, MASSCAN and Playwright. Because many of these tools also have their own dependencies, we use Docker containers to distribute them. Currently this means that the boefje Python code will include code to run containers directly using the Docker Python library, which requires a Docker socket to work.

> This is not ideal, because organisations that run a container orchestrator want to run all containers through their orchestrator, so that system resources are properly allocated and security measures can be taken. Currently, we run containers directly on the system, bypassing the orchestrator's scheduling. Also, giving KAT access to the Docker socket is equivalent to handing out root permissions on the system.

This sounds like a packaging issue to me, mostly. Instead of making a container per tool and a boefje spawning those containers, you could also bundle the tool together with the boefje that uses the tool into a single container. This will give you an atomic container, which will always have the relevant dependencies for that boefje installed and allows the operator to determine how scaling should be done. Running docker-in-docker is, as you mentioned, a huge security risk, and should be prevented as far as possible.
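A bundled image like that could be sketched as follows (a hypothetical Dockerfile: the `boefje/` layout, `requirements.txt` and `main.py` entrypoint are illustrative, not KAT's actual structure):

```dockerfile
# Bundle the external tool (nmap here) together with the boefje code
# that drives it, so the container is atomic and self-contained.
FROM python:3.11-slim
RUN apt-get update \
    && apt-get install -y --no-install-recommends nmap \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY boefje/ ./
RUN pip install --no-cache-dir -r requirements.txt
ENTRYPOINT ["python", "main.py"]
```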

> Another issue is that boefjes are currently all packaged in the KAT codebase and distributed as part of our releases. Also, all Python dependencies need to be packaged with KAT, and with a growing number of boefjes we have also had the occasional conflicting dependency. Writing a boefje in another programming language than Python is currently not possible at all. All this means that the current boefje ecosystem has little room for growth.

This is not really a problem. You could have a bunch of Dockerfiles, one per boefje, and build + push them all from one codebase. All dependencies that come with the tools a boefje runs are installed and managed via the container, so they don't pollute the repo. Another option is to split out each boefje to its own repository. I also think that 1) a simple send + receive API around a boefje, combined with 2) easier deployment of the containers (basically, removing the whole docker-in-docker thing), will lead to a more flexible ecosystem. Think building blocks that allow the operator to build a streaming scanning platform.

> So the plan we came up with in #1620 means that we will add the possibility for boefjes to completely run in a container. The boefje runner will be responsible for starting a container for every task, either using Docker (like currently) or through an orchestrator like Kubernetes or Nomad. The container itself will read input, execute tasks as needed, post its output, and exit. Because every boefje is self-contained in a container image, each boefje can use whatever programming language with whatever dependencies or tools it needs. Container images can be distributed outside of KAT, although we will of course have a set of first-party boefjes.

> There is of course overhead involved in this, which is why we're keeping the Python runner for simple tasks for now, and will only migrate existing boefjes to the new runner if they are already using containers. In a second iteration of this design, we will consider how to run a larger number of tasks efficiently without the overhead of starting a new container for each task. This should allow us to also migrate the remaining boefjes for short tasks to the new runner.

It's this overhead that makes KAT very resource intensive, and this overhead scales linearly with the number of items you need to scan, making KAT itself hard to scale. And since there will be a new design of the boefjes, I think it makes sense to fix this overhead by eliminating the need for any part of KAT to manage Docker containers, and to just stick to streaming and processing data from queue to queue.

underdarknl commented 1 year ago

Let's discuss this in a meeting.

underdarknl commented 1 year ago

Also, let's keep in mind that starting (or rolling back to a known point) each container for each boefje task is by design, to make sure that no two jobs can ever influence each other.

praseodym commented 1 year ago

> This sounds like a packaging issue to me, mostly. Instead of making a container per tool and a boefje spawning those containers, you could also bundle the tool together with the boefje that uses the tool into a single container. This will give you an atomic container, which will always have the relevant dependencies for that boefje installed and allows the operator to determine how scaling should be done. Running docker-in-docker is, as you mentioned, a huge security risk, and should be prevented as far as possible.

Packaging the tool together with the boefje code is exactly what we are going to do with the new runner. Also note that we are not currently using Docker-in-Docker, but mounting the Docker socket from the host into the boefjes container to run containers on the host. The new runner will give you options to avoid that by running containers on Kubernetes or Nomad.

> This is not really a problem. You could have a bunch of Dockerfiles, one per boefje, and build + push them all from one codebase. All dependencies that come with the tools a boefje runs are installed and managed via the container, so they don't pollute the repo. Another option is to split out each boefje to its own repository. I also think that 1) a simple send + receive API around a boefje, combined with 2) easier deployment of the containers (basically, removing the whole docker-in-docker thing), will lead to a more flexible ecosystem. Think building blocks that allow the operator to build a streaming scanning platform.

We haven't exactly decided on the repository structure for boefjes, but distributing boefjes outside the core KAT repository is what will be made possible with the new runner.

> It's this overhead that makes KAT very resource intensive, and this overhead scales linearly with the number of items you need to scan, making KAT itself hard to scale. And since there will be a new design of the boefjes, I think it makes sense to fix this overhead by eliminating the need for any part of KAT to manage Docker containers, and to just stick to streaming and processing data from queue to queue.

We haven't considered managing boefjes outside of KAT because that would mean each boefje needs to be set up manually on the host system or container orchestrator. Additionally, this would mean every boefje always has to have a running container, which creates a lot of overhead: more than 50 containers running at all times if all boefjes are enabled, and even then each boefje container can only run one task at a time, which is not realistic for longer tasks like Nmap. Finally, having boefjes as long-running tasks has some security disadvantages, because one task could influence a future task.

Having KAT start a container per task doesn't have these disadvantages, in exchange for some overhead per task. As previously described, we will of course be looking to minimise this overhead.

r3boot commented 1 year ago

Ok, so in preparation for maybe a talk about this, I thought I'd describe my idea in a drawing (pardon my ascii), using the nmap boefje as an example. Imagine a queue plus a bunch of workers per type of boefje/scanner.

+------+
| mula |----------------->(submit message to /to/boefje/nmap)------+-----------------+
+------+                                                           |                 |
  |                                                                V                 V
  ^                                                          +-------------+   +-------------+
  s                                                          | nmap boefje |   | nmap boefje |
  u                                                          |   worker 1  |   |   worker 2  |
  b                                                          +-------------+   +-------------+
  m                                                                 |                 |
  i                                                                 V                 V
  t          +------(submit results to /to/whiskers/normalizer)-----+-----------------+
             |
  f          |
  i          +-------+---------------+
  n                  |               |
  d                  V               V
  i            +------------+  +------------+
  n            | normalizer |  | normalizer |
  g            |  worker 1  |  |  worker 2  |
  s            +------------+  +------------+
  |                  |               |
  |                  V               V
  +------------------+---------------+

All lines are queues that live in (e.g.) RabbitMQ (or Kafka if you want to get really fancy). Over these queues, simple messages are communicated, instructing the workers what to do. For instance, this could be something like the following (simplified, of course):

To the boefjes:

{
  "scan_uuid": "<unique id of scan>",
  "ip": "1.2.3.4"
}

To the normalizers

{
  "scan_uuid": "<unique id of scan>",
  "boefje": "nmap",
  "ip": "1.2.3.4",
  "result": [23, 80, 110]
}

To mula

{
  "scan_uuid": "<unique id of scan>",
  "boefje": "nmap",
  "ip": "1.2.3.4",
  "result": [23, 80, 110],
  "new_ooi": [
    {
      "type": "telnet_banner_scan",
      "target": "1.2.3.4:23"
    },
    {
      "type": "http_scan",
      "target": "1.2.3.4:80"
    },
    {
      "type": "pop3_scan",
      "target": "1.2.3.4:110"
    }
  ]
}

Each worker on the queue should only get the least amount of info needed to run. Each worker runs a while-true loop: reading an item from the queue, processing it, and submitting the result back onto a queue. Eventually, the results come back at mula, which should have enough info to correlate all resulting messages. By letting each worker do as little as possible, they can be loosely coupled, and this allows for more rapid development of new boefjes (just slap a bunch of new workers on some queue, let mula write to that queue, done, more or less).
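That while-true loop can be sketched in Python. This is a minimal stand-in: `queue.Queue` replaces the RabbitMQ topics for illustration (a real worker would consume them with a client library such as pika), and `nmap_stub` fakes the scan result:

```python
import queue


def make_worker(scan, results):
    # Build a long-lived worker: read a job, process it, submit the result.
    # queue.Queue stands in for a RabbitMQ topic here; `scan` is the
    # boefje's actual work (stubbed below).
    def worker(jobs):
        while True:
            job = jobs.get()
            if job is None:  # shutdown sentinel, so the loop can end
                return
            # Echo the correlation fields back so mula can match results.
            results.put(dict(job, boefje="nmap", result=scan(job)))
    return worker


def nmap_stub(job):
    # Stand-in for an actual port scan of job["ip"].
    return [23, 80, 110]
```

One such worker would run per container (or per process), matching the single-worker-per-container model described earlier, with the orchestrator responsible for starting and scaling them.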

The boefjes worker(s) could be either one-boefje-per-container (with a topic per boefje), or many-boefjes-per-container (with either a single topic and a message with the scan type or listeners on a topic per type of scan). Starting all of these workers becomes the responsibility of the underlying OS/orchestrator.