minvws / nl-kat-coordination

European Union Public License 1.2

Scopes / Networks support #273

Open underdarknl opened 1 year ago

underdarknl commented 1 year ago

Currently, a boefje runner is considered to be all-reaching. This is obviously not true, and the scheduler should take a runner's reach into account when deciding which jobs to give it.

We already have a Network object which we can differentiate on. An IP or hostname is (loosely) coupled to a network, but multiple networks can exist. The boefje job runner should have a configuration setting in which the admin lists the networks this runner can reach. E.g., most will be able to access the Network Internet, but not all can reach certain local networks.

Objects in these networks should produce jobs bound to the network-scope in which they are reachable. Job runners should then list the network-scopes they can reach in their query to the scheduler, so that they only ask for jobs to be executed in their scope and never receive jobs that they cannot execute because they cannot reach that scope.
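The proposal above can be sketched roughly as follows. This is only an illustration, not code from the repository: the `Job` class, the `jobs_for_runner` function and the scope names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record: a boefje to run on an object, bound to one network scope."""
    boefje: str
    input_object: str
    network_scope: str


def jobs_for_runner(jobs: list[Job], reachable_scopes: set[str]) -> list[Job]:
    """Return only the jobs whose scope the runner has declared it can reach."""
    return [job for job in jobs if job.network_scope in reachable_scopes]


jobs = [
    Job("nmap", "192.0.2.10", "internet"),
    Job("nmap", "192.168.1.1", "my-company-network"),
]

# A runner that can only reach the internet never sees the local-network job.
internet_only = jobs_for_runner(jobs, {"internet"})
```

The point of the sketch is that the scope is decided when the job is created, so the runner only has to state what it can reach.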

jpbruinsslot commented 1 year ago

@Donnype want to discuss this a bit further?

Donnype commented 1 year ago

@underdarknl @jpbruinsslot Thinking about this I had some thoughts about the potential complexity here. If we focus on the use-case "local network boefje" and assume scoping only applies to objects on a network, we of course want to avoid the following situation:

[Image: Design-Scopes]

This indeed means that "objects in these networks should produce jobs bound to the network-scope", and in the case of networks this scope could be derived by looking at the specific network the object refers to. But I can also imagine that certain objects from a scoped job are not necessarily "scoped", like weburls found in (private) html pages. Perhaps we need to differentiate on that (in the future)? And: are there "scoped" objects that do not point to a network that we also want to support? Because that would require keeping track of scopes separately, either on OOI level or on job level. If we also want to support scoping in the sense that certain runners have particular permissions (perhaps through IAM settings within your cloud provider), looking at the network also wouldn't be sufficient.

But perhaps we can just focus on the network for now and only perform the check on network reference? I can add the configuration to the boefje runner environments and pass it as a query parameter to mula.
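A minimal sketch of what that could look like on the runner side, under loud assumptions: the environment variable name `BOEFJE_REACHABLE_NETWORKS`, the comma-separated format and the `network` query parameter are all made up for illustration and are not taken from the boefje runner or mula.

```python
import os
from urllib.parse import urlencode

# Hypothetical variable name; the thread does not specify one.
ENV_VAR = "BOEFJE_REACHABLE_NETWORKS"


def reachable_networks(environ=os.environ) -> list[str]:
    """Parse a comma-separated list of network names, defaulting to the internet."""
    raw = environ.get(ENV_VAR, "internet")
    return [name.strip() for name in raw.split(",") if name.strip()]


def pop_query(networks: list[str]) -> str:
    """Build a query string with one (hypothetical) `network` parameter per network."""
    return urlencode({"network": networks}, doseq=True)


query = pop_query(reachable_networks({ENV_VAR: "internet, my-company-network"}))
# query == "network=internet&network=my-company-network"
```

Defaulting to the internet keeps existing runners working without any new configuration, which matches the "most will be able to access the Network Internet" remark in the issue description.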

jpbruinsslot commented 1 year ago

From the scheduler side, in order to filter on scope, this scope needs to be available in a BoefjeTask (the actual task that is being popped from the queue by a runner). The information in that BoefjeTask can then be filtered on. I see two options:

  1. Add the whole OOI to the boefje task, such that the fields within an OOI object can be filtered on. However, this scope/network should then be present in the OOI.
  2. Add a new field network or scope to the BoefjeTask, which gets set when the task is being created and a query is sent out to octopoes to gather the network/scope this object has.

I'm gravitating towards option 2: the job runner makes a normal request with additional filter options, in this case the scope, and the highest-priority item matching those parameters gets popped off.
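Option 2 could look roughly like the sketch below. It is not the scheduler's actual data model or queue: the `BoefjeTask` fields shown here, the `TaskQueue` class, the "lower number means higher priority" convention and the example OOI strings are assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class BoefjeTask:
    """Hypothetical task shape; only `priority` takes part in ordering."""
    priority: int  # assumption: lower number = higher priority
    boefje_id: str = field(compare=False)
    input_ooi: str = field(compare=False)
    network: str = field(compare=False)  # set once, when the task is created


class TaskQueue:
    def __init__(self) -> None:
        self._heap: list[BoefjeTask] = []

    def push(self, task: BoefjeTask) -> None:
        heapq.heappush(self._heap, task)

    def pop(self, networks: set[str]) -> Optional[BoefjeTask]:
        """Pop the highest-priority task whose network is in `networks`, if any."""
        skipped = []
        found = None
        while self._heap:
            task = heapq.heappop(self._heap)
            if task.network in networks:
                found = task
                break
            skipped.append(task)
        # Put back the tasks this runner could not take.
        for task in skipped:
            heapq.heappush(self._heap, task)
        return found


queue = TaskQueue()
queue.push(BoefjeTask(1, "nmap", "IPAddressV4|my-company-network|192.168.1.1", "my-company-network"))
queue.push(BoefjeTask(2, "kat-dns", "Hostname|internet|example.com", "internet"))

# An internet-only runner skips the higher-priority local task.
task = queue.pop({"internet"})
```

Because the network is stored on the task itself, the pop needs no lookup in octopoes; the cost is that the field has to be filled in correctly when the task is created.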

underdarknl commented 1 year ago

@Donnype agreed, that situation should be avoided. My thinking is that the 'port|80' on the 'network|internet' is at least bound / referenced to an IP on the 'network|Internet', and as such creates jobs via that IP, which we know is 'network|Internet' bound. Unless we have a boefje which ingests just the port 80 part, but I'm not sure why that would need to be scoped to any network, as it's not something you can scan on its own (it could be a Wikipedia lookup to gather info on port 80 in general). In that case the boefje itself should just be scoped to need internet access regardless of the objects it consumed, I'd guess.

Donnype commented 1 year ago

@underdarknl I suppose that means that the rules can be quite complex and/or specific? Perhaps we somehow have to allow users to define rules for workers? For example:

For "regular" workers:

scope:
  boefjes: "*"

For workers with access to local ip 192.168.1.1 of "my-company-network" and "my-other-network", but no internet access:

scope:
  boefjes:
    - nmap
    - nmap-ip-range
    - nmap-top-250
  exclude_networks: "Network|internet"
  include_networks:  # should check if the input oois stem from one of these root networks, default is "Network|internet"
    - "Network|my-company-network"
    - "Network|my-other-network"

And boefjes manifests have similar fields:

boefje:
  id: kat-dns
  [...]
  include_networks:
    - "Network|internet"
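One possible reading of how these rules could be evaluated is sketched below. The `WorkerScope` class and `worker_can_run` function are hypothetical, and the interpretation is an assumption: a worker may take a task only if the boefje is allowed, the input OOI's root network is reachable, and every network listed in the boefje manifest is reachable too.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerScope:
    """Hypothetical in-memory form of the worker `scope` rules above."""
    boefjes: list[str] = field(default_factory=lambda: ["*"])
    include_networks: list[str] = field(default_factory=lambda: ["Network|internet"])
    exclude_networks: list[str] = field(default_factory=list)

    def allows_boefje(self, boefje_id: str) -> bool:
        return "*" in self.boefjes or boefje_id in self.boefjes

    def reaches(self, network: str) -> bool:
        return network in self.include_networks and network not in self.exclude_networks


def worker_can_run(
    scope: WorkerScope, boefje_id: str, boefje_networks: list[str], ooi_network: str
) -> bool:
    """Assumed rule: boefje allowed, OOI network reachable, manifest networks reachable."""
    return (
        scope.allows_boefje(boefje_id)
        and scope.reaches(ooi_network)
        and all(scope.reaches(network) for network in boefje_networks)
    )


regular = WorkerScope()
local = WorkerScope(
    boefjes=["nmap", "nmap-ip-range", "nmap-top-250"],
    include_networks=["Network|my-company-network", "Network|my-other-network"],
    exclude_networks=["Network|internet"],
)

can_scan_lan = worker_can_run(local, "nmap", [], "Network|my-company-network")
can_run_dns = worker_can_run(local, "kat-dns", ["Network|internet"], "Network|my-company-network")
```

With these rules the local worker takes the nmap task on the company network but not the kat-dns task, while the regular worker only takes tasks on the internet.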

Regarding my observation:

"But I can also imagine that certain objects from a scoped job are not necessarily "scoped", like weburls found in (private) html pages."

I think the boefje should add such objects to internet explicitly (when they are rather sure).


Other options as discussed above involve adding a more generic "scope" tag and not allowing objects to be used as input on workers without the scope tag that was present (or set) on their creation.

I think adding fields on boefjes, boefje_tasks and OOIs is quite intrusive given our current lack of understanding/knowledge regarding such a feature. If we focus on the local-network use case, we could probably add a first version of such a feature by simply setting an explicit network scope setting (just an env var) that contains the name(s) of the network(s) a worker has access to, so that we can pass such a filter to the scheduler, after which the scheduler returns only tasks for OOIs belonging to those networks. That does add some significant bookkeeping to the scheduler, but it would not spill over to the boefjes and boefje_meta objects (too) much.

underdarknl commented 8 months ago

In the real world, the following applies: objects in any scope have a few ways to get to their intended target. Hostnames are resolved into IP addresses by using the hosts file in a server or container, then the local DNS, and if that fails the system DNS resolver. IP addresses themselves do similar things: they are routed against any 'local' routes first, and if that fails (e.g., there is no specific route), the packets are forwarded to the route designated as the gateway.

In the example of a web URL (e.g. a link) found on the webpage served by a local printer inside the local LAN, we should probably argue that it's a local URL first (as the local DNS in that scope might resolve it to a local server), and if that fails (which will be transparent to the boefje), it will resolve to the outside world. The trick for us is to determine which path was taken, and whether that path signals that we have left scope. E.g., if we see a new URL on the admin page of a printer, we can assume it's in the given network and scope. But I'd reckon we can allow ourselves the freedom to label the URL as connected to the network and scope 'network|internet' if we see a public IPv4 address being resolved. Ideally, we'd know this before we do any resolving, to make sure we don't burden the local job runner with 'internet jobs', but this resolving action is something that is bound to the original scope, as it is in fact influenced by it. Since this URL on this printer page might resolve to a different host, maybe someone MITM-ing clients and sending them malware, it might even be a good idea to keep this link inside this local scope, even though we might have also found it on the network|internet.
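The "public address means internet scope" heuristic could be sketched with Python's standard `ipaddress` module. The function name and the scope labels are hypothetical, and the sketch covers only that one heuristic, not the suggestion to also keep the link in the local scope.

```python
import ipaddress


def scope_for_resolved_address(address: str, origin_scope: str) -> str:
    """Label a URL by the address it resolved to from within `origin_scope`.

    Assumption of this sketch: a globally routable address means the URL can be
    attached to 'network|internet'; anything else stays in the scope it was found in.
    """
    ip = ipaddress.ip_address(address)
    if ip.is_global:
        return "network|internet"
    return origin_scope


# A link on the printer's admin page that resolves to a private address stays local.
local_label = scope_for_resolved_address("192.168.1.50", "network|printer-lan")
# The same kind of link resolving to a public address is labelled as internet.
public_label = scope_for_resolved_address("8.8.8.8", "network|printer-lan")
```

Note that the resolution itself still has to happen inside the original scope, since the local DNS decides which address comes back; only the labelling afterwards is shown here.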