kolide / fleet

A flexible control server for osquery fleets
https://kolide.com/fleet
MIT License
1.1k stars 263 forks source link

Deleting / modifying built-in labels can break ad-hoc querying feature #2138

Open nyanshak opened 4 years ago

nyanshak commented 4 years ago

If I do either of the following things, ad-hoc querying will not work:

Reproduction Case 0

labels.yml:

---
apiVersion: v1
kind: label
spec:
  name: All Hosts
  description: All hosts which have enrolled in Fleet
  query: SELECT 1;
  label_type: 0

fleetctl apply -f labels.yml

Now the ad-hoc querying is broken, described further in #Explanation below.

Fix: change label_type to 1, then fleetctl apply -f labels.yml

Reproduction Case 1

labels.yml:

---
apiVersion: v1
kind: label
spec:
  name: All Hosts
  description: All hosts which have enrolled in Fleet
  query: SELECT 1;
  label_type: 1

fleetctl delete -f labels.yml

Now the ad-hoc querying is broken, described further in #Explanation below.

Fix: fleetctl apply -f labels.yml

Version of fleet

2.3.0

Explanation

If either of the modifications at the top happens, then the ad-hoc querying page is broken.

Clicking to pick hosts to target won't return any results, and the console will show 500s from the /labels endpoint.

Possible fixes

nyanshak commented 3 years ago

(just going to try to gather thoughts about this)

I was trying to think about how best to do this today. The 'All Hosts' label is scattered in various places in the code and 'expected' to exist. At minimum I think it should not be possible to delete the 'All Hosts' label (as doing so will break ad-hoc querying entirely).

The other builtin labels seem less important and probably "can" be deleted without breaking things in the UI.

Even the "All Hosts" label can be modified as long as it's not deleted.

Originally I said we might be able to "make the API / UI resilient to missing built-in labels", but I don't think that's a straightforward option, just because "All Hosts" label is scatter across so many places.

Having looked at more of the code, I'd suggest these options:

In both options, I'd suggest returning 400 Bad Request if user tries to do one of the not-allowed actions (any mutation for Option 1, or deleting in Option 2).

@zwass - do you have any thoughts on an approach / preferred option for this? I'm considering working on this in the near future, but want to work out an agreeable approach first.

zwass commented 3 years ago

I'd go with Option 1. Modifying the labels in any way could lead to unexpected problems so I like the idea of being more conservative.