What do we want to do with packs

directionless commented 5 years ago

In the 2019-09-17 office hours, we talked a little about whether we should be shipping packs. In office hours, there was a strong bias toward dropping them. We don't really vet or maintain them.

But chatting on macadmins, I heard strong interest. Summarizing some points:

shipping packs is batteries included

At my previous org, osquery would not have been successful without the packs. They were the launchpad for ideas and helped show what osquery was capable of. It was a firehose that we had to tune, but it wasn’t too hard to determine what the outliers in the data were. There is probably a bit too much overlap in the packs for people just looking at them, and that could be overwhelming for beginners. There’s very little documentation about why some of the queries work and what false positives they have. An example of a confusing osquery issue is the join on userid. Nobody is gonna know why that’s important, or even why it's there, since it’s only documented in a GitHub issue. An example of a good query is the reverse shell query: there's a ton of info online about its false positives and why it works, but that’s not gonna help someone who is just starting with osquery and is scared :scream: they are owned.

The first is that “onboarding” is very important for any open source project.

The “default packs” that ship with the project should be “model” packs: they should teach the end users of osquery (security teams) what “good queries” look like.

I think people don’t even realize this is a problem that needs to be solved. Newcomers assume the packs have high value and conflate osquery with the “maintained” packs.

The distinction between query sharing and packs isn't clear to everyone. They may be different things, or the same thing.

Relates to:

groob commented 5 years ago

I need to take some time to formulate a longer response, but overall I see two separate issues/requests and want to make sure we acknowledge them in this discussion:

Issue 1: osquery is hard and we should invest in better educational resources.

Issue 2: packs are "intel", and users can run the packs that ship with osquery as a way of ensuring the security of their fleet. Packs are maintained to some set of quality standards.

Now for some opinions: IMO it's pretty clear that we could be doing a much better job of improving our educational resources. This is achievable, especially now that we have a mature community with people that have years of experience using osquery. We just need to put it down in writing and iterate on the presentation.

I'm personally somewhat pessimistic about the second issue; in general, companies are not willing to open source the intel they use. But if we were to attempt to solve this problem, we should do so in a separate repository. This has the benefit that the repo can be maintained by a different set of people than the core, and that certain guarantees can be provided (freshness, performance, accuracy).

reedloden commented 5 years ago

Thank you for starting this discussion! I was quite surprised to find out that osquery maintainers considered the built-in packs as pure examples and not something that should be used at all.

I think one of osquery's biggest strengths has been the built-in packs that add immediate value when deployed. I've used osquery for several different fleets, and I almost always find out something fairly quickly based on the data coming in from those packs.

Separating out the packs seems like a great idea, as it would allow more focus on them and garner a larger community.

A few things come to mind:

Start with the current pack contents (it's worked for many companies for years, so it's better than nothing).
Build automation around testing queries for performance as part of any pull request.
Move over any pack-related issue/PR from main osquery repo to this new repo.
Automate updates to several packs that are just repackaged versions of other data formats (thinking: ossec-rootkit.conf, chrome-extensions.conf, etc.), allowing for continuous updates.
Figure out how users will utilize these curated packs in their fleet (preferably supporting some type of automated update mechanism).

groob commented 5 years ago

not something that should be used at all

To clarify my position, I don't think they should NOT be used. I was pointing out that they don't come with specific guarantees and that it would be best to use them as a reference, not something to run "as is".

It's a "here's some best practices about this thing osquery ships".

barn commented 5 years ago

fun! (:

So to give an analogy: if you've ever used Snort (or any IDS), there are two modes. There's the one where you excitedly install absolutely every rule, because why wouldn't you? You want all the information. You are now in alert hell and ultimately ignore the thing, because trying to use it is hell.

Or you turn everything off and pick and choose a dozen or a few dozen rules where you really know what they do, and when they go off, you know pretty much what that means.

I don't think osquery's packs are as bad as the former, but as they grow, they move away from the latter.

I don't think companies have to give up intel to share rules in quite that way. It's more what is the motive behind this rule, what is it telling you when it goes off.

I strongly agree with @reedloden that packs are set once and never updated, which is a shame. Finding a mechanism for that, and allowing local comments and overrides, is harder than just a git update, to be fair.

groob commented 5 years ago

I strongly agree with @reedloden that packs are set once and never updated, which is a shame. Finding a mechanism for that, and allowing local comments and overrides, is harder than just a git update, to be fair.

I'd categorize that as yet another problem that exists but is a separate discussion thread. Currently this is handled by third party solutions like Fleet and also happens to have many footguns.

Maintaining a query pack means occasionally adding/removing fields. Osquery tables also sometimes rename fields. How do you do version control for tables? For queries? How do you introduce a breaking change to a list of users that depend on the query results?

clong commented 5 years ago

In my experience, packs have been somewhat poorly maintained. It's quite a bit of work to add new queries as new tables get released and update columns when tables change. No one is "responsible" for packs.

I also worry that new osquery users view packs as a checklist instead of cherry picking which data sources are available to them and tailoring their needs based on that. I personally think that packs should serve as examples but it should be noted that they may not be kept up to date and they don't necessarily reflect an ideal deployment of osquery. People should absolutely NOT deploy osquery by just enabling the packs and calling it a day (in my opinion).

barn commented 5 years ago

There's a difference between "people should not deploy osquery by just enabling the packs and calling it a day" and what actually happens in the real world.

I don't think anyone in this thread is the target audience for "what is the default, assuming I just install this and leave it?", but I would imagine that's a sizeable use case (though I have no numbers, etc.).

How often do the rules get updated? How often are new ones added, versus new packs? (Sorry, it's been a while since I've had to run osquery over a fleet.)

I wouldn't want to spawn another arachNIDS email list, partly because it's 2019, partly because I'm no good at credit card fraud, but mostly because then someone has to steward that canonical resource. Should that be the job of osquery? Facebook? The community?

dons fancy jacket of spitballing 2d4

GitHub and pull requests seem like a really natural place for that to me, with either a release happening at some point, or just running HEAD if you're a wild person.

Having a framework to test them would make releases possible. Namespacing (pack name - date/sha) could make updating possible without blowing away the rules you have. In the kolide-fleet world, you always do something like "if you fork this rule, it's no longer in that pack" or some magic.

I'm probably getting way off track. 😇

packetzero commented 5 years ago

I feel that we should remove packs from the osquery repo ASAP.

They are not threat intelligence. Names like osx_attacks, etc. give people a false sense of security.
We are too small a group to spend time on them.
Many of them are poorly designed for SQL performance.

People can fetch them out of the old branches if they want them. Some enterprising soul or company might start to maintain something that's better than the dusty packs we currently have.

infosecmel commented 5 years ago

I agree with this statement: "They are not threat intelligence. Names like osx_attacks, etc. give people a false sense of security."

a-zndr commented 5 years ago

As a person who helped bring up this debate in the world's longest Slack thread, I think many of you are perhaps unaware of how many people are being introduced to and/or deploying osquery today. Tools like Kolide and Fleetsmith push out osquery, including the packs. It's great to have this data ready to go and ready to use, but with a heavy heart I read this conversation and see that these packs are almost a joke.

The lack of documentation around the point of the packs is definitely an issue, and probably one of the reasons for this conversation.

Packs shouldn't be the only way you interact with osquery, but are frequently a great place to get started. The analogy to Snort is a good one. But osquery is far from that bad today.

I think dropping the packs because you're afraid of the monster you made is the wrong way to go. Asking the village to help raise the monster is probably better.

barn commented 5 years ago

I don't feel packs are a joke, just under-loved and perhaps not being used to their full potential?

Sharing things like this has always been hard though, and many companies will be reluctant to share their rules publicly for good and bad reasons. (whole different thread there... wheeeeee)

I really don't think osquery has this as badly as Snort does, thankfully, but rules are also ever so slightly less sharable than Snort rules, so perhaps that is partly it too. I do worry about the Mac rules for malware/attacks that had their day in 2012 or so but are still in the default packs.

I haven't actually sat down and used osquery in about ten months, so I'd be overstepping to say I know what is going on or how the community feels.

groob commented 5 years ago

@a-zndr I am aware of how many people use the default packs, and it's been something we've occasionally discussed in office hours and at conferences here and there. When we created Fleet at Kolide and open sourced it, the immediate request from everyone was "how do I run the default packs?", which made me feel icky about the gap between how we see them and what new users thought of them...

Anyway, I think "community supported" query sharing is a cool idea that I'd love to see happen and if people feel strongly about there being value in packs, then we should spin them out in a separate repo, with maintainers who care about query quality and all.

reedloden commented 5 years ago

if people feel strongly about there being value in packs, then we should spin them out in a separate repo, with maintainers who care about query quality and all.

Can we do that in a methodical way that doesn't leave folks hanging? The calls to rip them out ASAP are what is concerning. I have no objections to splitting them out into a separate repo/project (in fact, I think it's a great idea, as releases shouldn't constrain query pack updates), but I just want to make sure there's a migration plan/path for folks and that it's well communicated so people can prepare.

directionless commented 5 years ago

I don't think we're going to move very fast. This has been an ongoing issue for a while, and I think this thread is likely to go on for a bit.

One thing that jumps out at me is that a lot of people use the default packs. They represent a batteries-included approach, and fixing them up is a way to positively help a large number of end users. That seems like something worth investing in.

packetzero commented 5 years ago

"batteries included approach". I disagree with this characterization. It should be made clear that the default packs provide nothing more than a 'Quick Start Guide' set of example queries. Once you have them up and running, they should be replaced with your own curated set that gets updated periodically.

We are talking about a whole new project here : actively maintaining a valuable base set of queries that can help the community. A free "lite" version of threat-intel queries ... anyone with the time and expertise to do this is a hero.

Some other considerations for each query:

What is the query's purpose? Links to IOCs, CVEs, or the research it's based on?
What is the performance load (memory, CPU, IO) for Query X on each target (laptop, web server, Active Directory / Domain Controller servers)?
It's impossible to make a one-size-fits-all pack. Some queries are too heavy to run on a server, and some intervals need adjusting.
List of each version of osquery the query has been run on, and the noted differences in results / behavior.
List of each target operating system version tested and differences in results / behavior.
Are we testing both the positive and negative cases of the query?
As a community, we don't have telemetry on which queries are providing value or causing problems. I would love to see this.
The public default schema does not reflect actual use. Many vendors use their own tables via extensions to work around known issues and limitations.
When new tables are introduced, we have to mark the 'minimum version' of osquery that can run them. What about old versions of osquery? They have to use older tables. Do we even have this versioning yet?
I have seen some people and vendors label their queries with MITRE ATT&CK metadata. This can be good and bad. It's good only if the query returns results only when there's an indication of compromise, and only if it's not too verbose: we don't need tactic and technique descriptions bloating the TLS config that gets pulled every 5 minutes.
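To make the 'minimum version' point concrete, here is a minimal sketch, not an existing osquery tool. It assumes the per-query `version` key in osquery's JSON pack format means "only run on osquery versions at or above this one" (check the pack documentation before relying on that). The pack contents, query names, and the helpers `parse_version` and `runnable_queries` are all hypothetical and exist only for illustration.

```python
import json

# Hypothetical pack in osquery's JSON pack format. The per-query "version"
# key is treated here as the minimum osquery version that can run the query;
# the version numbers are illustrative, not real table release versions.
PACK_JSON = """
{
  "queries": {
    "older_query_sample": {
      "query": "SELECT * FROM osquery_info;",
      "interval": 3600,
      "version": "1.4.5",
      "description": "Illustrative query with an old minimum version"
    },
    "newer_table_sample": {
      "query": "SELECT * FROM some_new_table;",
      "interval": 3600,
      "version": "4.0.0",
      "description": "Hypothetical query against a newer table"
    },
    "no_version_sample": {
      "query": "SELECT * FROM osquery_info;",
      "interval": 86400
    }
  }
}
"""


def parse_version(text):
    """Turn '4.0.0' into (4, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def runnable_queries(pack, agent_version):
    """Return sorted names of queries whose minimum version the agent meets.

    Queries without a "version" key are assumed to run on any version.
    """
    agent = parse_version(agent_version)
    names = []
    for name, spec in pack["queries"].items():
        minimum = spec.get("version")
        if minimum is None or parse_version(minimum) <= agent:
            names.append(name)
    return sorted(names)


pack = json.loads(PACK_JSON)
print(runnable_queries(pack, "3.3.2"))
```

Comparing integer tuples rather than raw strings avoids the classic bug where "10.0.0" sorts before "4.0.0". A real pack repository would also need to record which osquery and OS versions each query was actually tested on, which this sketch does not attempt.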

So should we spend time to fix these up, and continue to maintain them, test them, and add more? Unfortunately, not for the foreseeable future. Let's focus, there's a lot to do on core osquery.

osquery / foundation

What do we want to do with packs #28