mozilla / mig

Distributed & real time digital forensics at the speed of the cloud
http://mig.mozilla.org/
Mozilla Public License 2.0

[mig] Flexible search endpoint (Bugzilla #1134390) #112

Open jvehent opened 9 years ago

jvehent commented 9 years ago

Migrated from https://bugzilla.mozilla.org/show_bug.cgi?id=1134390 Assigned to: Julien Vehent [:ulfr]

On 2015-02-18 13:53:27 -0800, Julien Vehent [:ulfr] wrote:

MIG provides a search API endpoint, documented at http://mig.mozilla.org/doc/api.rst.html#get-root-search. Its functionality is currently very limited. It can search for actions, commands, agents and investigators, but only on a subset of fields, and every search query results in a very inefficient JOIN of all database tables that is slow to process.
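To make the alternative to "JOIN everything" concrete, here is a minimal sketch that builds the FROM clause from the set of tables a search actually references. It assumes a hypothetical linear join chain from `agents` through `commands`, `actions` and `signatures` to `investigators`. The table and column names (`agentid`, `actionid`, `investigatorid`) are illustrative guesses, not a copy of MIG's schema.

```go
package main

import (
	"fmt"
	"strings"
)

// joinStep is one hop in a hypothetical join chain rooted at the agents table.
type joinStep struct {
	table string // table reached by this hop
	on    string // join condition (illustrative column names)
}

// chain is an assumed path from agents to investigators; MIG's real schema may differ.
var chain = []joinStep{
	{"commands", "commands.agentid = agents.id"},
	{"actions", "actions.id = commands.actionid"},
	{"signatures", "signatures.actionid = actions.id"},
	{"investigators", "investigators.id = signatures.investigatorid"},
}

// fromClause joins only as far along the chain as the deepest table the
// search needs, instead of joining every table on every query.
func fromClause(needed map[string]bool) (string, error) {
	deepest := -1
	known := map[string]bool{"agents": true}
	for i, step := range chain {
		known[step.table] = true
		if needed[step.table] {
			deepest = i
		}
	}
	for t := range needed {
		if !known[t] {
			return "", fmt.Errorf("unknown table %q", t)
		}
	}
	var b strings.Builder
	b.WriteString("FROM agents")
	for _, step := range chain[:deepest+1] {
		fmt.Fprintf(&b, " JOIN %s ON %s", step.table, step.on)
	}
	return b.String(), nil
}

func main() {
	// A search on agent fields only needs the agents table: no JOIN at all.
	q, _ := fromClause(map[string]bool{"agents": true})
	fmt.Println(q)
	// A search on action fields needs commands and actions, nothing further.
	q, _ = fromClause(map[string]bool{"actions": true})
	fmt.Println(q)
}
```

Because the chain is linear, joining up to the deepest referenced table is sufficient. A schema with branches would need a small graph search over join edges, but the principle of deriving the JOINs from the requested fields stays the same.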

The search API needs a revamp. Here are a few requirements:

  • be fast. JOINs should only be performed when needed, unlike now: https://github.com/mozilla/mig/blob/master/src/mig/database/searches.go#L90-L93
  • be flexible. The API should support searching inside the JSON fields stored in Postgres. I would like to avoid statically listing all supported JSON fields, and instead have the API attempt the search and fail with a meaningful error message when a given search field is not found.

    Example: searching for an agent by its IP address, which is stored in a JSON array inside the agent.environment column.

  • allow for complex queries. For example, list agents that ran an action of threat family "malware" launched by investigator "julien vehent" over the last 20 days. If possible, I would like to do this without accepting raw SQL in API parameters, without statically defining all possible search parameters in the code, and with decent performance.

    The current code, which statically lists search parameters, is here: https://github.com/mozilla/mig/blob/master/src/mig/api/search.go#L46-L98

  • control the data returned. Right now, the search API returns a lot of unnecessary data, because requesters have no way to specify which fields they want.

So, in fact, we want the flexibility of SQL in API queries, without the risk of taking raw SQL as input.

In parallel to this work, the client library should be updated with a flexible search syntax for the command line: https://github.com/mozilla/mig/blob/master/src/mig/client/client.go

jvehent commented 9 years ago

Part of that work has landed in #87, but it's still quite slow and doesn't allow for arbitrary queries. More to come later.