google / grr

GRR Rapid Response: remote live forensics for incident response
https://grr-doc.readthedocs.io/
Apache License 2.0
4.75k stars 762 forks source link

Add Elasticsearch output plugin (#374) #923

Closed micrictor closed 3 years ago

micrictor commented 3 years ago

Creates an Elasticsearch output plugin using the _bulk API.

In the config file, the GRR Admin sets:

When executing the flow, the operator can specify:

Note that, as a collateral change, I had to add an optional indent parameter to the json.Dump method to avoid messing up the line-delimited nature of the Elasticsearch API.

This resolves #374

micrictor commented 3 years ago

Ack on all accounts - debug print is an obvious oopsie. I’ll try to figure out a... less bad way to decompose the query data. I contemplated making a custom object just to represent this relationship, but that felt like it would be taking away from the readability of the test.

I see various python2 interop stuff throughout the code base, so I’ll use format in case there’s someone still relying on that.

On Fri, Apr 9, 2021 at 4:47 AM Max Vogler @.***> wrote:

@.**** approved this pull request.

Thank you for this amazing pull request! I'm pre-approving, because there are only minor code health changes to be made and I am out of office next week. One of my teammates can merge next week or I can merge the week after when I'm back.

In grr/server/grr_response_server/output_plugins/elasticsearch_plugin_test.py https://github.com/google/grr/pull/923#discussion_r610555143:

  • patcher = mock.patch.object(requests, 'post')
  • with patcher as patched:
  • plugin.ProcessResponses(plugin_state, messages)
  • plugin.Flush(plugin_state)
  • plugin.UpdateState(plugin_state)
  • return patched
  • def _ParseEvents(self, patched):
  • request = patched.call_args[KWARGS]['data']
  • Elasticsearch bulk requests are line-deliminated pairs, where the first

  • line is the index command and the second is the actual document to index

  • split_request = request.split('\n')
  • print(split_request)

Please remove the print() debugging statement.

In grr/server/grr_response_server/output_plugins/elasticsearch_plugin_test.py https://github.com/google/grr/pull/923#discussion_r610555874:

  • plugin.ProcessResponses(plugin_state, messages)
  • plugin.Flush(plugin_state)
  • plugin.UpdateState(plugin_state)
  • return patched
  • def _ParseEvents(self, patched):
  • request = patched.call_args[KWARGS]['data']
  • Elasticsearch bulk requests are line-deliminated pairs, where the first

  • line is the index command and the second is the actual document to index

  • split_request = request.split('\n')
  • print(split_request)
  • update_pairs = [(split_request[i], split_request[i + 1])
  • for i in range(0, len(split_request), 2)]
  • parsed_pairs = [(json.Parse(i[0]), json.Parse(i[1])) for i in update_pairs]

Instead of deconstructing the tuple, parsing the elements, and reconstructing a tuple, could you call json.Parse on the whole list in split_requests? This might simplify the code a little, e.g.:

split_request = [json.Parse(line) for line in request.split("\n")]


In grr/server/grr_response_server/output_plugins/elasticsearch_plugin.py https://github.com/google/grr/pull/923#discussion_r610559497:

  • def _SendEvents(self, events: List[JsonDict]) -> None:
  • """Uses the Elasticsearch bulk API to index all the events in a single
  • request.
  • https://www.elastic.co/guide/en/elasticsearch/reference/7.1/docs-bulk.html
  • """
  • if self._token:
  • headers = {"Authorization": "Basic {}".format(self._token)}
  • else:
  • headers = {}
  • index_command = json.Dump({"index": {"_index": self._index}}, indent=None)
  • Each index operation is two lines, the first defining the index settings,

  • the second is the actual document to be indexed

  • data = "\n".join(
  • index_command + "\n" + json.Dump(event, indent=None)

Please use string formatting (f-strings or format()) instead of manual string concatenation, e.g.:

data = "\n".join(
  f"{index_command}\n{json.Dump(event, indent=None)}"
  for event in events)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/google/grr/pull/923#pullrequestreview-632315856, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTCAGLTDMXI32KB3P2S36DTH3SM5ANCNFSM42UGAN2Q .

micrictor commented 3 years ago

Pushed in the updates discussed above - mostly removing python2 compatibility stuff, adding docs for the new indent parameter, and other minor formatting fixes.

As noted above, I didn't port the tests to use responses, and don't really intend on doing so. If that's blocking, this PR will likely stay open for quite some time before I circle back to that (or someone else picks it up)

max-vogler commented 3 years ago

I suggest merging without the responses test refactoring in order to merge this functionality. The test is very close to splunk_plugin_test.py, which uses the same mocking. In the future, one change can refactor both tests. @panhania WDYT?

panhania commented 3 years ago

Sounds reasonable to me.

max-vogler commented 3 years ago

Thanks a lot for your contribution @micrictor, well done! If you like, send me a PR that adds your name and email to ACKNOWLEDGEMENTS.