micrictor closed this pull request 3 years ago.
Ack on all accounts - debug print is an obvious oopsie. I’ll try to figure out a... less bad way to decompose the query data. I contemplated making a custom object just to represent this relationship, but that felt like it would be taking away from the readability of the test.
I see various python2 interop stuff throughout the code base, so I'll use format() in case there's someone still relying on that.
On Fri, Apr 9, 2021 at 4:47 AM Max Vogler @.***> wrote:
@.**** approved this pull request.
Thank you for this amazing pull request! I'm pre-approving, because there are only minor code health changes to be made and I am out of office next week. One of my teammates can merge next week or I can merge the week after when I'm back.
In grr/server/grr_response_server/output_plugins/elasticsearch_plugin_test.py https://github.com/google/grr/pull/923#discussion_r610555143:
- patcher = mock.patch.object(requests, 'post')
- with patcher as patched:
- plugin.ProcessResponses(plugin_state, messages)
- plugin.Flush(plugin_state)
- plugin.UpdateState(plugin_state)
- return patched
- def _ParseEvents(self, patched):
- request = patched.call_args[KWARGS]['data']
Elasticsearch bulk requests are line-delimited pairs, where the first line is the index command and the second is the actual document to index
- split_request = request.split('\n')
- print(split_request)
Please remove the print() debugging statement.
In grr/server/grr_response_server/output_plugins/elasticsearch_plugin_test.py https://github.com/google/grr/pull/923#discussion_r610555874:
- plugin.ProcessResponses(plugin_state, messages)
- plugin.Flush(plugin_state)
- plugin.UpdateState(plugin_state)
- return patched
- def _ParseEvents(self, patched):
- request = patched.call_args[KWARGS]['data']
Elasticsearch bulk requests are line-delimited pairs, where the first line is the index command and the second is the actual document to index
- split_request = request.split('\n')
- print(split_request)
- update_pairs = [(split_request[i], split_request[i + 1])
- for i in range(0, len(split_request), 2)]
- parsed_pairs = [(json.Parse(i[0]), json.Parse(i[1])) for i in update_pairs]
Instead of deconstructing each tuple, parsing the elements, and reconstructing the tuple, could you call json.Parse on each line of split_request directly? This might simplify the code a little, e.g.:
split_request = [json.Parse(line) for line in request.split("\n")]
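For illustration, here is a minimal, self-contained sketch of that approach using the stdlib json.loads in place of GRR's json.Parse wrapper (the helper name parse_bulk_request is hypothetical, not from the PR); pairing via zip is one way to recover the (command, document) tuples afterwards:

```python
import json


def parse_bulk_request(request: str):
    """Parses a newline-delimited Elasticsearch bulk payload.

    Returns a list of (index_command, document) pairs. Stdlib json.loads
    stands in for GRR's json.Parse here.
    """
    # Parse every non-empty line in a single pass, as suggested above.
    lines = [json.loads(line) for line in request.split("\n") if line]
    # Bulk payloads alternate command/document lines, so pair them up.
    return list(zip(lines[0::2], lines[1::2]))
```

A bulk body such as `'{"index": {"_index": "grr"}}\n{"client_id": "C.1"}\n'` then parses into a single (command, document) pair without the intermediate string tuples.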
In grr/server/grr_response_server/output_plugins/elasticsearch_plugin.py https://github.com/google/grr/pull/923#discussion_r610559497:
- def _SendEvents(self, events: List[JsonDict]) -> None:
- """Uses the Elasticsearch bulk API to index all the events in a single
- request.
- https://www.elastic.co/guide/en/elasticsearch/reference/7.1/docs-bulk.html
- """
- if self._token:
- headers = {"Authorization": "Basic {}".format(self._token)}
- else:
- headers = {}
- index_command = json.Dump({"index": {"_index": self._index}}, indent=None)
Each index operation is two lines: the first defines the index settings, the second is the actual document to be indexed
- data = "\n".join(
- index_command + "\n" + json.Dump(event, indent=None)
Please use string formatting (f-strings or format()) instead of manual string concatenation, e.g.:
data = "\n".join(f"{index_command}\n{json.Dump(event, indent=None)}" for event in events)
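As a runnable sketch of that suggestion (stdlib json.dumps standing in for GRR's json.Dump, and build_bulk_data being a hypothetical helper, not the plugin's actual method):

```python
import json


def build_bulk_data(index: str, events) -> str:
    """Builds a line-delimited _bulk request body.

    json.dumps is compact (single-line) by default, which matches the
    indent=None behavior the plugin relies on.
    """
    index_command = json.dumps({"index": {"_index": index}})
    # Each event contributes two lines: the index command, then the document.
    body = "\n".join(
        f"{index_command}\n{json.dumps(event)}" for event in events
    )
    # The bulk API requires the payload to end with a newline.
    return body + "\n"
```

The f-string keeps each command/document pair together, so the join only separates pairs rather than mixing concatenation styles.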
Pushed the updates discussed above - mostly removing python2 compatibility stuff, adding docs for the new indent parameter, and other minor formatting fixes.
As noted above, I didn't port the tests to use responses, and I don't really intend to. If that's blocking, this PR will likely stay open for quite some time before I circle back to it (or someone else picks it up).
I suggest merging without the responses test refactoring so that this functionality can land. The test is very close to splunk_plugin_test.py, which uses the same mocking. In the future, one change can refactor both tests. @panhania WDYT?
Sounds reasonable to me.
Thanks a lot for your contribution @micrictor, well done! If you like, send me a PR that adds your name and email to ACKNOWLEDGEMENTS.
Creates an Elasticsearch output plugin using the _bulk API.
In the config file, the GRR Admin sets:
When executing the flow, the operator can specify:
Note that, as a collateral change, I had to add an optional indent parameter to the json.Dump method to avoid messing up the line-delimited nature of the Elasticsearch API.
This resolves #374