Closed jmacdone closed 8 months ago
I forgot to update CHANGELOG.md
Does this related at all to #982 ?
Also some unit test coverage are needed for these changes, even if there hadn't been coverage previously. The new .keyword
logic should be a straightforward unit test since there are already lookup tests in util_test.py
. For the get_hits_terms
changes if you have to split out the logic into its own helper function so that you can isolate the unit testing to just that logic, that is fine, and preferable anyway.
Regarding @nsano-rururu's comment I think he meant "You forgot to update the changelog".
A status update: I've been grinding on this for a while. While trying to write unit tests the current PR, (it feels like) I keep finding things like https://github.com/jertel/elastalert2/blob/0716fd3211a5400397d4bd2ab02b71bc0eb4d501/elastalert/ruletypes.py#L948-L951 and https://github.com/jertel/elastalert2/blob/0716fd3211a5400397d4bd2ab02b71bc0eb4d501/elastalert/elastalert.py#L1326 that assume query_key
will be a string representing a single fieldname, but https://github.com/jertel/elastalert2/blob/0716fd3211a5400397d4bd2ab02b71bc0eb4d501/elastalert/loaders.py#L416 will create a comma-joined string of fieldnames (multiple).
So, I'm scratching my head as to what would be the right fix. I'm leaning towards adding guards to check for compound_query_key
before using query_key
for those two examples (and any others found).
But, I'm also considering "should query_key just always be normalized to a list?" and then remove the "hidden" compound_query_key
created as a side-effect of loading rule options and chase after all the broken that'd create. Probably way more work, but this somehow feels "more correct" at the moment.
Does this related at all to #982 ?
They could very well be related. One of the symptoms of this PR's bug is "man, why does top_events
never seem to work for me?!"
From what I recall query_key
is intentionally not supposed to be a list, since Elasticsearch's aggregations will respond using the comma delimited string as the bucket key for a compound query key, and we need to repeatedly fetch that bucket using the comma delimited form. I believe this is the case, it's been a while since I dove into that area.
Now that I'm looking more closely at these code changes it appears your new split(", ")
will not work correctly in some cases. The rule loader uses join(",")
which will create that compound string without spaces. When I first reviewed this I was thinking split(delim)
treated any character found in delim
as the delimiter string, but that's not the case. Instead it treats the full delim
string as the delimiter. A safer approach would be to check if qk.contains(",")
to determine if this is a compound key or not on line 471 in elastalert.py
. Then later where qk_list[i]
is used, add a .strip()
to that reference to remove any extra whitespace for instances when the query_key value is specified with spaces in the comma delimited format.
That said, I'm now wondering what scenario it was where you found whitespace characters in the query_key
string value in the first place, which led to your change toward split(", ")
. Were you manually specifying a "query_key: key1, key2" in the rule YAML file, instead of using the list format?
what scenario it was where you found whitespace characters in the
query_key
string value in the first place
I've been wondering the same. After more debugger time, I see the get_hits()
method calls the process_hits()
method which adds rules options (like rule['query_key']
) to the _source
dict returned from elastic.
remove any extra whitespace for instances when the query_key value is specified with spaces in the comma delimited format
It looks it ~used to~ tried to work that way ~5 years ago https://github.com/jertel/elastalert2/commit/76ab593aeb57355d7084314804d7e89eb2449157
~I'm not sure how to how find the blame for a deleted line, but I'm curious as to when and why this one was removed~ Ah, this was meant to handle the whitespace stripping, but didn't, because it never used the qk_tmp
value https://github.com/jertel/elastalert2/blob/76ab593aeb57355d7084314804d7e89eb2449157/elastalert/elastalert.py#L442-L443
and because the qk_tmp
was never used, it was stripped in https://github.com/jertel/elastalert2/commit/8abff23afcd75234d2d4c95f99cc630cc57b1c31
leaving us with the old commaspace-split replaced with a comma-split. i.e. https://github.com/jertel/elastalert2/commit/76ab593aeb57355d7084314804d7e89eb2449157 introduced buggy behavior.
Isn't this line https://github.com/jertel/elastalert2/blob/0716fd3211a5400397d4bd2ab02b71bc0eb4d501/elastalert/elastalert.py#L337 comma-space joining the values of the returned query keys? It looks like it is setting that new string as the value for the query_key
key, rather than the key itself.
Confusing, right? Sometimes qk
is a key, and sometimes holds the value of a key. In one of my example hits, https://github.com/jertel/elastalert2/blob/0716fd3211a5400397d4bd2ab02b71bc0eb4d501/elastalert/elastalert.py#L337 results in
hit['_source']['serialnumber.keyword,src_computer.keyword'] = 'NXHFPAA001234567890, LAPTOP-M16RFTYS'
and when it comes time to alert, the alert calls get_hits_terms(..., qk='NXHFPAA001234567890, LAPTOP-M16RFTYS', ...)
which can't be split properly by https://github.com/jertel/elastalert2/blob/0716fd3211a5400397d4bd2ab02b71bc0eb4d501/elastalert/elastalert.py#L466-L469
And this is in the context of a cardinality rule configured with
query_key:
- serialnumber.keyword
- src_computer.keyword
raw_count_keys: false
Yes, that qk
variable needs to be renamed qk_value
to eliminate confusion. Then there's the issue with a pre-joined value itself containing a comma, and later the code thinks it's splitting apart the query key into their respective values, but now the extra embedded commas are going to trip up that split. It could use a better design. Perhaps a quick fix is to join those values with a more explicit delimiter, such as __EA2_DELIM__
and then the join and split statements can share a single constant var. It would help avoid future maintainers and show how those two join and split statements are connected.
Thinking more on this... The drawback to using that explicit delimiter is that is any other code or existing users relying on that field are now going to see that ugly delimiter in their alert message.
If we leave it to the comma delimited approach we could at least do a sanity check after the split and see if the length of the split value array matches the compound_query_key
array. If it doesn't then at least log an error for visibility, or even append a message to the value that the top counts couldn't be accurately determined.
But this isn't even the problem you're trying to solve so apologies for muddying up the issue.
Back to the original code review, there may still be some benefit leaving the split as-is and using a .strip()
on the qk_list[i]
reference.
re-tested with test-docker
I left the commits "organic". But if the project prefers consolidated commits, I can do that.
I'm not seeing any new commits.
:man_facepalming: git did what I asked rather than what I wanted. Better now?
Description
Fixes to allow top_count_keys to still function when using a list of query_key values.
Checklist
make test-docker
with my changes.Questions or Comments
tl;dr fc583f77fb6c3eb134ddd3164c908441c3086898 and e0795fe556a8b9b72aaf305237f8f5c72823ab84 are bug fixes. 9d1f291e24c9ef8d5ae246a281eae5a06cd770f2 is a guess.
I didn't attempt unit tests as a.
get_hits_terms
looks strongly coupled with elasticsearch results and b. there wasn't an existing get_hits_terms unit test (that I could find)Confirmed through manual testing that a single query_key still behaves as before and the list of query_key values now populates topevents* as expected.
I have low confidence 9d1f291e24c9ef8d5ae246a281eae5a06cd770f2 is the correct way to handle
raw_count_keys=false
but it's working (finally) as expected, when dealing with a mix oftext
terms andkeyword
,ip
,int
, etc. terms. As it stands now, with this PR, a rule with this config works as I expect.Manual testing done on a python 3.11 image (similar to the Dockerfile) with vscode debugger checkpoints around the changes
make test-docker