jertel / elastalert2

ElastAlert 2 is a continuation of the original yelp/elastalert project. Pull requests are appreciated!
https://elastalert2.readthedocs.org
Apache License 2.0

Spike aggregation: TypeError: '<' not supported between instances of 'NoneType' and 'int' #1384

Open vaddenz opened 7 months ago

vaddenz commented 7 months ago

1. Exception Log:

ERROR:elastalert:Traceback (most recent call last):
  File "elastalert2/elastalert/elastalert.py", line 1260, in handle_rule_execution
    num_matches = self.run_rule(rule, endtime, rule.get('initial_starttime'))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "elastalert2/elastalert/elastalert.py", line 883, in run_rule
    if not self.run_query(rule, tmp_endtime, endtime):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "elastalert2/elastalert/elastalert.py", line 637, in run_query
    rule_inst.add_aggregation_data(data)
  File "elastalert2/elastalert/ruletypes.py", line 1202, in add_aggregation_data
    self.unwrap_term_buckets(timestamp, payload_data['bucket_aggs'])
  File "elastalert2/elastalert/ruletypes.py", line 1236, in unwrap_term_buckets
    self.handle_event(event, agg_value, qk_str)
  File "elastalert2/elastalert/ruletypes.py", line 500, in handle_event
    if self.find_matches(ref, cur):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "elastalert2/elastalert/ruletypes.py", line 528, in find_matches
    ref < self.rules.get('threshold_ref', 0)):
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'NoneType' and 'int'

2. Reproduction

Start ElastAlert with the following index pattern and rule configuration. When the first timeframe is reached, the exception above occurs.

2.1 ElasticSearch Index Pattern

{
  "metrics": {
    "name": "request.timer",
    "value": {
      "timer": {
        "histogram": {
          "p95": 796
        }
      }
    },
    "labels": {
      "appId": "<appId>"
    }
  },
  "@timestamp": "2024-02-28T02:10:30.179346Z"
}

2.2 Rule Configuration

name: 'Request Timer Alert'
description: '3 Minute Request Timer Alert'
type: 'spike_aggregation'
is_enabled: true
timeframe:
  minutes: 3
buffer_time:
  minutes: 3
search_extra_index: true

# Spike
index: '<index>-*'
metric_agg_key: 'metrics.value.timer.histogram.p95'
metric_agg_type: 'avg'
spike_height: 1.1
spike_type: 'up'
query_key: 'metrics.labels.appId.keyword'
alert_on_new_data: true
filter:
  - term:
      'metrics.name.keyword': 'request.timer'

# ElasticSearch cluster config
es_host: '<host>'

# Alert configs
alert:
  - 'post2'
http_post2_url: '<url>'
http_post2_all_values: true

3. Investigation

The exception is raised in SpikeRule.find_matches(ref, cur) in ruletypes.py, where ref and cur may be None but are never validated before being compared. It can be fixed as follows:

class SpikeRule(RuleType):
    ...
    def find_matches(self, ref, cur):
        if ref is None or cur is None:
            return False
        ...
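
To show the effect of the guard in isolation, here is a minimal standalone sketch. It is not ElastAlert 2's actual find_matches: the function below is a simplified stand-in that takes the rule options as a plain dict and keeps only the pieces visible in this issue (the threshold_ref comparison from the traceback, plus spike_height and spike_type from the rule config). The "up" spike condition used here is an assumption made for illustration.

```python
# Simplified stand-in for the spike check. This is NOT ElastAlert 2's real
# find_matches; it keeps only the pieces visible in this issue.

def find_matches(rules, ref, cur):
    """Return True if cur is a spike relative to ref under the given rules."""
    # Proposed guard: a window without an aggregated value cannot be compared.
    if ref is None or cur is None:
        return False
    # Threshold comparison taken from the traceback; safe now that ref is a number.
    if ref < rules.get('threshold_ref', 0):
        return False
    # Assumed semantics for this sketch: "up" fires when cur >= ref * spike_height.
    spike_up = cur >= ref * rules['spike_height']
    return rules['spike_type'] in ('up', 'both') and spike_up


rules = {'spike_height': 1.1, 'spike_type': 'up'}  # values from the rule config above

# Without the guard, the comparison from the traceback fails in Python 3.
try:
    None < rules.get('threshold_ref', 0)
    raised = False
except TypeError:
    raised = True

print(raised)                          # True
print(find_matches(rules, None, 796))  # False: no reference value yet
print(find_matches(rules, 700, 796))   # True: 796 >= 700 * 1.1
```

Returning False when either value is None treats a window with no data as "no spike", so the rule keeps running instead of failing on its first timeframe.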

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 6 months with no activity. Stale issues convey that the issue, while important to someone, is not critical enough for the author, or other community members to work on, sponsor, or otherwise shepherd the issue through to a resolution.