jertel / elastalert2

ElastAlert 2 is a continuation of the original yelp/elastalert project. Pull requests are appreciated!
https://elastalert2.readthedocs.org
Apache License 2.0

Support Elasticsearch 8 #92

Closed jertel closed 2 years ago

jertel commented 3 years ago

Elasticsearch v8 no longer supports doc_type. Some effort will likely be needed to update this project to deal with this.

nsano-rururu commented 3 years ago

ES 8 support may need to cover more than just doc_type.

In ES 7.x the default for include_type_name changed to false, but elastalert explicitly sets include_type_name=True in order to use the type. That parameter will be removed in ES 8.

Also, the dateOptionalTime format name is being removed in favor of date_optional_time.

elasticsearch>=7.0.0,<8.0.0

    ProcessController:  /home/node/.local/lib/python3.8/site-packages/elasticsearch/connection/base.py:193: ElasticsearchDeprecationWarning: Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead.
      warnings.warn(message, category=ElasticsearchDeprecationWarning)
    /home/node/.local/lib/python3.8/site-packages/elasticsearch/connection/base.py:193: ElasticsearchDeprecationWarning: [types removal] Using include_type_name in put mapping requests is deprecated. The parameter will be removed in the next major version.
      warnings.warn(message, category=ElasticsearchDeprecationWarning)

The following are expected changes that need to be addressed.

__init__.py

add

    def is_atleasteight(self):
        """
        Returns True when the Elasticsearch server version >= 8
        """
        return int(self.es_version.split(".")[0]) >= 8
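As a quick illustration of how this check behaves on real version strings, here is a standalone sketch. The `major_version` helper is hypothetical and not part of ElastAlert 2; the function takes the version string as an argument instead of reading `self.es_version`, so that it can run on its own.

```python
def major_version(es_version: str) -> int:
    """Return the major component of a version string such as '8.0.0-alpha2'."""
    return int(es_version.split(".")[0])


def is_atleasteight(es_version: str) -> bool:
    """Standalone equivalent of the proposed method: True when major version >= 8."""
    return major_version(es_version) >= 8
```

Only the text before the first dot is parsed, so pre-release suffixes such as `-alpha2` do not get in the way.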

elastalert/create_index.py

before

   if is_atleastseven(esversion):
        # TODO remove doc_type completely when elasicsearch client allows doc_type=None
        # doc_type is a deprecated feature and will be completely removed in Elasicsearch 8
        es_client.indices.put_mapping(index=ea_index, doc_type='_doc',
                                      body=es_index_mappings['elastalert'], include_type_name=True)
        es_client.indices.put_mapping(index=ea_index + '_status', doc_type='_doc',
                                      body=es_index_mappings['elastalert_status'], include_type_name=True)
        es_client.indices.put_mapping(index=ea_index + '_silence', doc_type='_doc',
                                      body=es_index_mappings['silence'], include_type_name=True)
        es_client.indices.put_mapping(index=ea_index + '_error', doc_type='_doc',
                                      body=es_index_mappings['elastalert_error'], include_type_name=True)
        es_client.indices.put_mapping(index=ea_index + '_past', doc_type='_doc',
                                      body=es_index_mappings['past_elastalert'], include_type_name=True)

after

   if is_atleasteight(esversion):
        es_client.indices.put_mapping(index=ea_index,
                                      body=es_index_mappings['elastalert'])
        es_client.indices.put_mapping(index=ea_index + '_status',
                                      body=es_index_mappings['elastalert_status'])
        es_client.indices.put_mapping(index=ea_index + '_silence',
                                      body=es_index_mappings['silence'])
        es_client.indices.put_mapping(index=ea_index + '_error',
                                      body=es_index_mappings['elastalert_error'])
        es_client.indices.put_mapping(index=ea_index + '_past',
                                      body=es_index_mappings['past_elastalert'])
   elif is_atleastseven(esversion):
        es_client.indices.put_mapping(index=ea_index, doc_type='_doc',
                                      body=es_index_mappings['elastalert'])
        es_client.indices.put_mapping(index=ea_index + '_status', doc_type='_doc',
                                      body=es_index_mappings['elastalert_status'])
        es_client.indices.put_mapping(index=ea_index + '_silence', doc_type='_doc',
                                      body=es_index_mappings['silence'])
        es_client.indices.put_mapping(index=ea_index + '_error', doc_type='_doc',
                                      body=es_index_mappings['elastalert_error'])
        es_client.indices.put_mapping(index=ea_index + '_past', doc_type='_doc',
                                      body=es_index_mappings['past_elastalert'])

elastalert/loaders.py

It is necessary to prevent the following checks from being performed on elasticsearch 8 or later versions

        if rule.get('use_count_query') or rule.get('use_terms_query'):
            if 'doc_type' not in rule:
                raise EAException('doc_type must be specified.')
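One possible shape for that guard is sketched below. It assumes the rule loader can learn the server's major version at load time, which may not be true in the current code; `check_doc_type` and its `es_major_version` parameter are hypothetical names, and `EAException` is redefined here only so the snippet runs by itself.

```python
class EAException(Exception):
    """Stand-in for ElastAlert's exception type so this sketch is self-contained."""


def check_doc_type(rule: dict, es_major_version: int) -> None:
    """Require doc_type for count/terms queries, but only before Elasticsearch 8."""
    if es_major_version >= 8:
        # Mapping types are gone in ES 8, so there is nothing to require.
        return
    if rule.get('use_count_query') or rule.get('use_terms_query'):
        if 'doc_type' not in rule:
            raise EAException('doc_type must be specified.')
```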

elastalert/elastalert.py

before

        # Record doc_type for use in get_top_counts
        if 'doc_type' not in rule and len(hits):
          rule['doc_type'] = hits[0]['_type']

after

        # Record doc_type for use in get_top_counts
        if not self.thread_data.current_es.is_atleasteight():
            if 'doc_type' not in rule and len(hits):
                rule['doc_type'] = hits[0]['_type']

before

            if not rule['five']:
                res = self.thread_data.current_es.deprecated_search(
                    index=index,
                    doc_type=rule['doc_type'],
                    body=query,
                    search_type='count',
                    ignore_unavailable=True
                )
            else:
                res = self.thread_data.current_es.deprecated_search(index=index, doc_type=rule['doc_type'],
                                                                    body=query, size=0, ignore_unavailable=True)

after

            if not rule['five']:
                if self.thread_data.current_es.is_atleasteight():
                    res = self.thread_data.current_es.deprecated_search(
                        index=index,
                        body=query,
                        search_type='count',
                        ignore_unavailable=True
                    )
                else:
                    res = self.thread_data.current_es.deprecated_search(
                        index=index,
                        doc_type=rule['doc_type'],
                        body=query,
                        search_type='count',
                        ignore_unavailable=True
                    )
            else:
                res = self.thread_data.current_es.deprecated_search(index=index, doc_type=rule['doc_type'],
                                                                    body=query, size=0, ignore_unavailable=True)

before

            if not rule['five']:
                res = self.thread_data.current_es.deprecated_search(
                    index=index,
                    doc_type=rule.get('doc_type'),
                    body=query,
                    search_type='count',
                    ignore_unavailable=True
                )
            else:
                res = self.thread_data.current_es.deprecated_search(index=index, doc_type=rule.get('doc_type'),
                                                                    body=query, size=0, ignore_unavailable=True)

after

            if not rule['five']:
                if self.thread_data.current_es.is_atleasteight():
                    res = self.thread_data.current_es.deprecated_search(
                        index=index,
                        body=query,
                        search_type='count',
                        ignore_unavailable=True
                    )
                else:
                    res = self.thread_data.current_es.deprecated_search(
                        index=index,
                        doc_type=rule.get('doc_type'),
                        body=query,
                        search_type='count',
                        ignore_unavailable=True
                    )
            else:
                res = self.thread_data.current_es.deprecated_search(index=index, doc_type=rule.get('doc_type'),
                                                                    body=query, size=0, ignore_unavailable=True)

elastalert/test_rule.py

before

res = es_client.count(index=index, doc_type=doc_type, body=count_query, ignore_unavailable=True)

after

if es_client.is_atleasteight():
    res = es_client.count(index=index, body=count_query, ignore_unavailable=True)
else:
    res = es_client.count(index=index, doc_type=doc_type, body=count_query, ignore_unavailable=True)

elastalert/tests/conftest.py

mock_es_client

add

self.is_atleasteight = mock.Mock(return_value=False)

mock_es_sixsix_client

add

self.is_atleasteight = mock.Mock(return_value=False)

elastalert/es_mappings/8/elastalert.json

add file

dateOptionalTime→date_optional_time

{
  "numeric_detection": true,
  "date_detection": false,
  "dynamic_templates": [
    {
      "strings_as_keyword": {
        "mapping": {
          "ignore_above": 1024,
          "type": "keyword"
        },
        "match_mapping_type": "string"
      }
    }
  ],
  "properties": {
    "rule_name": {
      "type": "keyword"
    },
    "@timestamp": {
      "type": "date",
      "format": "date_optional_time"
    },
    "alert_time": {
      "type": "date",
      "format": "date_optional_time"
    },
    "match_time": {
      "type": "date",
      "format": "date_optional_time"
    },
    "match_body": {
      "enabled": "false",
      "type": "object"
    },
    "aggregate_id": {
      "type": "keyword"
    }
  }
}
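Judging by the "dateOptionalTime→date_optional_time" note, the new mapping files differ from the existing ones in the date format name. As a rough sketch (not part of the proposal), the rename could be applied to an already loaded mapping like this; `snake_case_date_formats` is a hypothetical helper name.

```python
import copy


def snake_case_date_formats(mapping: dict) -> dict:
    """Return a copy of a mapping with dateOptionalTime renamed to date_optional_time."""
    result = copy.deepcopy(mapping)

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "format" and value == "dateOptionalTime":
                    # Replacing the value of an existing key is safe while iterating.
                    node[key] = "date_optional_time"
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    return result
```

The input is left untouched, so the same source mapping could still be used for older clusters.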

add Directory

elastalert/es_mappings/8

elastalert/es_mappings/8/elastalert_error.json ・・・add file

dateOptionalTime→date_optional_time

{
  "properties": {
    "data": {
      "type": "object",
      "enabled": "false"
    },
    "@timestamp": {
      "type": "date",
      "format": "date_optional_time"
    }
  }
}

elastalert/es_mappings/8/elastalert_status.json・・・add file

dateOptionalTime→date_optional_time

{
  "properties": {
    "rule_name": {
      "type": "keyword"
    },
    "@timestamp": {
      "type": "date",
      "format": "date_optional_time"
    }
  }
}

elastalert/es_mappings/8/past_elastalert.json・・・add file

dateOptionalTime→date_optional_time

{
  "properties": {
    "rule_name": {
      "type": "keyword"
    },
    "match_body": {
      "type": "object",
      "enabled": "false"
    },
    "@timestamp": {
      "type": "date",
      "format": "date_optional_time"
    },
    "aggregate_id": {
      "type": "keyword"
    }
  }
}

elastalert/es_mappings/8/silence.json・・・add file

dateOptionalTime→date_optional_time

{
  "properties": {
    "rule_name": {
      "type": "keyword"
    },
    "until": {
      "type": "date",
      "format": "date_optional_time"
    },
    "@timestamp": {
      "type": "date",
      "format": "date_optional_time"
    }
  }
}

elastalert/create_index.py

before

es_index_mappings = read_es_index_mappings() if is_atleastsix(esversion) else read_es_index_mappings(5)

after

if is_atleasteight(esversion):
    es_index_mappings = read_es_index_mappings(8)
elif is_atleastsix(esversion):
    es_index_mappings = read_es_index_mappings()
else:
    es_index_mappings = read_es_index_mappings(5)

tests/create_index_test.py

add test_read_es_8_index_mappings

def test_read_es_8_index_mappings():
    mappings = elastalert.create_index.read_es_index_mappings(8)
    assert len(mappings) == len(es_mappings)
    print(json.dumps(mappings, indent=2))

nsano-rururu commented 3 years ago

If elasticsearch 8 is released and supported, the docker image will likely need to provide both pip install elasticsearch==7.0.0 for elasticsearch 7 and pip install elasticsearch==8.0.0 for elasticsearch 8.

nsano-rururu commented 3 years ago

This is the exact code block that needs to be changed: loaders.py https://github.com/Yelp/elastalert/issues/2424#issuecomment-525670938

        # Check that doc_type is provided if use_count/terms_query
        if rule.get('use_count_query') or rule.get('use_terms_query'):
            if 'doc_type' not in rule:
                raise EAException('doc_type must be specified.')

ferozsalam commented 3 years ago

I noticed the alpha ES 8 image is now available in the Elasticsearch Docker registry. I'll start working on ES 8 compatibility this weekend, assuming no one else is already looking at this.

jertel commented 3 years ago

Thanks @ferozsalam.

nsano-rururu commented 3 years ago

@ferozsalam

Since elasticsearch-py 8.0.0, the Python client for Elasticsearch 8, has not been released yet, is the right approach to change the code while still on elasticsearch-py 7.0.0 and verify operation that way? https://github.com/elastic/elasticsearch-py

ferozsalam commented 3 years ago

That's a good point.

I think my plan will be to see if I can get things working with the ES8 alpha and elasticsearch-py 7.0.0 over the weekend. Elastic has already started work on elasticsearch-py 8.0.0, so if 7.0.0 doesn't work I might try using the latest version of the library direct from the repository.

If it all doesn't work and we need to wait for elasticsearch-py 8.0.0, I'll post here and pause for a while.

nsano-rururu commented 3 years ago

The latest elasticsearch-py currently refuses to connect to Amazon Elasticsearch Service, so opensearch-py needs to be supported at the same time as elasticsearch-py 8. https://aws.amazon.com/jp/blogs/opensource/keeping-clients-of-opensearch-and-elasticsearch-compatible-with-open-source/ https://github.com/opensearch-project/opensearch-py

opensearch-py seems to be able to connect to both Elasticsearch and Amazon Elasticsearch Service, but the last client version before the connection restrictions were built in was elasticsearch-py 7.13.4. Please note that elastalert2 uses elasticsearch-py 7.0.0, so if all connections are switched to opensearch-py, some rules will not work. It is possible to detect Amazon Elasticsearch Service, but we also need to consider what to do when OpenSearch is used on its own.

https://opensearch.org/docs/clients/index/ OpenSearch client compatibility

Python Elasticsearch client 7.13.4

nsano-rururu commented 3 years ago

@ferozsalam

The kibana-int index referenced in elastalert/rule_from_kibana.py and elastalert/elastalert.py does not exist in Elasticsearch 7 (and possibly not in Elasticsearch 6 either), so some of the results in my earlier analysis are wrong. https://github.com/jertel/elastalert2/issues/92#issuecomment-827296437 Please refer to the discussion in jertel's comment. https://github.com/jertel/elastalert2/discussions/442#discussioncomment-1276895

nsano-rururu commented 3 years ago

@ferozsalam

Sharing some information about ES 8.

Deprecation warnings in 7.15.0 pre-releases https://github.com/elastic/elasticsearch-py/issues/1698

The body parameter for APIs are deprecated

I didn't expect the API body parameter to be deprecated. I think it has a big impact.

ferozsalam commented 2 years ago

I've spent some time this weekend (much delayed!) getting a development environment set up, and have managed to get ElastAlert running and communicating with Elasticsearch 8.0.0-alpha2 using elasticsearch-py 7.0.0 with a single debug rule running. The good news is that it looks like ES 8-alpha2 still works (at least for ElastAlert) with elasticsearch-py 7.0.0.

There are several places where I need to make further tweaks to handle the removal of doc types, but I don't foresee that proving a major hurdle.

My goal for this week is to get the unit test suite working to see what else needs fixing/changing. I will probably start doing the development work on a separate branch.

Thanks very much for your code samples above @nsano-rururu - they saved me a lot of time.

A question - does anyone know why we have pinned the current elasticsearch-py version at 7.0.0? We're probably going to run into whatever issue is causing that again with the conversion to ES 8, so I wondered if it might be good to also fix that if possible.

nsano-rururu commented 2 years ago

https://github.com/Yelp/elastalert/pull/2593#issue-533589433

Also note: I also added a pin for elasticsearch==7.0.0, because apparently 7.1.0 will NOT work with ES < 6.6 due to it not supported _source_include(s?). 7.0.0 does. Tests won't pass otherwise.

Fix issue caused by 7.x breaking change (_source_include/_source_exclude) https://github.com/elastic/elasticsearch-py/pull/1019

https://www.elastic.co/guide/en/elasticsearch/reference/7.0/breaking-changes-7.0.html#source-include-exclude-params-removed

Source filtering url parameters _source_include and _source_exclude have been removed The deprecated in 6.x url parameters are now removed. Use _source_includes and _source_excludes instead.

ES Version revert to 7.0.0 https://github.com/jertel/elastalert2/pull/90

nsano-rururu commented 2 years ago

The advantage of staying pinned to elasticsearch-py 7.0.0 is that a single Docker image can cover ES 6/7/8 and OpenSearch. The disadvantage is that new features cannot be used. For example, Elastic Cloud's Cloud ID cannot be supported, and later bug fixes and performance improvements are unavailable.

elasticsearch-py [7.x] » Release notes

https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/release-notes.html

nsano-rururu commented 2 years ago

In addition, any new query syntax supported only by releases after elasticsearch-py 7.0.0 will not be usable. And since using elasticsearch-py 7.x with Elasticsearch 8 is not an officially supported combination, the chance of something going wrong is not zero.

The officially announced pairings are:

  • elasticsearch-py 5.x for Elasticsearch 5.x
  • elasticsearch-py 6.x for Elasticsearch 6.x
  • elasticsearch-py 7.x for Elasticsearch 7.x
  • elasticsearch-py 8.x for Elasticsearch 8.x

nsano-rururu commented 2 years ago

Continuing to use middleware with older versions has a non-zero potential for security issues. If you understand that and still use the older version, I won't say anything anymore.

ferozsalam commented 2 years ago

Yes, I completely understand and agree that the aim should be to move over to the latest library version as quickly as possible.

However if migrating to ES 8 does not have a hard dependency on elasticsearch-py 8.0.0, then I think it might make more sense to work on the two tasks separately, especially if there are significant other changes required to support elasticsearch-py 8.0.0. @jertel do you have an opinion here?

Thanks for the explanation on the Docker image - it might be an idea to offer multiple Docker images so that we can move forward with elasticsearch-py version, perhaps based on different branches of the repo? Again something that I think @jertel would have to set up, so would be interested in knowing his thoughts.

jertel commented 2 years ago

Ideally we would continue with:

My preference is to have a single branch to avoid maintaining multiple copies of the source code. I think having a temporary branch to work through the ES 8 compatibility is fine though, provided the development and testing doesn't take months to complete before we can merge it back to master.

Below is my understanding of the library compatibility matrix (If there are corrections please tell me so I can update this):

elasticsearch-py8

elasticsearch-py7 (specifically 7.0.0)

opensearch-py

Abstracting the calls to the Elastic API away from the general ElastAlert 2 source code and into a new search.py class would give us the ability to put all the logic in that new class for choosing whether to use the opensearch-py library or the elasticsearch-py library. This might be easier said than done, but it would help isolate all of this complexity into one place.
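To make the idea a bit more concrete, here is a minimal sketch of what such a wrapper could look like. Everything in it is hypothetical: the `SearchClient` class, the `build_search_client` function, and the `factories` argument are illustrative names, and the real client objects (from elasticsearch-py or opensearch-py) would be supplied by the caller rather than imported here.

```python
from typing import Any, Callable, Dict, Optional


class SearchClient:
    """Hypothetical facade that hides which client library sits underneath."""

    def __init__(self, backend: Any, supports_doc_type: bool) -> None:
        self._backend = backend
        self._supports_doc_type = supports_doc_type

    def count(self, index: str, query: Dict[str, Any],
              doc_type: Optional[str] = None) -> Any:
        """Run a count query, passing doc_type only where the server still accepts it."""
        kwargs: Dict[str, Any] = {
            'index': index,
            'body': query,
            'ignore_unavailable': True,
        }
        if self._supports_doc_type and doc_type is not None:
            kwargs['doc_type'] = doc_type
        return self._backend.count(**kwargs)


def build_search_client(library: str, supports_doc_type: bool,
                        factories: Dict[str, Callable[[], Any]]) -> SearchClient:
    """Pick a backend by name, e.g. 'elasticsearch' or 'opensearch' in this sketch."""
    if library not in factories:
        raise ValueError('unknown client library: %s' % library)
    return SearchClient(factories[library](), supports_doc_type)
```

With something like this, the rest of the code would call `client.count(...)` and never branch on the server version or library itself.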

If @ferozsalam can prove that ES8 compatibility can be had with the removal of doc_type and without switching to the new library then let's proceed with getting ES8 support into master without changing the Python library and without breaking ES7 or OpenSearch compatibility.

Then, separately, we can discuss deprecating support for old versions of Elasticsearch, based on Elastic's End-of-Life (EOL) dates. This will allow us to begin upgrading the elasticsearch-py library to newer versions.

nsano-rururu commented 2 years ago

Regarding Elastic Cloud: according to the following documentation, it seems you need to pass the cloud_id parameter when connecting. https://www.elastic.co/guide/en/elasticsearch/client/python-api/master/connecting.html#auth-ec

from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id="cluster-1:dXMa5Fx..."
)

elasticsearch-py [7.x] » Release notes https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/release-notes.html

7.0.2 (2019-05-29) Add connection parameter for Elastic Cloud cloud_id.

nsano-rururu commented 2 years ago

Regarding opensearch-py: it seems they are removing the code related to Elastic Cloud connections (cloud_id and api_key), and there is a pull request for it. api_key should already be supported by elastalert2. https://github.com/opensearch-project/opensearch-py

jertel commented 2 years ago

Thanks, I've updated the post above to mention that ES Cloud is "NOT supported" by opensearch-py.

nsano-rururu commented 2 years ago

Since opensearch-py was forked from elasticsearch-py 7.13.4 as a new project, I think ES 7 and ES 8 would be as follows.

ES 7: Unknown → Supported
ES 8: Unknown → Mostly Supported (unknown at this time what is missing)

jertel commented 2 years ago

Ok, then it probably also means opensearch-py does not support ES6 or lower, based on #90. I'll update that now.

sethmlarson commented 2 years ago

Hello folks :wave: Saw this thread and was wondering if there's anything I can do to help or clarify. One thing I saw that I wasn't sure of was:

elasticsearch-py7 (specifically 7.0.0)

  • ES Cloud: NOT supported

Elastic Cloud is definitely supported by all 7.x versions of the client, the cloud_id is only a convenient way of specifying the Elasticsearch cluster you're connecting to. Specifying via a URL works just as well for Elastic Cloud.

jertel commented 2 years ago

Thanks for chiming in @sethmlarson. I've updated the post above to reflect that clarification.

While you're here, could you comment on the following:

  1. Do you know if elasticsearch-py8 works with ES 7 clusters?
  2. Is there anything significant you can think of that would be a problem with using elasticsearch-py7 against an ES 8 cluster? I'm thinking primarily along the lines of search filters.

sethmlarson commented 2 years ago

I've updated the post above to reflect that clarification.

Thank you!

1. Do you know if elasticsearch-py8 works with ES 7 clusters?

Our compatibility policy is forwards compatibility, so there's no guarantee that v8.0 clients will work with v7.x servers. However if you're not relying on removed features (mapping types, filters) then it'd maybe work? Wanted to highlight the difference between "supported" and "happens to work".

2. Is there anything significant you can think of that would be a problem with using elasticsearch-py7 against an ES 8 cluster? I'm thinking primarily along the lines of search filters.

Mapping types and anything removed in 8.0 are still removed even when using "compatibility mode". However, with client versions before 7.16 you will likely need to be more hands-on with the compatibility mode by setting HTTP headers yourself. In 7.16 I'm working towards getting the mode to be much easier to use.
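For anyone following along, a rough sketch of what "setting HTTP headers yourself" could mean is below. The media type string follows my reading of Elastic's REST API compatibility documentation and should be verified there; `compatibility_headers` is a hypothetical helper, and how the headers get attached to each request depends on the client version, which is not shown.

```python
def compatibility_headers(compatible_with: int = 7) -> dict:
    """Build the headers asking an ES 8 server to behave like the given major version."""
    media_type = (
        "application/vnd.elasticsearch+json; compatible-with=%d" % compatible_with
    )
    # Both headers carry the same vendor media type.
    return {"Accept": media_type, "Content-Type": media_type}
```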

nsano-rururu commented 2 years ago

@jertel

One additional point: HTTP headers cannot be set without modifying the current elastalert2 code. The following pull request against yelp/elastalert appears to contain the relevant change.

allow custom http_headers in config.yaml https://github.com/Yelp/elastalert/pull/2952

ferozsalam commented 2 years ago

Thanks all for your feedback and suggestions here, there’s a lot to think about.

I think we have two options.

Option 1

We’re currently unable to upgrade our elasticsearch-py beyond 7.0.0 because we’re maintaining support for ES 6, which has been EOL for around a year.

If we were to formally drop support for ES 6 and below, we could then move to elasticsearch-py 7.15.0 (and eventually 7.16.0), which will give us some nice fixes while also making compatibility with ES 8 neater, judging by @sethmlarson's comment above.

With support for ES 8 done, we could then work on the changes necessary to support elasticsearch-py 8.0.0 without any (significant) time pressure.

Option 2

Otherwise we hardcode the compatibility header into our HTTP requests and continue with elasticsearch-py 7.0.0 until a point where we are happier to drop support for ES 6.


My preference is for Option 1, as:

What does everyone else think?

nsano-rururu commented 2 years ago

I think Option 1 is fine.

Since elasticsearch-py 8.0.0 will drop support for Python 3.5 and earlier, would it be possible to check the Python version in setup.py? I think we should add that setting if it is easy to do.
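A sketch of what that check could look like follows. It assumes Python 3.6 as the minimum, inferred from "3.5 and earlier" being dropped; `python_is_supported` and `MINIMUM_PYTHON` are hypothetical names. The declarative alternative would be the setuptools `python_requires` keyword in the `setup()` call.

```python
import sys

# Assumed minimum, inferred from Python 3.5 and earlier being dropped.
MINIMUM_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON) -> bool:
    """Return True when the interpreter's (major, minor) meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Declarative equivalent inside setup.py (not executed here):
#     setup(..., python_requires='>=3.6')
```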

jertel commented 2 years ago

I agree with Option 1.

nsano-rururu commented 2 years ago

@ferozsalam

Regarding the removal of the body parameter, it seems that has been moved to elasticsearch-py 9.0.0. elasticsearch-py 8.0.0 seems to support both the old and the new style. https://github.com/elastic/elasticsearch-py/issues/1698#issuecomment-930258093

ferozsalam commented 2 years ago

Looks like Elasticsearch 8.0 is already out, and somewhat predictably, I haven't found the time to enable support! 😄

I'll take a look later this weekend, although with the compatibility mode I suspect the changes will be minimal.

LaZyDK commented 2 years ago

While trying to update our Elastic Cloud 7.17.0 to 8.0 the cluster upgrade failed. Elastic support gave me this message:


..there are mappings defined on the indices it creates that are not compatible with 8.0.  
The date fields are being mapped as "dateOptionalTime", but these now need to be defined as "date_optional_time" instead.

To fix this, you would need to do the following:
1. Reindex each elastalert index into a temporary index:  https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html
2. Delete the original elastalert indices
3. Re-create the elastalert indices, defining the date mappings as "date_optional_time"
4. Reindex the data from your temp indices back into the fixed elastalert indices

nsano-rururu commented 2 years ago

@LaZyDK

That is expected, because ElastAlert 2 does not support Elasticsearch 8 yet.

nsano-rururu commented 2 years ago

@ferozsalam

Does Elasticsearch 8 support here mean implementing and verifying the following?

Create Index

elastalert-test-rule

Will it support omitting doc_type on Elasticsearch 8?

Rule Type

  Don't check doc_type for use_count_query and use_terms_query on Elasticsearch 8. Update the documentation.

  Verify operation when doc_type is not specified for Metric Aggregation and Percentage Match on Elasticsearch 8. Update the documentation.

Loading Filters Directly From Kibana 3

This currently does not work properly, so it is out of scope.

Others

elastalert.py (may need other modifications as well)

get_hits

The following seems to need to be skipped on ES 8:

        # Record doc_type for use in get_top_counts
        if 'doc_type' not in rule and len(hits):
            rule['doc_type'] = hits[0]['_type']

nsano-rururu commented 2 years ago

@ferozsalam

Please let us know if you need anything investigated.

ferozsalam commented 2 years ago

While trying to update our Elastic Cloud 7.17.0 to 8.0 the cluster upgrade failed. Elastic support gave me this message:

..there are mappings defined on the indices it creates that are not compatible with 8.0.  
The date fields are being mapped as "dateOptionalTime", but these now need to be defined as "date_optional_time" instead.

To fix this, you would need to do the following:
1. Reindex each elastalert index into a temporary index:  https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html
2. Delete the original elastalert indices
3. Re-create the elastalert indices, defining the date mappings as "date_optional_time"
4. Reindex the data from your temp indices back into the fixed elastalert indices

Referring to this issue, is there any (relatively simple) automated workaround to this? Or should we be creating an instruction page for people looking to migrate from ES 7 -> ES 8 with instructions on the manual steps they need to take?
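As a starting point for discussing automation, the four steps quoted above could be expressed as a plan of REST calls, as in the sketch below. It only builds the requests and does not send them; `reindex_plan` and the `_tmp` suffix are hypothetical, and the corrected mapping is assumed to be supplied by the caller.

```python
def reindex_plan(index: str, fixed_mapping: dict, temp_suffix: str = "_tmp") -> list:
    """Return (method, path, body) tuples for the four manual steps, in order."""
    temp = index + temp_suffix
    return [
        # 1. Copy the existing documents into a temporary index.
        ("POST", "/_reindex", {"source": {"index": index}, "dest": {"index": temp}}),
        # 2. Delete the original index.
        ("DELETE", "/" + index, None),
        # 3. Re-create it with the corrected date mappings.
        ("PUT", "/" + index, {"mappings": fixed_mapping}),
        # 4. Copy the documents back from the temporary index.
        ("POST", "/_reindex", {"source": {"index": temp}, "dest": {"index": index}}),
    ]
```

Keeping the plan as plain data would make it easy to print for a dry run before anything destructive happens.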

nsano-rururu commented 2 years ago

Referring to this issue, is there any (relatively simple) automated workaround to this? Or should we be creating an instruction page for people looking to migrate from ES 7 -> ES 8 with instructions on the manual steps they need to take?

Do you mean to add to the FAQ how to check the index of elastalert, how to delete it, and have it run createindex again?

ferozsalam commented 2 years ago

Do you mean to add to the FAQ how to check the index of elastalert, how to delete it, and have it run createindex again?

Yes, exactly. I am not sure if the process is easily automatable, but given that people only have to do it once per cluster, perhaps manual instructions will be enough?

nsano-rururu commented 2 years ago

Yes, exactly. I am not sure if the process is easily automatable, but given that people only have to do it once per cluster, perhaps manual instructions will be enough?

I agree with adding it to the FAQ.

jertel commented 2 years ago

It would make life easier for users if it was automated. If it's automated then the existing indices should be renamed with a .old suffix, instead of deleting them outright. If it is automated then the documentation would need to explain that the ES_USER must have permissions to delete elastalert-* indices, and the code would need to be able to gracefully fail in that scenario where it doesn't have access. E.g., provide a helpful message explaining why it cannot auto upgrade, and refer them to the manual upgrade steps.
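The graceful-failure part could look roughly like the sketch below. `try_auto_upgrade` and `MANUAL_STEPS_HINT` are hypothetical names, and the upgrade itself is passed in as a callable so the snippet stays independent of any client library; a real implementation would catch the client's specific authorization error instead of a bare `Exception`.

```python
from typing import Callable

MANUAL_STEPS_HINT = (
    'Automatic index upgrade failed; please follow the manual upgrade steps '
    'in the documentation.'
)


def try_auto_upgrade(upgrade: Callable[[], None],
                     log: Callable[[str], None] = print) -> bool:
    """Run an upgrade callable; on failure, log a helpful message instead of crashing."""
    try:
        upgrade()
    except Exception as exc:  # broad on purpose in this sketch only
        log('%s Reason: %s' % (MANUAL_STEPS_HINT, exc))
        return False
    return True
```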

Either way, the manual upgrade steps should be documented. Perhaps something like the following:

To upgrade an existing ElastAlert 2 installation to Elasticsearch 8 the following manual steps are required:

  1. Shut down ElastAlert 2.
  2. Delete or rename the old elastalert* indices. See Elasticsearch documentation for instructions on how to delete via the API.
  3. If NOT running ElastAlert 2 via Docker or Kubernetes, run elastalert-create-index to create the new indices. This is not needed when running via a container since the container always attempts to create the indices at startup, if they're not yet created.
  4. Restart ElastAlert 2.

konstantin-921 commented 2 years ago

It would make life easier for users if it was automated. If it's automated then the existing indices should be renamed with a .old suffix, instead of deleting them outright. If it is automated then the documentation would need to explain that the ES_USER must have permissions to delete elastalert-* indices, and the code would need to be able to gracefully fail in that scenario where it doesn't have access. E.g., provide a helpful message explaining why it cannot auto upgrade, and refer them to the manual upgrade steps.

Either way, the manual upgrade steps should be documented. Perhaps something like the following:

To upgrade an existing ElastAlert 2 installation to Elasticsearch 8 the following manual steps are required:

  1. Shut down ElastAlert 2.
  2. Delete or rename the old elastalert* indices. See Elasticsearch documentation for instructions on how to delete via the API.
  3. If NOT running ElastAlert 2 via Docker or Kubernetes, run elastalert-create-index to create the new indices. This is not needed when running via a container since the container always attempts to create the indices at startup, if they're not yet created.
  4. Restart ElastAlert 2.

Hi, I recently upgraded from Elasticsearch 7.17.0 to 8.0.0. I am using the ElastAlert helm chart (v2.3.0). After the upgrade, ElastAlert was broken. I went through these steps - https://elastalert2.readthedocs.io/en/latest/recipes/faq.html?highlight=elasticsearch%208#does-elastalert-2-support-elasticsearch-8, but I keep getting the following errors:

Reading Elastic 6 index mappings:
--
Mon, Feb 28 2022 5:56:11 pm | Reading index mapping 'es_mappings/6/silence.json'
Mon, Feb 28 2022 5:56:11 pm | Reading index mapping 'es_mappings/6/elastalert_status.json'
Mon, Feb 28 2022 5:56:11 pm | Reading index mapping 'es_mappings/6/elastalert.json'
Mon, Feb 28 2022 5:56:11 pm | Reading index mapping 'es_mappings/6/past_elastalert.json'
Mon, Feb 28 2022 5:56:11 pm | Reading index mapping 'es_mappings/6/elastalert_error.json'
Mon, Feb 28 2022 5:56:11 pm | Index elastalert already exists. Skipping index creation.
Mon, Feb 28 2022 5:56:14 pm | WARNING:elasticsearch:GET https://elk1-fl.domain.com:9200/elastalert_status/_search?_source_includes=endtime%2Crule_name&size=1 [status:400 request:0.006s]
Mon, Feb 28 2022 5:56:14 pm | ERROR:elastalert:Error querying for last run: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [@timestamp] in order to sort on')
Mon, Feb 28 2022 5:57:13 pm | WARNING:elasticsearch:GET https://elk1-fl.domain.com:9200/elastalert/_search?size=1000 [status:400 request:0.009s]
Mon, Feb 28 2022 5:57:13 pm | ERROR:elastalert:Error finding recent pending alerts: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on') {'query': {'bool': {'must': {'query_string': {'query': '!_exists_:aggregate_id AND alert_sent:false'}}, 'filter': {'range': {'alert_time': {'from': '2022-02-26T14:57:13.392783Z', 'to': '2022-02-28T14:57:13.392873Z'}}}}}, 'sort': {'alert_time': {'order': 'asc'}}}
Mon, Feb 28 2022 5:57:13 pm | Traceback (most recent call last):
Mon, Feb 28 2022 5:57:13 pm | File "/usr/local/lib/python3.10/site-packages/elastalert/elastalert.py", line 1740, in find_recent_pending_alerts
Mon, Feb 28 2022 5:57:13 pm | res = self.writeback_es.search(index=self.writeback_index, body=query, size=1000)
Mon, Feb 28 2022 5:57:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
Mon, Feb 28 2022 5:57:13 pm | return func(*args, params=params, **kwargs)
Mon, Feb 28 2022 5:57:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/__init__.py", line 810, in search
Mon, Feb 28 2022 5:57:13 pm | return self.transport.perform_request(
Mon, Feb 28 2022 5:57:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/transport.py", line 318, in perform_request
Mon, Feb 28 2022 5:57:13 pm | status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
Mon, Feb 28 2022 5:57:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py", line 91, in perform_request
Mon, Feb 28 2022 5:57:13 pm | self._raise_error(response.status_code, raw_data)
Mon, Feb 28 2022 5:57:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/base.py", line 131, in _raise_error
Mon, Feb 28 2022 5:57:13 pm | raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
Mon, Feb 28 2022 5:57:13 pm | elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on')
Mon, Feb 28 2022 5:58:13 pm | WARNING:elasticsearch:GET https://elk1-fl.domain.com:9200/elastalert/_search?size=1000 [status:400 request:0.008s]
Mon, Feb 28 2022 5:58:13 pm | ERROR:elastalert:Error finding recent pending alerts: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on') {'query': {'bool': {'must': {'query_string': {'query': '!_exists_:aggregate_id AND alert_sent:false'}}, 'filter': {'range': {'alert_time': {'from': '2022-02-26T14:58:13.392331Z', 'to': '2022-02-28T14:58:13.392427Z'}}}}}, 'sort': {'alert_time': {'order': 'asc'}}}
Mon, Feb 28 2022 5:58:13 pm | Traceback (most recent call last):
Mon, Feb 28 2022 5:58:13 pm | File "/usr/local/lib/python3.10/site-packages/elastalert/elastalert.py", line 1740, in find_recent_pending_alerts
Mon, Feb 28 2022 5:58:13 pm | res = self.writeback_es.search(index=self.writeback_index, body=query, size=1000)
Mon, Feb 28 2022 5:58:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
Mon, Feb 28 2022 5:58:13 pm | return func(*args, params=params, **kwargs)
Mon, Feb 28 2022 5:58:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/__init__.py", line 810, in search
Mon, Feb 28 2022 5:58:13 pm | return self.transport.perform_request(
Mon, Feb 28 2022 5:58:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/transport.py", line 318, in perform_request
Mon, Feb 28 2022 5:58:13 pm | status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
Mon, Feb 28 2022 5:58:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py", line 91, in perform_request
Mon, Feb 28 2022 5:58:13 pm | self._raise_error(response.status_code, raw_data)
Mon, Feb 28 2022 5:58:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/base.py", line 131, in _raise_error
Mon, Feb 28 2022 5:58:13 pm | raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
Mon, Feb 28 2022 5:58:13 pm | elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on')
Mon, Feb 28 2022 5:59:13 pm | WARNING:elasticsearch:GET https://elk1-fl.domain.com:9200/elastalert/_search?size=1000 [status:400 request:0.008s]
Mon, Feb 28 2022 5:59:13 pm | ERROR:elastalert:Error finding recent pending alerts: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on') {'query': {'bool': {'must': {'query_string': {'query': '!_exists_:aggregate_id AND alert_sent:false'}}, 'filter': {'range': {'alert_time': {'from': '2022-02-26T14:59:13.391917Z', 'to': '2022-02-28T14:59:13.391993Z'}}}}}, 'sort': {'alert_time': {'order': 'asc'}}}
Mon, Feb 28 2022 5:59:13 pm | Traceback (most recent call last):
Mon, Feb 28 2022 5:59:13 pm | File "/usr/local/lib/python3.10/site-packages/elastalert/elastalert.py", line 1740, in find_recent_pending_alerts
Mon, Feb 28 2022 5:59:13 pm | res = self.writeback_es.search(index=self.writeback_index, body=query, size=1000)
Mon, Feb 28 2022 5:59:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
Mon, Feb 28 2022 5:59:13 pm | return func(*args, params=params, **kwargs)
Mon, Feb 28 2022 5:59:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/__init__.py", line 810, in search
Mon, Feb 28 2022 5:59:13 pm | return self.transport.perform_request(
Mon, Feb 28 2022 5:59:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/transport.py", line 318, in perform_request
Mon, Feb 28 2022 5:59:13 pm | status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
Mon, Feb 28 2022 5:59:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py", line 91, in perform_request
Mon, Feb 28 2022 5:59:13 pm | self._raise_error(response.status_code, raw_data)
Mon, Feb 28 2022 5:59:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/base.py", line 131, in _raise_error
Mon, Feb 28 2022 5:59:13 pm | raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
Mon, Feb 28 2022 5:59:13 pm | elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on')
Mon, Feb 28 2022 6:00:13 pm | WARNING:elasticsearch:GET https://elk1-fl.domain.com:9200/elastalert/_search?size=1000 [status:400 request:0.009s]
Mon, Feb 28 2022 6:00:13 pm | ERROR:elastalert:Error finding recent pending alerts: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on') {'query': {'bool': {'must': {'query_string': {'query': '!_exists_:aggregate_id AND alert_sent:false'}}, 'filter': {'range': {'alert_time': {'from': '2022-02-26T15:00:13.392418Z', 'to': '2022-02-28T15:00:13.392516Z'}}}}}, 'sort': {'alert_time': {'order': 'asc'}}}
Mon, Feb 28 2022 6:00:13 pm | Traceback (most recent call last):
Mon, Feb 28 2022 6:00:13 pm | File "/usr/local/lib/python3.10/site-packages/elastalert/elastalert.py", line 1740, in find_recent_pending_alerts
Mon, Feb 28 2022 6:00:13 pm | res = self.writeback_es.search(index=self.writeback_index, body=query, size=1000)
Mon, Feb 28 2022 6:00:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
Mon, Feb 28 2022 6:00:13 pm | return func(*args, params=params, **kwargs)
Mon, Feb 28 2022 6:00:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/__init__.py", line 810, in search
Mon, Feb 28 2022 6:00:13 pm | return self.transport.perform_request(
Mon, Feb 28 2022 6:00:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/transport.py", line 318, in perform_request
Mon, Feb 28 2022 6:00:13 pm | status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
Mon, Feb 28 2022 6:00:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py", line 91, in perform_request
Mon, Feb 28 2022 6:00:13 pm | self._raise_error(response.status_code, raw_data)
Mon, Feb 28 2022 6:00:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/base.py", line 131, in _raise_error
Mon, Feb 28 2022 6:00:13 pm | raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
Mon, Feb 28 2022 6:00:13 pm | elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on')
Mon, Feb 28 2022 6:01:13 pm | WARNING:elasticsearch:GET https://elk1-fl.domain.com:9200/elastalert/_search?size=1000 [status:400 request:0.010s]
Mon, Feb 28 2022 6:01:13 pm | ERROR:elastalert:Error finding recent pending alerts: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on') {'query': {'bool': {'must': {'query_string': {'query': '!_exists_:aggregate_id AND alert_sent:false'}}, 'filter': {'range': {'alert_time': {'from': '2022-02-26T15:01:13.392276Z', 'to': '2022-02-28T15:01:13.392346Z'}}}}}, 'sort': {'alert_time': {'order': 'asc'}}}
Mon, Feb 28 2022 6:01:13 pm | Traceback (most recent call last):
Mon, Feb 28 2022 6:01:13 pm | File "/usr/local/lib/python3.10/site-packages/elastalert/elastalert.py", line 1740, in find_recent_pending_alerts
Mon, Feb 28 2022 6:01:13 pm | res = self.writeback_es.search(index=self.writeback_index, body=query, size=1000)
Mon, Feb 28 2022 6:01:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
Mon, Feb 28 2022 6:01:13 pm | return func(*args, params=params, **kwargs)
Mon, Feb 28 2022 6:01:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/__init__.py", line 810, in search
Mon, Feb 28 2022 6:01:13 pm | return self.transport.perform_request(
Mon, Feb 28 2022 6:01:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/transport.py", line 318, in perform_request
Mon, Feb 28 2022 6:01:13 pm | status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
Mon, Feb 28 2022 6:01:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py", line 91, in perform_request
Mon, Feb 28 2022 6:01:13 pm | self._raise_error(response.status_code, raw_data)
Mon, Feb 28 2022 6:01:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/base.py", line 131, in _raise_error
Mon, Feb 28 2022 6:01:13 pm | raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
Mon, Feb 28 2022 6:01:13 pm | elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on')
Mon, Feb 28 2022 6:02:13 pm | WARNING:elasticsearch:GET https://elk1-fl.domain.com:9200/elastalert/_search?size=1000 [status:400 request:0.008s]
Mon, Feb 28 2022 6:02:13 pm | ERROR:elastalert:Error finding recent pending alerts: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on') {'query': {'bool': {'must': {'query_string': {'query': '!_exists_:aggregate_id AND alert_sent:false'}}, 'filter': {'range': {'alert_time': {'from': '2022-02-26T15:02:13.392588Z', 'to': '2022-02-28T15:02:13.392690Z'}}}}}, 'sort': {'alert_time': {'order': 'asc'}}}
Mon, Feb 28 2022 6:02:13 pm | Traceback (most recent call last):
Mon, Feb 28 2022 6:02:13 pm | File "/usr/local/lib/python3.10/site-packages/elastalert/elastalert.py", line 1740, in find_recent_pending_alerts
Mon, Feb 28 2022 6:02:13 pm | res = self.writeback_es.search(index=self.writeback_index, body=query, size=1000)
Mon, Feb 28 2022 6:02:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
Mon, Feb 28 2022 6:02:13 pm | return func(*args, params=params, **kwargs)
Mon, Feb 28 2022 6:02:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/client/__init__.py", line 810, in search
Mon, Feb 28 2022 6:02:13 pm | return self.transport.perform_request(
Mon, Feb 28 2022 6:02:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/transport.py", line 318, in perform_request
Mon, Feb 28 2022 6:02:13 pm | status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
Mon, Feb 28 2022 6:02:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py", line 91, in perform_request
Mon, Feb 28 2022 6:02:13 pm | self._raise_error(response.status_code, raw_data)
Mon, Feb 28 2022 6:02:13 pm | File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/base.py", line 131, in _raise_error
Mon, Feb 28 2022 6:02:13 pm | raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
Mon, Feb 28 2022 6:02:13 pm | elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'No mapping found for [alert_time] in order to sort on')

Any help?

ferozsalam commented 2 years ago

Hey @konstantin-921, I suspect part of the reason is that a release with alpha ES 8 support hasn't been cut yet. If you're using 2.3.0, you won't have the latest changes.

Please note that ElastAlert isn't guaranteed to work with ES 8, even with the latest (unreleased) changes. This is still a work in progress.
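
The `No mapping found for [alert_time] in order to sort on` errors in the log above mean that the existing writeback index has no mapping for the field being sorted on. A minimal sketch for confirming this, assuming the typeless response shape that `indices.get_mapping` returns on Elasticsearch 7 and later (`{index: {"mappings": {"properties": {...}}}}`); the function name `missing_fields` is hypothetical and not part of ElastAlert 2:

```python
def missing_fields(mapping_response, index, fields):
    """Return the names in `fields` that have no top-level mapping in `index`."""
    properties = (
        mapping_response.get(index, {})
        .get("mappings", {})
        .get("properties", {})
    )
    return [field for field in fields if field not in properties]


# Example response for an index that was created without the expected mappings.
response = {"elastalert": {"mappings": {}}}
print(missing_fields(response, "elastalert", ["alert_time", "@timestamp"]))
```

In practice `mapping_response` would be the result of `es.indices.get_mapping(index="elastalert")`. A non-empty result suggests the index needs to be recreated with the correct mappings.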

konstantin-921 commented 2 years ago

Thank you for your response, @ferozsalam. I will wait for the new releases then.

nsano-rururu commented 2 years ago

In Elasticsearch 8, _type should no longer be returned in search hits, so a KeyError is possible in the following places.

elastalert/test_rule.py

        doc_type = res['hits']['hits'][0]['_type']

elastalert/elastalert.py

        if 'doc_type' not in rule and len(hits):
            rule['doc_type'] = hits[0]['_type']
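
One way to avoid the key error is to read `_type` with `dict.get`, so that hits without `_type` simply leave `doc_type` unset. This is a sketch of the idea only, not the change made in the project; the function name `fill_doc_type` is hypothetical:

```python
def fill_doc_type(rule, hits):
    """Copy `_type` from the first hit into rule['doc_type'] when it is available."""
    if 'doc_type' not in rule and len(hits):
        # `_type` may be absent on Elasticsearch 8 hits, so avoid indexing it directly.
        doc_type = hits[0].get('_type')
        if doc_type is not None:
            rule['doc_type'] = doc_type
    return rule
```

The same `.get('_type')` pattern would apply to the lookup in elastalert/test_rule.py.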