jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

Official OpenSearch storage option #3044

Closed wiardvanrij closed 3 years ago

wiardvanrij commented 3 years ago

Requirement - what kind of business use case are you trying to solve?

At the moment Jaeger supports Cassandra and Elasticsearch (ES).

With the recent changes to the ES license, AWS created a fork (OpenSearch). For long-term commitments and architectural decisions, I would like to know Jaeger's thoughts/vision/commitment regarding OpenSearch as a storage backend.

I.e.: should we migrate from our current AWS ES 'PaaS' to a dedicated 'ES' setup, rather than upgrading to OpenSearch?

Problem - what in Jaeger blocks you from solving the requirement?

The problem I foresee is that we hit a compatibility issue on the OpenSearch side while Jaeger only supports ES. Fixing it would require a change in Jaeger, which might not be feasible since, at the moment, Jaeger does not officially support OpenSearch.

Now one could argue, "But the specs are the same." That is still the case for now, but things change. For example, Elasticsearch could implement new features that Jaeger in turn starts using, while OpenSearch goes in a different direction. At that point we lose compatibility.

Any open questions to address

I would like to point out this change: https://github.com/logstash-plugins/logstash-output-elasticsearch/pull/1005. Its stated purpose is:

This PR needs to ensure we are sending data to licensed ES.

This does not impact Jaeger at all, but it does make a statement that ES has no intention of creating an ecosystem where multiple projects implementing the same API are compatible. Therefore I strongly believe OpenSearch should be officially acknowledged as a separate storage option, since we can no longer say "ES-compatible engine". Defining that separation still leaves open the question of whether OpenSearch should get official support as a Jaeger-compatible storage.

pavolloffay commented 3 years ago

Great topic! :)

Let me recap the current state: Jaeger currently supports ES 5, 6 and 7, and therefore should be compatible with the latest OpenSearch version, which, if I am not mistaken, is based on ES 7.

To me, it makes sense to add official OpenSearch support and start testing against the official images. However, I am not sure what the situation will be in the long run; officially supporting both adds maintenance cost. I am personally curious whether there will be a native Go client for OpenSearch. Jaeger currently uses https://github.com/olivere/elastic, and we were planning to migrate to https://github.com/elastic/go-elasticsearch.

@jkowall can probably chime in as well and comment on OpenSearch support (at least from the logz.io perspective).

jkowall commented 3 years ago

We agreed on a recent roadmap item to add OpenSearch to Jaeger as a first-class storage option. As @pavolloffay mentioned, OpenSearch is currently a fork of ES 7.10 with a lot of proprietary and tracking code removed (which is a whole other issue IMO). They are fully compatible today, but as @wiardvanrij mentions, Elastic is now deliberately breaking parts of the ELK stack so they do not work with open source versions of Elasticsearch (and OpenSearch). Beats also received a similar recent change, following the Logstash one; see the discussion on Reddit (https://www.reddit.com/r/aws/comments/nn95aq/elastic_has_broken_filebeat_as_of_713_it_no/?utm_source=share&utm_medium=ios_app&utm_name=iossmf).

If/when Elastic breaks the Elasticsearch APIs, we have to decide whether we will continue to support those versions or remain compatible only with Elasticsearch v7 and below.

I can't answer whether we will fork the Go client for OpenSearch; I believe we will make that determination if, or maybe when, Elastic changes that library to deliberately not work with open-source-licensed software.

jpkrohling commented 3 years ago

How about we deprecate Elasticsearch now in 1.23, claim support for OpenSearch starting with 1.23, and rename Elasticsearch to OpenSearch as of 1.25? We can still internally process SPAN_STORAGE_TYPE=elasticsearch as opensearch after 1.25, but if we get a bug report about an incompatibility between OpenSearch and Elasticsearch, we know which side to take.
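To make the aliasing idea concrete, here is a minimal Go sketch (Go because Jaeger itself is written in Go). The alias table and the `canonicalStorageType` helper are hypothetical, not Jaeger's actual code; the sketch only shows how a legacy `elasticsearch` value could keep working while being routed to an OpenSearch backend and flagged as deprecated.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// storageAliases is a hypothetical lookup table for the proposal: the legacy
// "elasticsearch" value keeps working but is routed to the OpenSearch backend.
// The deprecated flag lets callers warn users who still use the old name.
var storageAliases = map[string]struct {
	backend    string
	deprecated bool
}{
	"opensearch":    {"opensearch", false},
	"elasticsearch": {"opensearch", true},
	"cassandra":     {"cassandra", false},
}

// canonicalStorageType normalizes a SPAN_STORAGE_TYPE value and reports
// whether the caller used a deprecated alias.
func canonicalStorageType(raw string) (backend string, deprecated bool, err error) {
	entry, ok := storageAliases[strings.ToLower(strings.TrimSpace(raw))]
	if !ok {
		return "", false, fmt.Errorf("unknown span storage type %q", raw)
	}
	return entry.backend, entry.deprecated, nil
}

func main() {
	raw := os.Getenv("SPAN_STORAGE_TYPE")
	backend, deprecated, err := canonicalStorageType(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if deprecated {
		fmt.Printf("warning: storage type %q is deprecated; using %q\n", raw, backend)
	}
	fmt.Println("span storage backend:", backend)
}
```

The design keeps a single code path per backend: bug reports against the alias land on the OpenSearch implementation, which is the point of the proposal.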

yurishkuro commented 3 years ago

I'm not convinced that deprecating ES is the best strategy at this point. We invested a lot of effort in it, and have many production users who may be ok with the new ES licensing. We can probably get another couple of years out of that investment before ES becomes truly incompatible. Also, OpenSearch has no track record at this time to bet the farm on it.

I think it makes sense to introduce OpenSearch as another storage option and see if we can reuse as much of the existing ES code as possible.

jpkrohling commented 3 years ago

We can probably get another couple of years out of that investment before ES becomes truly incompatible

Do you mean, with the current code base, or fixing bugs as they appear?

I think it makes sense to introduce opensearch as another storage and see if we can reuse the existing ES code as much as possible.

That's another option, yes. In the future, if we get into a situation where people don't care about ES anymore, we can then deprecate/remove it.

Out of curiosity: if we end up having both OpenSearch and Elasticsearch, which current maintainers would be interested in taking care of the bug reports for Elasticsearch?

jkowall commented 3 years ago

I'm guessing Elastic will make more breaking changes in ES8, but it's hard to say. I think if we remain compatible with ES7 APIs which should be compatible with OpenSearch APIs then it will be covered. The concern as @jpkrohling points out is the maintainers time to debug issues and possibly fix code. Once again I think it's fine to make a statement as a project that we want to support both, but that will likely be difficult based on the breaking changes Elastic is making.

yurishkuro commented 3 years ago

We had quite a few contributions from the community in the past year fixing things with ES; it doesn't have to be maintainers doing the work. And I think it's fine not to invest directly in ES8, especially if it's deliberately going to be made incompatible.

skearns64 commented 3 years ago

It sounds like there might be some confusion here, intentional or not :) Elasticsearch isn't an "API spec," it's a product. Amazon's OpenSearch is a different product. So it makes sense that the Elasticsearch output works with Elasticsearch and uses the full breadth of functionality that Elasticsearch provides, and if there is demand for an OpenSearch output, someone contributes and maintains one that takes advantage of the full breadth of functionality that it provides.

[I work for Elastic] We've seen so many issues over the years where various forks or limited distributions of Elasticsearch don't have the expected capabilities and that confusion can cause real issues in production. And almost as bad, we've seen cases where designing for the lowest common denominator means that end-users miss out on big improvements, and have a worse experience than they should. By checking up front to make sure we're talking to an actual Elasticsearch cluster, we know what features we can expect that cluster to have, and users can rely on it to work in production.

And to some extent these compatibility issues are already happening/about-to-happen to Jaeger users - it looks like support for Elasticsearch Index Lifecycle Management (ILM) is already available in Jaeger[1], but that isn't something that is going to be compatible with Amazon's OpenSearch. So imagine the user experience when someone thought they had ILM configured to delete data after 30 days, but it didn't.

Separate from this issue, I'll follow up with the team to see how we can help Jaeger take advantage of some key Elasticsearch features that will make a big difference to storage efficiency and performance. Things like the histogram data type for ingesting and searching pre-aggregated data, the wildcard field type to enable much more efficient wildcard searching on keyword fields without dropping long values, and the match_only_text field that significantly reduces storage space, are a few that come to mind. And features like runtime fields let you define fields on the fly, or override mappings, so you could fix issues like[2] without requiring users to reindex.

[1] https://www.jaegertracing.io/docs/1.22/deployment/#elasticsearch-ilm-support
[2] https://github.com/jaegertracing/jaeger/issues/2718

pavolloffay commented 3 years ago

Also cc @jaegertracing/elasticsearch

jkowall commented 3 years ago

@skearns64 Thanks for getting involved; it's always good to have more folks in the community, and we welcome folks from Elastic who want to make Jaeger better! As you know, Jaeger is part of a software foundation. This means that we have certain standards which make Jaeger well adopted and aligned with other cloud native technologies. One of the key tenets of the CNCF is Apache 2.0 licensed software, so we have to be very careful about Elastic's path and what we do in Jaeger. I expect more of the technology to be re-licensed, based on what has been happening with other previously open-source technologies. Breaking compatibility with open source Elasticsearch is pretty clearly the wrong message to send.

It sounds like there might be some confusion here, intentional or not :) Elasticsearch isn't an "API spec," it's a product. Amazon's OpenSearch is a different product. So it makes sense that the Elasticsearch output works with Elasticsearch and uses the full breadth of functionality that Elasticsearch provides, and if there is demand for an OpenSearch output, someone contributes and maintains one that takes advantage of the full breadth of functionality that it provides.

Actually, Elasticsearch was a project with thousands of contributors that was previously available to everyone under Apache 2.0, the same way Apache Lucene, which underpins it, is; but now it's a product. The community is 100% driven by Elastic N.V. at this point and is clearly becoming more closed by the day.

OpenSearch is a project; there are multiple contributors and stakeholders across multiple companies. We are focused on API compatibility and not making breaking changes. This list of breaking changes (https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes.html) is not really acceptable IMO. You don't see this in other core cloud native technologies. Those of us building on top of Elasticsearch as a database feel the pain of breaking changes regularly. Do you see this happening in other databases? The only one I can think of that does this is MongoDB, and this is why that technology is not used in other open-source projects, unlike MySQL, which has good API compatibility.

And to some extent these compatibility issues are already happening/about-to-happen to Jaeger users - it looks like support for Elasticsearch Index Lifecycle Management (ILM) is already available in Jaeger[1], but that isn't something that is going to be compatible with Amazon's OpenSearch. So imagine the user experience when someone thought they had ILM configured to delete data after 30 days, but it didn't.

There is an open-source version of ILM (ISM) that works on both open source Elasticsearch and OpenSearch. We would likely use it, since it will also remain Apache 2.0 licensed and fully compliant with the CNCF guidelines. It probably will not keep working after enough breaking changes, and one can barely keep up with the number of those, as the URL above shows. Trying to support Elasticsearch 8 with the number of breaking changes likely to occur is probably not a good move for an open project.

That's my take on it, but once again, there should be a vote depending on what happens next.

wiardvanrij commented 3 years ago

It sounds like there might be some confusion here, intentional or not :) Elasticsearch isn't an "API spec," it's a product. Amazon's OpenSearch is a different product.

I really want to focus on Jaeger as a project here and figure out the steps to move forward. That is my sole intent. However, I would like to challenge you on this statement, if you don't mind. The motto on the ES website says:

Elasticsearch is a distributed, free and open search and analytics engine

Here, you are telling me that it's a product. Could you elaborate on why other engines like PostgreSQL, MySQL, etc. can be freely implemented by everyone, regardless of vendor, SaaS, PaaS, etc.? Would you reckon Elasticsearch intentionally brands its "engine" as a product so that it limits others in using it outside of the Elasticsearch ecosystem?

Because I would seriously argue that if we treat it as "an engine" with "an API", it would allow for a broad ecosystem, and we wouldn't have to add checks like if !elasticsearchLicense { "cantUse" }.

Anyhow, that said, thanks for your reply; I really do appreciate it.

stockholmux commented 3 years ago

@pavolloffay WRT the Go client, several pathways to support OpenSearch have been brought up in the past few months: 1) Contributing to olivere/elastic for ensured OpenSearch compatibility 2) A net-new Go client based off of the code in opensearch-project/opensearch-cli and in collaboration with some OpenSearch community members. (@VijayanB is point on opensearch-cli)

I don't think anything has been firmly established as of now, so if there are requirements to support the effort described here I think the OpenSearch community is still open to input.

riferrei commented 3 years ago

Because I seriously would say that if we expect it to be "an engine" with "an API" - it would allow for a broad ecosystem and we don't have to make changes to check if !elasticsearchLicense { "cantUse" }

@wiardvanrij I think what @skearns64 meant to say is that Elasticsearch is not an API in the sense of being an open spec that dictates the behavior of how applications can write software based on a set of common agreements that forms a specification — just like gRPC. It is a product that eventually might decide to add new features to make the user experience better. Sometimes, this means breaking the compatibility with whatever code has been written on top of the exposed API. The Jaeger project has dealt with this problem before, as you can see [1] here. Relying on software that doesn't have a formal public API and fixing breaking changes might happen to any software project. It shouldn't be any surprise that this might happen with Elasticsearch, too.

Any software project aims to not break compatibility until the moment it has to deliver something so incredibly amazing to its users that, you know what, it decides to let this one slip and work with the community to recover from it. Also, using (or merely intending to use) software that is essentially a product rather than a purely open project is something Jaeger has dealt with in the past, as you can see in [2]. It didn't stop Jaeger from being the successful project it is.

Back to the point being discussed here, I think the support for new engines — no matter who they are — will always pose a challenge in terms of compatibility because every software evolves. So deciding to remove support to Elasticsearch, adding a new storage engine for OpenSearch, or just making sure that OpenSearch remains compatible with Elasticsearch, in the long run, IMHO has the same trade-offs. There is uncertainty in all the choices.

[1] https://github.com/jaegertracing/jaeger/issues/1752
[2] https://github.com/jaegertracing/jaeger/issues/197

skearns64 commented 3 years ago

My main point was simple, but was lost in the noise a bit. OpenSearch isn't Elasticsearch. They share some history: it's a fork of an older version of Elasticsearch, which is missing core features or has similar-but-different capabilities. They have some similar API endpoints, but expecting a single output to work with both systems is a recipe for giving folks the worst of both worlds. For the Elasticsearch output, I'd like to see folks have the best possible experience. And I would imagine that folks who believe OpenSearch has a future will want OpenSearch users to have the best possible experience as well. And to be clear, we've seen well-run forks in the past: Grafana is a fork of Kibana, which charted its own course successfully, without pretending to be Kibana or trying to confuse the community.

Thanks for making my point regarding Amazon's ISM (Index State Management) for Elasticsearch. It offers a limited set of functionality compared to the built-in Elasticsearch ILM, but the important part you left out is that Amazon abandoned it, along with the Elasticsearch community, when they forked OpenSearch. There are no versions of ISM I can find that support current versions of Elasticsearch. This sort of fragmentation, of forks and services claiming to be Elasticsearch but not providing the basic and expected functionality, hurts users. So IMO, the right answer wouldn't be to remove ILM support from the Jaeger Elasticsearch output; it would be to add an OpenSearch output and support ISM there.

CarlMeadows commented 3 years ago

Big fan of the Jaeger project and don't necessarily want to get drawn into a tit for tat, but I wanted to correct one point from Steve. ISM is current, actively maintained functionality in OpenSearch (and previously as part of Open Distro for Elasticsearch). No one has abandoned anything. The code is included in the beta release and will be included in the upcoming release candidate: https://github.com/opensearch-project/index-management.

wiardvanrij commented 3 years ago

Just to be clear; as end-user I have nothing against both products. Not going to say I love them, but I like them :). However our client is using AWS, it is using Jaeger and this raises questions on what to do. I.e. see my initial post here.

That said, I'm really happy we have involvement from multiple sides in this issue. I do hope that in the end there can be a solution which, as @skearns64 says, gives users the best possible experience. I think the most awesome solution is to support both products, which gives users freedom. However, whether that is viable is something the Jaeger community and maintainers will have to figure out, I guess.

Thanks everyone.

jkowall commented 3 years ago

@wiardvanrij We (logz.io) will support OpenSearch along with compatibility to the ES7 open source code base since we need it not only for our business but to ensure that there is an Apache 2.0 compatible backend aside from Cassandra. This is critical for CNCF projects as I mentioned.

As for what others, or even Elastic themselves, contribute: they could provide support for future versions; that's always on the table. As a community, Jaeger has not determined if/when we will deprecate any backends. We have had these discussions in the not-too-distant past and would like to reduce the number of primary backends the project supports. We are also discussing changing the schemas over time to better align with OpenTelemetry when that makes sense. Similarly, aligning with other CNCF projects such as Prometheus is already a work in progress.

jkowall commented 3 years ago

I guess we also have to amend the docs here since ElasticSearch is no longer Open Source after version 7.10, or maybe just claim to support 7.10 or newer? https://www.jaegertracing.io/docs/1.22/features/#multiple-storage-backends

Up to the maintainers what the statement should be.

mswilson commented 3 years ago

or just making sure that OpenSearch remains compatible with Elasticsearch, in the long run, IMHO has the same trade-offs.

From my personal perspective (to be clear: I am speaking only for myself as a technologist and open source advocate), broad compatibility and interoperability is in the best interest of all participants in a meshed networked society. When an increasing API surface area represented by the Elasticsearch feature set is not available as open source software, any open source implementation that has a goal of retaining compatibility and interoperability is at a disadvantage. This is especially the case for an API that is not designed with capability discovery and client/server negotiated feature enabling, as has been a long standing industry best practice for Internet-scale protocols. Some of the principles of this at the protocol level have been more recently codified in IETF RFC 8170 [1], but I believe that they apply for application level APIs over network protocols as well.

When features that are added to the Elasticsearch engine are not advertised to clients, there isn't a good way to negotiate support and optionally enable them, and gracefully downgrade functionality when the support on the server side is not present. This seems to have caused some trouble in the past, for example #1474. To me this is a clear trade-off in the design and evolution of Elasticsearch as a product, and should be taken into consideration by folks who want to independently create clients and applications that interoperate with Elasticsearch products. This can be done professionally, and clinically, and does not require making broad speculations that assume bad intent, or baseless accusations to instill fear, uncertainty, or doubt.

Personally, I think we should each strive to be engineers that seek the truth, approaching problems with a clinical and scientific mind. We should present the facts without biasing them through hyperbole, and do our best to build solutions that empower everyday people to do more with technology. Navigating API breakages is just another engineering problem to solve. And it's something that I have seen collaborative open source communities do exceptionally well, because there's a focus on practical solutions that solve real problems. Rough consensus and running code. Be conservative in what you send and liberal in what you accept. Let's try to leave competitive posturing at the door?

[1] https://datatracker.ietf.org/doc/html/rfc817

jpkrohling commented 3 years ago

Being practical, I think the best course of action is what @yurishkuro suggested before: create a new OpenSearch storage mechanism for Jaeger, forking/sharing code from the Elasticsearch storage plugin that we currently have. The Elastic folks are more than welcome to send in their contributions to make the ES plugin even better, and the logz.io folks have room to contribute to features that are known to be incompatible with Elasticsearch.

In the future, if one of the plugins becomes unmaintained or a burden to the core maintainers, we can decide to deprecate/remove them.

Wearing my red hat for a moment, I'm not quite sure right now where the Red Hat team will stand in the future. At the moment, Elasticsearch is the only storage mechanism we support for "Red Hat OpenShift distributed tracing", but the whole re-licensing drama is kind of forcing our hands to get away from Elasticsearch. I'm almost certain we will not support Elasticsearch anymore, and might even remove the auto-provisioning features we have in the Jaeger Operator. We are also not quite convinced that Elasticsearch/OpenSearch is the best tracing storage available out there, and as such, we are doing some research on the topic, meaning that we can't commit to work on the OpenSearch storage plugin either. If we decide that OpenSearch is what we want to support as part of our commercial offering, we'll almost certainly add auto-provisioning capabilities to the Jaeger Operator.

pavolloffay commented 3 years ago

Let's put politics to one side. I have tried running OpenSearch with Jaeger; here is the simplest setup that works:

docker run -p 9200:9200 -p 9600:9600 --name opensearch -e "discovery.type=single-node"  -e "plugins.security.disabled:true" --rm -it opensearchproject/opensearch:1.0.0
docker run --rm -it --link opensearch -p 16686:16686 -e SPAN_STORAGE_TYPE=elasticsearch jaegertracing/all-in-one:1.24.0 --es.server-urls=https://opensearch:9200 --es.version=7 --es.tls.enabled=true

All seems to be working well so far.

One small issue is that the Elasticsearch version (--es.version=7) usually does not have to be provided: it is normally derived from the root ES endpoint, and Jaeger uses it to select the proper set of index mappings. The OpenSearch version, however, is 1.0.0, so I had to specify it explicitly.
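The version derivation, and one way to handle OpenSearch in it, can be sketched in Go. This is a hypothetical helper, not Jaeger's code. It assumes the root endpoint response contains `version.number`, and that OpenSearch additionally reports `version.distribution` as `"opensearch"`. Under that assumption, OpenSearch 1.x (forked from ES 7.10) maps to the ES 7 index mappings, and any other OpenSearch major is rejected so the user falls back to passing `--es.version` explicitly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// rootInfo models the part of the GET / response that matters here.
// Assumption: OpenSearch sets version.distribution to "opensearch";
// Elasticsearch leaves that field out.
type rootInfo struct {
	Version struct {
		Number       string `json:"number"`
		Distribution string `json:"distribution"`
	} `json:"version"`
}

// mappingVersion returns the ES major version whose index mappings to use.
func mappingVersion(body []byte) (uint, error) {
	var info rootInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return 0, fmt.Errorf("decoding root endpoint: %w", err)
	}
	majorStr, _, _ := strings.Cut(info.Version.Number, ".")
	major, err := strconv.ParseUint(majorStr, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("parsing version %q: %w", info.Version.Number, err)
	}
	if info.Version.Distribution == "opensearch" {
		// OpenSearch 1.x is a fork of ES 7.10, so reuse the ES 7 mappings.
		if major == 1 {
			return 7, nil
		}
		return 0, fmt.Errorf("unsupported OpenSearch version %s; set the version explicitly", info.Version.Number)
	}
	return uint(major), nil
}

func main() {
	body := []byte(`{"version": {"distribution": "opensearch", "number": "1.0.0"}}`)
	v, err := mappingVersion(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using index mappings for ES", v)
}
```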

yurishkuro commented 3 years ago

@pavolloffay maybe we could clone just the ES Factory, rename the options, and take care of the inconsistencies like --es.version=7?

Eventually we might need to clone the actual backend code as well. Didn't you create a different backend implementation as part of otel work, something that was using a different driver?

pavolloffay commented 3 years ago

Didn't you create a different backend implementation as part of otel work, something that was using a different driver?

I did, we had a brand new ES storage implementation using https://github.com/elastic/go-elasticsearch, which I would not use anymore.

jkowall commented 3 years ago

There will be an open-source fork of the go client coming soon: https://aws.amazon.com/blogs/opensource/keeping-clients-of-opensearch-and-elasticsearch-compatible-with-open-source/

dioptre commented 3 years ago

Is everyone in this thread aware of Elassandra (https://github.com/strapdata/elassandra)? I'm just wondering how its performance compares to OpenSearch with the provider approach.

pavolloffay commented 3 years ago

I am closing this, as OpenSearch is already supported.

lmhinnel commented 3 months ago

I searched for documentation on setting up OpenSearch with Jaeger but found no complete example, so here is a sample docker-compose file.

Jaeger UI: http://localhost:16686/
OpenSearch Dashboards: http://localhost:5601/

version: "3"

services:
  opensearch:
    image: opensearchproject/opensearch:1
    networks:
      - opensearch-jaeger
    ports:
      - "127.0.0.1:9200:9200"
      - "127.0.0.1:9300:9300"
      - "127.0.0.1:9600:9600"
    restart: on-failure
    environment:
      - cluster.name=jaeger-cluster
      - discovery.type=single-node
      - http.host=0.0.0.0
      - transport.host=127.0.0.1
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
    volumes:
      - opensearch-data:/usr/share/opensearch/data
    ulimits:
      memlock: # lets bootstrap.memory_lock=true actually lock the JVM heap
        soft: -1
        hard: -1

  jaeger-collector:
    image: jaegertracing/jaeger-collector
    ports:
      - "14250:14250"
      - "14269:14269"
      - "14268:14268"
      - "14267:14267"
      - "4317:4317"
      - "4318:4318"
      - "9411:9411"

    networks:
      - opensearch-jaeger
    restart: on-failure
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
    command:
      [
        "--es.server-urls=http://opensearch:9200",
        "--es.num-shards=1",
        "--es.num-replicas=0",
        "--log-level=error",
        "--es.version=7",
      ]
    depends_on:
      - opensearch

  jaeger-query:
    image: jaegertracing/jaeger-query
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - no_proxy=localhost
    ports:
      - "16686:16686"
      - "16687:16687"
    networks:
      - opensearch-jaeger
    restart: on-failure
    command:
      [
        "--es.server-urls=http://opensearch:9200",
        "--es.version=7",
        "--log-level=debug",
      ]

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:1
    container_name: opensearch-dashboards
    ports:
      - 5601:5601 # Map host port 5601 to container port 5601
    expose:
      - "5601" # Expose port 5601 for web access to OpenSearch Dashboards
    environment:
      - 'OPENSEARCH_HOSTS=["http://opensearch:9200"]'
      - "DISABLE_SECURITY_DASHBOARDS_PLUGIN=true" # disables security dashboards plugin in OpenSearch Dashboards
    networks:
      - opensearch-jaeger

volumes:
  opensearch-data:
    driver: local

networks:
  opensearch-jaeger:
    driver: bridge
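Once the stack is up, a quick way to check that spans are flowing through the collector into OpenSearch is to ask jaeger-query which services it knows about. The Go sketch below assumes the UI's internal `GET /api/services` endpoint returns JSON shaped like `{"data": ["service-a", ...]}`; that endpoint backs the UI and is not a stable public API, so treat this as a convenience check rather than a contract. The helper names `parseServices` and `listServices` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// servicesResponse assumes the JSON shape of jaeger-query's internal
// GET /api/services endpoint: {"data": ["service-a", ...], ...}.
type servicesResponse struct {
	Data []string `json:"data"`
}

// parseServices decodes an /api/services response body.
func parseServices(body []byte) ([]string, error) {
	var out servicesResponse
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, fmt.Errorf("decoding /api/services response: %w", err)
	}
	return out.Data, nil
}

// listServices asks jaeger-query (e.g. http://localhost:16686) which services
// it can find in the storage backend.
func listServices(client *http.Client, baseURL string) ([]string, error) {
	resp, err := client.Get(baseURL + "/api/services")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status from jaeger-query: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseServices(body)
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	services, err := listServices(client, "http://localhost:16686")
	if err != nil {
		fmt.Println("jaeger-query not reachable yet:", err)
		return
	}
	fmt.Println("services with stored spans:", services)
}
```

An empty list usually just means no instrumented application has sent spans to the collector yet.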