vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Enhancement request - Please provide an OpenSearch destination #11738

Open ryn9 opened 2 years ago

ryn9 commented 2 years ago

Use Cases

Elastic made its latest libraries not work with OpenSearch. Please make a dedicated OpenSearch destination using the latest OpenSearch libraries.

Attempted Solutions

n/a

Proposal

Elastic made its latest libraries not work with OpenSearch. Please make a dedicated OpenSearch destination using the latest OpenSearch libraries.

References

Please have a look at the blog post about client libraries available for OpenSearch: https://opensearch.org/blog/community/2021/08/community-clients/

Version

n/a

protochron commented 2 years ago

One way to get this going in the meantime is to use the AWS sigv4-proxy to sign the requests to OpenSearch. It's the same workaround outlined in #6204, so it's not ideal but it does work.
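A minimal sketch of that workaround, assuming the signing proxy runs next to Vector and listens on `http://localhost:8080`. The address, the sink name `opensearch_via_proxy`, and the input name `my_source` are placeholders, not values from this thread. Vector sends plain HTTP to the proxy, and the proxy signs each request and forwards it to the OpenSearch domain:

```yaml
sinks:
  opensearch_via_proxy:          # hypothetical sink name
    type: elasticsearch
    inputs:
      - my_source                # hypothetical upstream component
    # Assumed local address of the signing proxy; adjust to your deployment.
    endpoints:
      - "http://localhost:8080"
```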

zamazan4ik commented 1 year ago

@jszwedko I suppose we need some discussion here.

OpenSearch is getting more and more popular in the community for different reasons, so I think OpenSearch is kinda important for Vector.

If we agree to support OpenSearch, the main remaining question is how it should be implemented in Vector. I suggest creating a dedicated source/sink for OpenSearch, even if it shares a lot of code with the existing Elasticsearch source/sink right now. In the future, I expect OpenSearch and Elasticsearch will diverge more and more.

What do you think?

jszwedko commented 1 year ago

I'm open to creating a new opensearch sink. At present, I think it could largely just wrap the elasticsearch sink and set some defaults. If the HTTP APIs of the two products diverge more, then we could split up the implementation further, but I think just wrapping would be sufficient for now.

ryn9 commented 1 year ago

Forgive me for not knowing what's under the hood - but please know that Elastic-supplied libraries after 7.14 have been modified to specifically not work with OpenSearch. So if you are using these libraries, it would be best to start diverging sooner rather than later.

zamazan4ik commented 1 year ago

@jszwedko as @ryn9 mentioned earlier, that is not possible, since the Elasticsearch team explicitly broke OpenSearch support in their libraries.

jszwedko commented 1 year ago

Forgive me for not knowing what's under the hood - but please know that Elastic-supplied libraries after 7.14 have been modified to specifically not work with OpenSearch. So if you are using these libraries, it would be best to start diverging sooner rather than later.

Ah, yes, meant to mention in my other comment that we don't rely on any SDKs for Elasticsearch but just make HTTP calls directly using the hyper crate.

zamazan4ik commented 1 year ago

Hmmm, in this case I tend to agree with @jszwedko's approach: create an opensearch sink that just wraps the existing elasticsearch sink with some defaults. Later, if there is a need, we will be able to separate the opensearch and elasticsearch sinks step by step.

ryn9 commented 1 year ago

Eventually there will be feature divergence for ingestion. When that time comes, the OpenSearch project does maintain this library: https://github.com/opensearch-project/opensearch-rs. The OpenSearch maintainers would love to hear your feedback on it and/or have you speak at one of their meetups: https://www.meetup.com/opensearch/

EDIT: And that lib has aws sigv4 built in :)

zamazan4ik commented 1 year ago

@ryn9 you are right - eventually these projects will diverge a lot. Right now we could start with the already implemented elasticsearch sink and just wrap it as opensearch. Later, step by step, we can rewrite it with OpenSearch-specific details in mind, e.g. start using opensearch-rs in the opensearch sink instead of raw hyper-based requests.

zamazan4ik commented 1 year ago

@ryn9 by the way, have you already tried using the elasticsearch sink with an OpenSearch installation? Did you notice any problems?

ryn9 commented 1 year ago

Apologies - I have not tried for a while - but I believe it was working when I last tested against an OpenSearch 1.x release

protochron commented 1 year ago

I use the existing elasticsearch sink with OpenSearch 1.x in AWS and it works fine, with the caveat that I handle signing separately

ryn9 commented 1 year ago

@jszwedko when the elasticsearch output code is updated, is it also being tested against OpenSearch?

I see that in 0.26 the following change was made to Vector:

- The elasticsearch sink now supports an api_version option to specify the API version the targeted Elasticsearch instance exposes. This replaces and deprecates the suppress_type_name option, which was previously used for controlling Elasticsearch version compatibility.
- It can be set to auto to attempt to automatically determine the Elasticsearch version by querying the Elasticsearch version endpoint.

OpenSearch 2.x mimics the Elasticsearch 7.x line protocol but, like Elasticsearch 8.x, will not accept _type. I am not sure what other changes are applied when using the Elasticsearch 8.x line protocol (i.e. whether it is just _type removal), but at this point, without testing, I cannot be sure 0.26 is compatible with OpenSearch 2.x.

jszwedko commented 1 year ago

@jszwedko when the elasticsearch output code is updated, is it also being tested against OpenSearch?

I see that in 0.26 the following change was made to Vector:

- The elasticsearch sink now supports an api_version option to specify the API version the targeted Elasticsearch instance exposes. This replaces and deprecates the suppress_type_name option, which was previously used for controlling Elasticsearch version compatibility.
- It can be set to auto to attempt to automatically determine the Elasticsearch version by querying the Elasticsearch version endpoint.

OpenSearch 2.x mimics the Elasticsearch 7.x line protocol but, like Elasticsearch 8.x, will not accept _type. I am not sure what other changes are applied when using the Elasticsearch 8.x line protocol (i.e. whether it is just _type removal), but at this point, without testing, I cannot be sure 0.26 is compatible with OpenSearch 2.x.

Setting api_version to 7 should suppress _type as well. Only api_version 6 should send it.
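As a sketch of what that could look like in a sink definition, assuming the option spells version 7 as `v7` (check the sink reference for the accepted values). The sink name, input name, and endpoint are placeholders:

```yaml
sinks:
  opensearch:                    # hypothetical sink name
    type: elasticsearch
    inputs:
      - my_source                # hypothetical upstream component
    endpoints:
      - "https://opensearch.example.internal:9200"   # placeholder endpoint
    # Pin the API version instead of relying on auto-detection,
    # so that _type is not sent to OpenSearch 2.x.
    api_version: "v7"
```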

I see OpenSearch has a docker image, https://hub.docker.com/r/opensearchproject/opensearch, so it seemingly wouldn't be too hard to add it to our integration tests to ensure continued compatibility.
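A rough Compose sketch of such a test instance, assuming a single-node setup with the security plugin disabled for local testing only. The service name, the pinned tag, and the port mapping are illustrative choices and are not taken from Vector's test suite:

```yaml
services:
  opensearch:
    image: opensearchproject/opensearch:2.3.0   # illustrative tag
    environment:
      # Run as a standalone node without cluster discovery.
      - discovery.type=single-node
      # Test-only: serve plain HTTP without authentication.
      - DISABLE_SECURITY_PLUGIN=true
    ports:
      - "9200:9200"
```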

ryn9 commented 1 year ago

For anyone stumbling upon this thread ... writing back to confirm that 0.26 is suppressing _type to opensearch 2.3, and successfully pushing messages.

It is live with a config that looks like this:

  sink_opensearch:
    type: elasticsearch
    inputs:
      - transform_remap_for_opensearch
    compression: gzip
    healthcheck: false
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    auth:
      strategy: "basic"
      user: "<USERNAME>"
      password: "<PASSWORD>"
    distribution:
      retry_max_duration_secs: 300
    bulk:
      index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"

nike21oct commented 1 year ago

Hi, I have an AWS OpenSearch cluster with fine-grained access control enabled (so it requires credentials), and an EKS cluster with Vector installed. I implemented the syntax below in Vector's ConfigMap to ship logs to OpenSearch. Is my syntax correct for this setup? I am not able to see any index or logs in OpenSearch. I need your support on this.

  sinks:
    elasticsearch:
      type: elasticsearch
      inputs: [kubernetes_logs]
      healthcheck: false
      endpoints:
      auth:
        strategy: "basic"
        user: ""
        password: ""

nike21oct commented 1 year ago

Is it possible to pass the username and password in the ConfigMap as a secret? It is not a good idea to keep the credentials directly in the ConfigMap.

jszwedko commented 1 year ago

Is it possible to pass the username and password in the ConfigMap as a secret? It is not a good idea to keep the credentials directly in the ConfigMap.

I think you can use normal Kubernetes secrets mechanisms unless I'm missing something.
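One way this could look, as a sketch: store the credentials in a Kubernetes Secret, expose them to the Vector container as environment variables, and reference them from the ConfigMap using Vector's `${VAR}` environment variable interpolation. The Secret name `opensearch-credentials`, its keys, the sink name, and the variable names are all hypothetical:

```yaml
# Container spec fragment for the Vector pod (all names are hypothetical).
env:
  - name: OPENSEARCH_USER
    valueFrom:
      secretKeyRef:
        name: opensearch-credentials
        key: username
  - name: OPENSEARCH_PASSWORD
    valueFrom:
      secretKeyRef:
        name: opensearch-credentials
        key: password
---
# Vector configuration stored in the ConfigMap; no credentials appear here.
sinks:
  opensearch:
    type: elasticsearch
    inputs: [kubernetes_logs]
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    auth:
      strategy: "basic"
      user: "${OPENSEARCH_USER}"
      password: "${OPENSEARCH_PASSWORD}"
```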

nike21oct commented 1 year ago

I have another question. I have Vector installed in an EKS cluster, sending logs to AWS OpenSearch, but when I look at the indices in OpenSearch, I see only an index named vector. Is there any way to configure the index in Vector's config file so that the same name shows up in OpenSearch? Can you please help me with this?

ryn9 commented 1 year ago

I have another question. I have Vector installed in an EKS cluster, sending logs to AWS OpenSearch, but when I look at the indices in OpenSearch, I see only an index named vector. Is there any way to configure the index in Vector's config file so that the same name shows up in OpenSearch? Can you please help me with this?

https://vector.dev/docs/reference/configuration/sinks/elasticsearch/#bulk.index
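For illustration, a sketch of setting that option. The index pattern is an example only, and it assumes events come from the kubernetes_logs source so that a `kubernetes.pod_namespace` field is present on each event:

```yaml
sinks:
  opensearch:                    # hypothetical sink name
    type: elasticsearch
    inputs: [kubernetes_logs]
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    bulk:
      # One index per namespace per day, e.g. k8s-default-2023.05.01
      index: "k8s-{{ kubernetes.pod_namespace }}-%Y.%m.%d"
```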

nike21oct commented 1 year ago

Hello, I have configured Vector in a Kubernetes cluster with Kubernetes logs as the source and elasticsearch as the sink. Are logs transferred to Elasticsearch instantly? I cannot see the latest logs of my applications; it is showing only logs that are a month old. Can you please help me with this?

bruceg commented 1 year ago

Technically, Vector won't be sending it instantly, but it should be close enough given the above configuration. The default batch timeout for elasticsearch is just one second, after which it would send anything that has been queued up.

If you run vector with debugging enabled, do you see requests being sent out to the elasticsearch server?
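For reference, a sketch of where the batch timeout mentioned above would be set explicitly, assuming the sink's `batch.timeout_secs` option. The value shown simply restates the one-second default described in this comment, and the sink name and endpoint are placeholders:

```yaml
sinks:
  opensearch:                    # hypothetical sink name
    type: elasticsearch
    inputs: [kubernetes_logs]
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    batch:
      # Flush queued events after at most this many seconds.
      timeout_secs: 1
```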

nike21oct commented 1 year ago

For anyone stumbling upon this thread ... writing back to confirm that 0.26 is suppressing _type to opensearch 2.3, and successfully pushing messages.

It is live with a config that looks like this:

  sink_opensearch:
    type: elasticsearch
    inputs:
      - transform_remap_for_opensearch
    compression: gzip
    healthcheck: false
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    auth:
      strategy: "basic"
      user: "<USERNAME>"
      password: "<PASSWORD>"
    distribution:
      retry_max_duration_secs: 300
    bulk:
      index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"

Can we create multiple indices from this, like:

    bulk:
      index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"
      index2: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"

Is it possible to have multiple indices?

ryn9 commented 1 year ago

Can we create multiple indices from this, like:

    bulk:
      index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"
      index2: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"

Is it possible to have multiple indices?

You need to create multiple sinks, each with their own index definition
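A sketch of that layout, assuming two upstream components named `app_logs` and `audit_logs` (hypothetical names, as are the sink names and index patterns). Each sink carries exactly one `bulk.index` setting:

```yaml
sinks:
  opensearch_app:                # hypothetical sink name
    type: elasticsearch
    inputs: [app_logs]           # hypothetical upstream component
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    bulk:
      index: "app-%Y.%m.%d"

  opensearch_audit:              # hypothetical sink name
    type: elasticsearch
    inputs: [audit_logs]         # hypothetical upstream component
    endpoints:
      - "https://<CLUSTERNAME>.<REGION>.es.amazonaws.com:443"
    bulk:
      index: "audit-%Y.%m.%d"
```

If both sinks listed the same input, every event would be written to both indices.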

spencergilbert commented 1 year ago

Can we create multiple indices from this, like:

    bulk:
      index: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"
      index2: "{{ field1 }}-{{ field2 }}-{{ field3 }}-%G.%V"

Is it possible to have multiple indices?

As @ryn9 pointed out, that's not a valid configuration. However, I don't understand what you're trying to do with the two indices that the template fields don't already do. Each unique combination of field1, field2, field3 (and the timestamp) will create a new index.

Additionally, please open a new Discussion for questions unrelated to the original issue - thanks.

spencergilbert commented 1 year ago

Adding this issue as a difference between OS and ES we need to handle: https://github.com/vectordotdev/vector/issues/17690

sandervandegeijn commented 1 month ago

I'm using the elasticsearch sink with the v7 API version. It does work, but we would welcome a dedicated opensearch sink as well :)