vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

feat(new sink): add CnosDB sink #18156

Open Subsegment opened 1 year ago

Subsegment commented 1 year ago

Use Cases

We want to add a CnosDB sink to Vector so that:

Attempted Solutions

We tried having CnosDB accept data from Vector's existing `vector` sink type, but there is no graceful way to pass some parameters. Parameters that are only needed once have to be added to the data being written, which wastes resources and is not very safe.

Proposal

I have submitted a PR that adds a CnosDB sink.

References

#18147

Version

0.31.0

dsmith3197 commented 1 year ago

Hi @Subsegment,

I noticed that CnosDB supports InfluxDB and Prometheus Remote Write integrations. It looks like Vector's Prometheus Remote Write sink will support your use case as is. Do you think that will fit your needs?

Subsegment commented 1 year ago

Hi @dsmith3197, thank you very much for your suggestion. In fact, using the Prometheus sink has the same problems as using the vector sink: the health check cannot pass, the authorization check cannot pass, and some private tokens have to be transmitted as part of the data. We think this is neither elegant nor safe.

In addition, I looked at the code of the Prometheus sink. It seems that it currently only processes metric data, and there is no logic for log events.
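
For readers following along, the concern about tokens travelling "through data" can be sketched as follows. This is only an illustration in Python, not CnosDB's or Vector's actual API: the `/write` path, the `db` query parameter, the port, and the use of HTTP Basic authentication are all assumptions. The point is that credentials sit in a request header, so the payload itself carries no secrets.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_write_request(base_url, database, username, password, body):
    """Build (but do not send) a write request with credentials in a header.

    The endpoint path and parameter names are hypothetical placeholders.
    """
    query = urlencode({"db": database})
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/write?{query}",  # hypothetical write endpoint
        data=body,
        method="POST",
        headers={
            # Credentials live here, not inside the records being written.
            "Authorization": f"Basic {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


request = build_write_request(
    "http://localhost:8902",  # assumed address, for illustration only
    "public",
    "user",
    "secret",
    b"vector_logs,host=web-1 status=200i",
)
print(request.full_url)
print(request.get_header("Authorization"))
```

Basic authentication only encodes the credentials, so a setup like this would still need TLS on the connection to be considered safe.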

dsmith3197 commented 1 year ago

@Subsegment Thanks for looking into the Prometheus Remote Write sink. You are correct that it only supports metrics.

If you want to support metrics and logs, then the InfluxDB sink will be more suitable. From what I can tell, https://github.com/vectordotdev/vector/pull/18147 is heavily adapted from the InfluxDB sink. Rather than creating a new sink, I think we can likely extend the InfluxDB sink or extract the shared logic and use the generic http sink.

Could you elaborate on the differences between the CnosDB protocol and the InfluxDB protocol?

Subsegment commented 1 year ago

Hi @dsmith3197

We support InfluxDB's line protocol, so the sink assembles line protocol when it sends a request; that is why the CnosDB sink adopts parts of the InfluxDB sink. However, we found that the InfluxDB log sink converts maps and arrays directly to strings, which is a defect of the InfluxDB protocol and prevents some data from being observable. We will support storing map and array types directly in the future, so we will either modify the InfluxDB protocol or use a new protocol. I hope this issue will not prevent us from merging the CnosDB sink.

Some of the same problems also apply here: if the InfluxDB sink is used as the CnosDB sink, some private data handling and parameter issues are difficult to deal with. Using an http sink means that you need to convert between data protocols before sending, which may have to be done with a VRL transform. It is easier to convert directly in the sink than to rely on Vector's rich sources.
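
To illustrate the map and array limitation described above, here is a minimal Python sketch of line protocol encoding. It is not the code of Vector's InfluxDB sink; the function names (`encode_field`, `to_line`), the sample event, and the choice of JSON as the string form of nested values are assumptions made for illustration.

```python
import json


def _escape(text, chars):
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def encode_field(value):
    """Encode one field value using line protocol's scalar types."""
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, (dict, list)):
        # Line protocol has no map or array field type, so the structure
        # collapses into one opaque string (JSON here, as an assumption).
        value = json.dumps(value, separators=(",", ":"), sort_keys=True)
    # Backslash is escaped first so the escapes added for quotes survive.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Assemble `measurement,tags fields timestamp` as a single line."""
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):
        head += "," + _escape(key, ",= ") + "=" + _escape(val, ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + encode_field(val)
        for key, val in sorted(fields.items())
    )
    return f"{head} {body} {timestamp_ns}"


line = to_line(
    "vector_logs",
    {"host": "web-1"},
    {"status": 200, "request": {"path": "/", "ids": [1, 2]}},
    1_700_000_000_000_000_000,
)
print(line)
```

In this sketch `status` keeps its integer type, while `request` arrives as a single string field, so the database cannot query `path` or `ids` individually without parsing the string again.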

Thanks for your advice

dsmith3197 commented 1 year ago

Hi @Subsegment, thanks for the reply. I have a few questions I'd like to ask so that we can fully understand your needs and come up with the best solution for us all 🙂.

> We support InfluxDB's line protocol, so the sink assembles line protocol when it sends a request; that is why the CnosDB sink adopts parts of the InfluxDB sink. However, we found that the InfluxDB log sink converts maps and arrays directly to strings, which is a defect of the InfluxDB protocol and prevents some data from being observable. We will support storing map and array types directly in the future, so we will either modify the InfluxDB protocol or use a new protocol.

I understand that the InfluxDB line protocol does not adequately support maps and arrays for you. With that being said, the InfluxDB line protocol is designed primarily for metrics, which typically do not have nested maps or arrays, rather than for logs. Do you want to support sending both logs and metrics to CnosDB? If so, the InfluxDB line protocol or Prometheus remote write protocol could be great choices for metrics, but won't be ideal for logs. Do you know what protocol you would use in the future?

> Some of the same problems also apply here: if the InfluxDB sink is used as the CnosDB sink, some private data handling and parameter issues are difficult to deal with. Using an http sink means that you need to convert between data protocols before sending, which may have to be done with a VRL transform. It is easier to convert directly in the sink than to rely on Vector's rich sources.

Could you please give an example of the "privacy data processing and parameter" issues you mention above to help me better understand? Also, could you explain/give examples of how a CnosDB HTTP request differs from an InfluxDB HTTP request? For example, do they have different headers, authentication, schemes, etc? I would like to better understand the differences between the two to best advise on how to proceed.

Subsegment commented 1 year ago

Thank you for your reply @dsmith3197

Here are some answers to your questions. Also, if the amount of repeated logic doesn't look elegant, I can modify the code to reuse some of the logic of the InfluxDB sink.

dsmith3197 commented 1 year ago

Hi @Subsegment,

Thank you for the detailed explanation! Let me revisit the component qualification checklist and get back to you.

Subsegment commented 1 year ago

> Hi @Subsegment,
>
> Thank you for the detailed explanation! Let me revisit the component qualification checklist and get back to you.

Ok, thanks!

neuronull commented 1 year ago

👋 Hi @Subsegment ! Apologies for the delayed response here!

I am picking up where Doug left off with this new component qualification, and am in the process of getting up to speed on it.

Want to confirm since it's been a little while, are you still interested in pursuing this?

Subsegment commented 1 year ago

> 👋 Hi @Subsegment ! Apologies for the delayed response here!
>
> I am picking up where Doug left off with this new component qualification, and am in the process of getting up to speed on it.
>
> Want to confirm since it's been a little while, are you still interested in pursuing this?

I understand. Yes, we are still interested in pursuing this.

neuronull commented 1 year ago

Thanks for your patience @Subsegment!

We discussed this and have a proposal for an alternative approach to the design taken in the existing PR:

What do you think?

Thanks~

Subsegment commented 1 year ago

Thank you for your advice. @neuronull

This is fine for the current situation, but CnosDB is not completely satisfied with the features of Line Protocol, and we will support storing map, array, and other object types. If you have seen the recent release of CnosDB 2.4.0, you will find that we already support the Geometry type, which cannot be represented by Line Protocol. As the types CnosDB supports evolve, we will use a new protocol developed by ourselves instead of Line Protocol.

My suggestion is to reuse InfluxDB's logic on LineProtocol and let me implement the replacement when our protocol is developed.

What do you think of that?

:)

neuronull commented 1 year ago

Thanks for those extra details @Subsegment ,

Discussed this some more with the team. We're still open to accepting an InfluxDB codec and going that route, though it is understandable if you're not interested in pursuing that given how briefly it would be of use to you. Our motivation for that direction is to make the decision that best serves the long-term future of the project.

But for a new cnosdb sink, based on the current situation and the emerging CnosDB protocol/client that is in the works, we feel the most sensible route would be to wait for a stable version of this client, and then re-evaluate the inclusion of a new cnosdb sink at that time.

Subsegment commented 12 months ago

Thanks @neuronull

We will discuss this and either use your proposal or develop the CnosDB client in the near future.

Hope that we can keep in touch.

neuronull commented 12 months ago

👍 Sounds good. We'll keep this issue open; feel free to respond here again as things progress, and definitely let's keep in touch. Thanks @Subsegment ~