open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.97k stars 2.31k forks source link

New component: Coralogix Processor #33090

Open galrose opened 4 months ago

galrose commented 4 months ago

The purpose and use-cases of the new component

Coralogix processor is for clients using Coralogix. The processor will have multiple features but the first one is, to template db.statements by removing the variables and replacing them with ?. Which will add a new tag to the span called db.statement.blueprint.id and db.statement.blueprint. These will be used internally in Coralogix to be able to recognize which queries are of the same template.

At the start we expect it to work only for postgresql, mysql, and sqlserver. The processor will check the db.system and only if its in the recognized systems it will try to parse the query.

There is an option to work with sampling, and if so it will only add the sampling.priority key to db.statement.blueprints it has not seen before (using an internal cache), then it is possible to use the probabilistic sampler to only send new spans

Example configuration for the component

basic setup

processors:
  coralogix:
    db_statement_blueprints:
      with_sampling: true

with cache

processors:
  coralogix:
    db_statement_blueprints:
      with_sampling: true
      cache_config:
        max_cache_size_bytes: 1073741824 #1GB
        max_cached_entries: 10000000

Telemetry data types supported

traces

Is this a vendor-specific component?

Code Owner(s)

No response

Sponsor (optional)

No response

Additional context

No response

crobert-1 commented 4 months ago

Hello @galrose, since this is a vendor-specific component, I'll sponsor as I'm next in the list of rotating sponsors.

A couple of comments/questions.

  1. Config options are required to be snake case, so can you update dbStatementBlueprints to be db_statement_blueprints?
  2. I see the proposed telemetry type is for traces, but the mentioned SQL databases are all metric receivers, and don't support traces. I assume I'm missing something here, but what's the expectation as far as getting traces from these different databases?
  3. Can you share more information on your goal with sampling, and the cache config options? I'm not sure I entirely follow.
galrose commented 4 months ago

Hey @crobert-1 thanks for sponsoring the component 😄

  1. changed to snake_case

  2. The component will go over the client spans the services will create and template the db.statement attribute. I don't expect to get spans from the database itself

  3. Our goal with sampling is for clients to be able to send span metrics without sending their spans, but for our Database Catalog we want to be able to show the client's an example of the query itself and the template that will be created. So we need at least 1 span that contains a db.statement template that we haven't seen before to be able to display the client's db.statement example.

The cache config options are the ristero cache config options

        sp.cache, err = ristretto.NewCache(&ristretto.Config{
            NumCounters: numCounters, // number of keys to track frequency of.
            MaxCost:     cacheSize,   // maximum cost of cache.
            BufferItems: bufferItems, // number of keys per Get buffer.
        })
crobert-1 commented 4 months ago

Bit of a side note, but I'd suggest de-coupling the configuration options from the specific golang cache as much as possible, to make it simpler to change cache package if necessary. On the same note, it looks like the ristretto package hasn't had any new release for 1.5 years now, and I'm seeing some PRs that have been open for a long time without review. I have no experience with this package, but just thought I should call it out.

Otherwise, you're welcome to start submitting PRs! Please follow the guide for adding new components, in terms of contents for each PR. Looking forward to making progress on this!

galrose commented 4 months ago

That makes sense I'll update the configs in this issue accordingly. Thanks for the help :) I'll open the first PR from the guide soon

github-actions[bot] commented 2 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 5 days ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.