
OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Scraper feedback, and what to do next? #11238

Open bogdandrutu opened 3 hours ago

bogdandrutu commented 3 hours ago

Current state:

While https://github.com/open-telemetry/opentelemetry-collector/issues/9473 was the right call in terms of the strangeness of constructing an ID from a string, we should not pass a "type": the idea was that a "receiver" can have multiple scrapers, but if that is true, all of them would share the same receiver/component type, correct?

We have probably made a mistake (a big part of the problem is the author of this issue) with the design of the scrapers. Here is a set of questions we need to answer; based on the answers, we can change the design of the scraper package:

  1. Should the "scraperhelper" be a receiver that supports multiple "scrapers", or a receiver that supports a single Scraper, with a "multi source" scraper available for cases where a receiver really needs to scrape multiple sources?
  2. Should we have a standalone "scraper" type instead of making this a receiver? Or maybe we have only one "scraperreceiver" that allows users to configure multiple scrapers?
  3. Should we have a special receiver that allows configuring scrapers?

During the discussion and decision, it should be considered that we currently have a big problem: each scraper/receiver creates its own timer, so we lose the opportunity to batch at the source. Other feedback we have received about the current scraper-as-a-receiver design is that users have to configure the scrape interval individually for each receiver.

I like option 3. A proposed approach is to start from how a user should configure this, and then see how we can achieve it while making sure we "batch" all these scrapes as much as possible:

```yaml
receivers:
  scraper/30sec:
    collection_interval: 30s
    initial_delay: 10s
    timeout: 10s
    sources:
      sql:
        ... # things specific to sql, including enable/disable metrics
      nginx:
        ... # things specific to nginx, including enable/disable metrics
```

Later we can add the capability to do "source discovery", or build a different receiver like the current "receivercreator", which can be adapted to follow the same pattern/structure.

dmitryax commented 3 hours ago

I like the idea. We can have just one scraper receiver eventually. I see a few opportunities with this approach:

  1. It'll remove the confusion between "push" and "pull" receivers. All receivers will be "push" except for the scraper, and we won't need kafkametrics in addition to the kafka receiver.

  2. The batching, as you mentioned, with consistent timestamps. However, it can lead to huge payloads that potentially have to be split on the exporter side. So, batching can probably be a configurable behavior. Also, I think we should still initiate separate workers for each source scraping the data concurrently + one main worker batching them together if it's enabled.

  3. Built-in discovery would work perfectly with this new receiver. Each source can expose an optional Discoverable interface in addition to Start(...) and Scrape(...). Each source could have default built-in discovery rules for each observer with an option to override them by the user. In that case, we won't need the receiver_creator anymore.

  4. We can have additional global (applied to all sources) filtering and aggregation capabilities. Currently, each scraping receiver provides an option to disable metrics (and potentially aggregation), but it has to be configured for every metric separately, which is not always convenient. For example, I may want to disable all metrics starting with system.processes. but don't want to list all of them individually. We tried to add wildcard support to the existing metrics config group. It would be much better to keep the scraper.sources.<scraper>.metrics config schema strict and move the filtering somewhere else, e.g., scraper.filter. cc @braydonk
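The optional Discoverable interface from point 3 maps naturally onto Go's type-assertion pattern for optional interfaces. A hedged sketch, where `Scraper`, `Discoverable`, and both scraper types are hypothetical names, not an agreed API:

```go
package main

import "fmt"

// Scraper is the base contract every source implements (hypothetical).
type Scraper interface {
	Scrape() []string
}

// Discoverable is the optional interface a source may additionally
// expose so the receiver can find endpoints before scraping them.
type Discoverable interface {
	Discover() []string // returns discovered endpoints
}

// nginxScraper opts into discovery.
type nginxScraper struct{}

func (nginxScraper) Scrape() []string   { return []string{"nginx.requests"} }
func (nginxScraper) Discover() []string { return []string{"10.0.0.5:80"} }

// staticScraper does not implement Discoverable.
type staticScraper struct{}

func (staticScraper) Scrape() []string { return []string{"system.cpu.time"} }

// endpoints uses a type assertion to honor discovery only when a
// source opts in, the usual Go pattern for optional interfaces.
func endpoints(s Scraper) []string {
	if d, ok := s.(Discoverable); ok {
		return d.Discover()
	}
	return nil
}

func main() {
	fmt.Println(endpoints(nginxScraper{}))
	fmt.Println(endpoints(staticScraper{}))
}
```

Sources without discovery would simply be scraped at their configured endpoints, with per-observer default rules layered on top for those that opt in.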
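The global wildcard filtering from point 4 could be sketched with standard-library glob matching. The function and the `scraper.filter`-style exclude list are assumptions for illustration, not an agreed config schema:

```go
package main

import (
	"fmt"
	"path"
)

// filterMetrics drops any metric whose name matches one of the exclude
// patterns, sketching what a global scraper.filter section might do.
// path.Match supports '*' wildcards, e.g. "system.processes.*".
func filterMetrics(names, excludePatterns []string) []string {
	var kept []string
	for _, name := range names {
		excluded := false
		for _, p := range excludePatterns {
			if ok, _ := path.Match(p, name); ok {
				excluded = true
				break
			}
		}
		if !excluded {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	names := []string{
		"system.processes.count",
		"system.processes.created",
		"system.cpu.time",
	}
	// Exclude everything under system.processes. with one pattern
	// instead of listing each metric individually.
	fmt.Println(filterMetrics(names, []string{"system.processes.*"}))
}
```

Keeping this in one global filter stage leaves each source's `metrics` schema strict while still giving users wildcard convenience.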

braydonk commented 2 hours ago

Re 4:

It would be much better to keep the scraper.sources.<scraper>.metrics config schema strict and move the filtering somewhere else, e.g., scraper.filter

Perhaps, I think it's a cleaner config interface for sure. Provided I'm understanding this idea vs the current state of things, configuring enable/disable at the start has one big benefit: scrapers can recognize that certain metrics aren't required at scrape time, and can subsequently opt to ignore them. I have a hostmetrics PR where skipping collection for a particular metric when it's disabled yielded major performance benefits.
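The scrape-time benefit described above can be shown in a small sketch: when the scraper can see that a metric is disabled, it skips the expensive collection entirely rather than collecting data that will be dropped. Names here (`metricConfig`, `scrapeProcesses`) are hypothetical, not the hostmetrics code:

```go
package main

import "fmt"

// metricConfig mirrors the per-metric enable/disable toggle scrapers
// see today via mdatagen-style config.
type metricConfig struct {
	Enabled bool
}

// expensiveCollect stands in for a costly syscall or /proc walk.
func expensiveCollect() int { return 42 }

// scrapeProcesses checks the config at scrape time and skips the
// expensive work for disabled metrics, which is where the performance
// win comes from.
func scrapeProcesses(cfg map[string]metricConfig) map[string]int {
	out := map[string]int{}
	if cfg["system.processes.count"].Enabled {
		out["system.processes.count"] = expensiveCollect()
	}
	return out
}

func main() {
	disabled := map[string]metricConfig{
		"system.processes.count": {Enabled: false},
	}
	// Nothing collected, and expensiveCollect never runs.
	fmt.Println(len(scrapeProcesses(disabled)))
}
```

Whatever filtering layer is chosen, the scraper would still need visibility into the effective enabled set to keep this optimization.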

FWIW, the spec I wrote for that wildcard matching [1] has the potential to be applied in a way other than the current PR. I would happily welcome a better place for it to live than as generated mdatagen code like in the PR.

[1]: As of writing this is due for updates, since Dmitrii and I have already agreed on a major simplification I can make. But the spirit will remain.