open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.04k stars 2.35k forks source link

Add ability to override name of all metrics tables #34225

Open Fiery-Fenix opened 3 months ago

Fiery-Fenix commented 3 months ago

Component(s)

exporter/clickhouse

Is your feature request related to a problem? Please describe.

Right now table names for metrics are semi-hardcoded, i.e. it's using configurable prefix from metrics_table_name parameter and hardcoded suffix for each metric type, like _gauge/_sum/_summary/etc. There is no possibility to change this suffixes outside of the code, but this might be useful for such use-cases:

Describe the solution you'd like

Add configuration parameters with metrics tables suffix override. For example:

Another option might be to deprecate current metrics_table_name configuration parameter and introduce 5 new instead:

Describe alternatives you've considered

No response

Additional context

No response

github-actions[bot] commented 3 months ago

Pinging code owners:

SpencerTorres commented 3 months ago

Hey, thanks for writing this. This is a reasonable request overall, few notes:

Using single table for all metrics types, i.e. otel_metrics, which bring better compression and usability - no need to search for required metric across 5 different tables

I believe this would negatively affect querying, depending on the primary key / aggregation, but the schemas can be replaced by the user so that's not too relevant. I'm unsure if these tables would remain compatible long term. If we change the insert SQL for one kind of metric but not the other, then it wouldn't be possible to use the same table. This can be addressed later however.

Add configuration parameters with metrics tables suffix override. . . . Another option might be to deprecate current metrics_table_name configuration parameter and introduce 5 new instead:

If this were to be added, it would be nice to see them under one field:

metrics_table_suffix:
  gauge: "_gauge"
  sum: "_counter"
  summary: "_summary"
  histogram: "_histogram"
  exponential_histogram: "_exponential_histogram"

Between the two solutions you suggested, it would be best to use the full table name and not just the suffix. It has the most flexibility.

As a temporary workaround, you could always use a materialized view to forward _sum to _counter.

Thanks for the suggestion! Always happy to improve flexibility. Interested in seeing what others think

odubajDT commented 3 months ago

Hello! I would like to take this ticket if possible

Fiery-Fenix commented 3 months ago

I believe this would negatively affect querying, depending on the primary key / aggregation, but the schemas can be replaced by the user so that's not too relevant. I'm unsure if these tables would remain compatible long term. If we change the insert SQL for one kind of metric but not the other, then it wouldn't be possible to use the same table. This can be addressed later however.

As for performance of single metrics table - on my small set of ~6Bil rows I haven't found statistically significant performance degradation comparing to current, multi-table approach. I know that ClickHouse is very well optimized for smaller amount of tables with bigger size. That's why I was trying to test my sinlge-table approach on production-like environment with bigger amount of data. That's how this issue was born ;) Related to compatibility with insert SQL queries - as long as single table will have all fields from other 5 tables - there will be 0 issues with this approach. ClickHouse inserts default value (or Null if column is Nullable) in case of absence of this column in insert SQL. I've already tested this approach with custom built OpenTelemetry Collector (just renamed all metrics tables in code) - and it works as expected

Between the two solutions you suggested, it would be best to use the full table name and not just the suffix. It has the most flexibility.

Agree, that's looks as a best solution and also aligned with other tables names overrides (like logs and traces)

Frapschen commented 3 months ago

@odubajDT assigned

github-actions[bot] commented 1 month ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

SpencerTorres commented 1 month ago

Not stale, waiting on https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34251