open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Allow setting of storage policy for clickhouse exporter #32816

Open sdejong629 opened 2 months ago

sdejong629 commented 2 months ago

Component(s)

exporter/clickhouse

Is your feature request related to a problem? Please describe.

The exporter creates a new database and table, but when the table requires a specific storage policy, the exporter cannot set it at table creation.

Describe the solution you'd like

Add an option to the clickhouse_exporter config that sets a custom storage policy when the table is created.

Describe alternatives you've considered

The policy can be set afterwards in ClickHouse, but this requires you to include the current storage in the new policy and to move the data to the new storage before disabling the 'old' storage.
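For reference, a rough sketch of what that after-the-fact route could look like. It assumes the new policy is named custom_storage_policy and already lists the disks of the current policy in the server config; the volume name new_volume and the partition date are hypothetical placeholders, not values from this setup.

```sql
-- Assumption: 'custom_storage_policy' is defined in the server's storage
-- configuration and still contains the disks used by the current policy.
ALTER TABLE otel_logs ON CLUSTER clickhouse_cluster
    MODIFY SETTING storage_policy = 'custom_storage_policy';

-- Move existing data one partition at a time (the table is partitioned by
-- toDate(Timestamp)). 'new_volume' and the date are hypothetical; this
-- statement is run on each node that holds the data.
ALTER TABLE otel_logs
    MOVE PARTITION '2024-05-01' TO VOLUME 'new_volume';

-- Check which disk each active part now lives on.
SELECT partition, name, disk_name
FROM system.parts
WHERE table = 'otel_logs' AND active;
```

Only once no active parts remain on the old disks can the old storage be dropped from the policy, which is the extra work this feature request is trying to avoid.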

Additional context

See the final SETTINGS line of the table creation statement:

CREATE TABLE otel_logs ON CLUSTER clickhouse_cluster
(
    `Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `TraceId` String CODEC(ZSTD(1)),
    `SpanId` String CODEC(ZSTD(1)),
    `TraceFlags` UInt32 CODEC(ZSTD(1)),
    `SeverityText` LowCardinality(String) CODEC(ZSTD(1)),
    `SeverityNumber` Int32 CODEC(ZSTD(1)),
    `ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
    `Body` String CODEC(ZSTD(1)),
    `ResourceSchemaUrl` String CODEC(ZSTD(1)),
    `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `ScopeSchemaUrl` String CODEC(ZSTD(1)),
    `ScopeName` String CODEC(ZSTD(1)),
    `ScopeVersion` String CODEC(ZSTD(1)),
    `ScopeAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `LogAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp), TraceId)
TTL toDateTime(Timestamp) + toIntervalHour(8)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1, storage_policy = 'custom_storage_policy';

github-actions[bot] commented 2 months ago

Pinging code owners:

hanjm commented 2 months ago

It looks good to add a new config field that supports extra settings. Hi @SpencerTorres, what is your idea, and would you have time to implement it?

SpencerTorres commented 4 weeks ago

ClickHouse has a lot of options when configuring a table, too many for us to reasonably keep up with and add to the exporter config.

Instead, I recommend using the new create_schema: false option in the config and manually creating the required tables yourself. In a production environment, it's not good to rely on multiple instances of the exporter racing to auto-create the schema.

When you set create_schema to false, it will not create the tables for you. This lets you create the tables manually and have full control over the schema/settings. It also lets you know WHEN the table is created, instead of questioning whether the exporter did it.

If you need an example of what DDL the exporter would run, you can check the default_ddl/ folder. (This is a permalink, so be sure to use the version that matches your exporter.)
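As a small sketch of how a manually created table could be checked, the queries below read ClickHouse's system tables. They assume the table is named otel_logs and lives in the current database; adjust both to match your setup.

```sql
-- List the storage policies the server knows about, with their volumes and disks.
SELECT policy_name, volume_name, disks
FROM system.storage_policies;

-- Confirm which policy the manually created table actually uses.
-- Assumption: the table is 'otel_logs' in the current database.
SELECT database, name, storage_policy
FROM system.tables
WHERE database = currentDatabase() AND name = 'otel_logs';
```

If the second query returns the expected policy name, the exporter can write to the table with create_schema set to false and will not touch its settings.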