open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[exporter/clickhouse] Failed to create schema in a ClickHouse cluster #35713

Open Wudadada opened 1 month ago

Wudadada commented 1 month ago

Component(s)

exporter/clickhouse

What happened?

Description

otelcol-contrib fails to start because creating the database fails, as the error log shows.

Steps to Reproduce

My ClickHouse is deployed as a 3-node cluster.

Expected Result

otelcol-contrib starts successfully.

Actual Result

Startup fails, as the log shows.

Collector version

v0.111.0

Environment information

Environment

OS: (e.g., "CentOS8")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 100000

exporters:
  clickhouse:
    endpoint: tcp://10.105.212.248:9000?dial_timeout=10s
    database: otel
    async_insert: true
    ttl: 72h
    compress: lz4
    create_schema: true
    metrics_table_name: otel_metrics
    logs_table_name: otel_logs
    traces_table_name: otel_traces
    timeout: 5s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    username: "default"
    password: "123"
    cluster_name: cluster_2S_1R

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhouse]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhouse]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhouse]

  telemetry:
    logs:
      level: "debug"

Log output

Oct 09 17:12:35 yptjkcshj-Linux-005 otelcol-contrib[1840871]: 2024/10/09 17:12:35 collector server run finished with error: cannot start pipelines: create database: code: 81, message: Database otel does not exist
Oct 09 17:12:35 yptjkcshj-Linux-005 systemd[1]: otelcol-contrib.service: Main process exited, code=exited, status=1/FAILURE
Oct 09 17:12:35 yptjkcshj-Linux-005 systemd[1]: otelcol-contrib.service: Failed with result 'exit-code'.
Oct 09 17:12:36 yptjkcshj-Linux-005 systemd[1]: otelcol-contrib.service: Service RestartSec=100ms expired, scheduling restart.
Oct 09 17:12:36 yptjkcshj-Linux-005 systemd[1]: otelcol-contrib.service: Scheduled restart job, restart counter is at 5.
Oct 09 17:12:36 yptjkcshj-Linux-005 systemd[1]: Stopped OpenTelemetry Collector Contrib.
Oct 09 17:12:36 yptjkcshj-Linux-005 systemd[1]: otelcol-contrib.service: Start request repeated too quickly.
Oct 09 17:12:36 yptjkcshj-Linux-005 systemd[1]: otelcol-contrib.service: Failed with result 'exit-code'.
Oct 09 17:12:36 yptjkcshj-Linux-005 systemd[1]: Failed to start OpenTelemetry Collector Contrib.

Additional context

No response

Wudadada commented 1 month ago

After I delete the line 'database: otel' from the collector config, otel-col can start and create the schema, but the schema is created on a single node (standalone) rather than on the cluster.

SpencerTorres commented 1 month ago

This is likely because the exporter falls back to the default database, which always exists.

SpencerTorres commented 1 month ago

You may need to check the system.query_log table for a query that looks something like:

CREATE DATABASE IF NOT EXISTS otel ON CLUSTER cluster_2S_1R

For now, you can manually create your tables using the example DDL here, and then start your exporter with the create_schema config option set to false
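As a rough sketch of the query_log check suggested above, the following Python helper only builds the SQL text; it does not connect to ClickHouse. The helper name query_log_lookup is hypothetical, and the use of clusterAllReplicas assumes you want to search every node's log rather than only the node you are connected to. Because query_log is flushed periodically, running SYSTEM FLUSH LOGS first may be needed to see recent entries.

```python
import re

# Accept only plain identifiers so that values can be embedded in SQL text safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def query_log_lookup(cluster: str, database: str, limit: int = 20) -> str:
    """Build a query that searches every replica's query_log for CREATE DATABASE attempts."""
    for name in (cluster, database):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return (
        "SELECT hostName() AS host, event_time, type, exception_code, exception, query\n"
        f"FROM clusterAllReplicas('{cluster}', system.query_log)\n"
        "WHERE query ILIKE '%CREATE DATABASE%'\n"
        f"  AND query ILIKE '%{database}%'\n"
        # Exclude this lookup itself, which also contains the search text.
        "  AND query NOT ILIKE '%system.query_log%'\n"
        "ORDER BY event_time DESC\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Cluster and database names are taken from the config posted in this issue.
    print("SYSTEM FLUSH LOGS;")
    print(query_log_lookup("cluster_2S_1R", "otel") + ";")
```

If the result is empty on every node, that would suggest the exporter never got as far as sending the CREATE DATABASE statement.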

SpencerTorres commented 1 month ago

Let me know if there are any more logs surrounding this in the otelcol output. The error message doesn't make sense: I don't see how this "create database" would error with the database already existing.

Wudadada commented 1 month ago

Thank you for your reply. The logs were lost, so I reproduced the problem again, but there doesn't seem to be anything more useful in the logs.

SpencerTorres commented 3 weeks ago

Thanks for the extra logs. Can you try doing a few things:

  • Check system.query_log to see if it's trying to create the database
  • Check if the otel database already exists
  • Verify your cluster setup is properly working and syncing changes/tables
  • Manually create the otel database to see if the rest of the exporter logic runs

Wudadada commented 2 weeks ago

@SpencerTorres Hi, I tried again, but I still can’t create the database ‘abc’ when it does not exist. It seems that there was no attempt to create the database in ClickHouse, as I couldn’t find any SQL query trying to create ‘abc’ in the logs. However, if I manually create the database, the rest of the logic works correctly.

My current config:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['0.0.0.0:8888']

processors:
  batch:
    timeout: 5s
    send_batch_size: 100000
  transform:
    log_statements:
      - context: log
        statements:
          - set(severity_text, "TRACE") where severity_number == 1
          - set(severity_text, "DEBUG") where severity_number == 5
          - set(severity_text, "INFO") where severity_number == 9
          - set(severity_text, "WARN") where severity_number == 13
          - set(severity_text, "ERROR") where severity_number == 17
          - set(severity_text, "FATAL") where severity_number == 21

exporters:
  clickhouse:
    endpoint: tcp://10.105.212.248:9000?dial_timeout=10s
    create_schema: true
    database: abc
    async_insert: true
    ttl: 0
    compress: lz4
    timeout: 5s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    username: "default"
    password: "123"
    cluster_name: cluster_3S_1R
    table_engine:
      name: "ReplicatedMergeTree"

    logs_table_name: otel_logs

    traces_table_name: otel_traces

    metrics_tables:
      gauge: 
        name: "otel_metrics_gauge"
      sum: 
        name: "otel_metrics_sum"
      summary: 
        name: "otel_metrics_summary"
      histogram: 
        name: "otel_metrics_histogram"
      exponential_histogram: 
        name: "otel_metrics_exp_histogram"
  debug:
    verbosity: detailed

extensions:
  opamp:
    server:
      ws:
        endpoint: wss://127.0.0.1:4320/v1/opamp
        tls: 
          insecure_skip_verify: true
    instance_uid: 01BX5ZZKBKACTAV9WEVGEMMVRZ

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhouse]
    logs:
      receivers: [otlp]
      processors: [batch, transform]
      exporters: [clickhouse]
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [clickhouse]
  extensions: [opamp]
  telemetry:
    logs:
      level: "debug"
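
The remaining checks discussed in this thread (whether the database exists, whether the cluster definition looks right, and creating the database by hand) could be scripted roughly as follows. This is a minimal sketch that only generates SQL text to paste into clickhouse-client. The function name preflight_statements is hypothetical, and the ON CLUSTER form of CREATE DATABASE is assumed to be what a clustered setup needs; it is not taken from the exporter's source.

```python
import re
from typing import List

# Accept only plain identifiers so that values can be embedded in SQL text safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def preflight_statements(cluster: str, database: str) -> List[str]:
    """Return SQL statements for checking the cluster and creating the database manually."""
    for name in (cluster, database):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return [
        # 1. Does the cluster definition list the shards and replicas you expect?
        "SELECT cluster, shard_num, replica_num, host_name "
        f"FROM system.clusters WHERE cluster = '{cluster}'",
        # 2. Does the database already exist, and on which nodes?
        "SELECT hostName() AS host, name "
        f"FROM clusterAllReplicas('{cluster}', system.databases) "
        f"WHERE name = '{database}'",
        # 3. Manual workaround: create the database on every node of the cluster.
        f"CREATE DATABASE IF NOT EXISTS {database} ON CLUSTER {cluster}",
    ]


if __name__ == "__main__":
    # Names are taken from the second config posted in this issue.
    for statement in preflight_statements("cluster_3S_1R", "abc"):
        print(statement + ";")
```

If the second query returns a row for only some hosts after the third statement has run, that would point at the cluster setup rather than at the exporter.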