open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[receiver/sqlquery] Log signals reprocessed when tracking_column type is timestamp, and value has milliseconds precision #35194

Closed Grandys closed 1 month ago

Grandys commented 2 months ago

Component(s)

internal/sqlquery, receiver/sqlquery

What happened?

Description

The fractional-seconds part of a timestamp column is trimmed during processing when the column is used as a tracking column. As a result, when tracking_column is of timestamp type and its value has sub-second precision, rows are reprocessed.

Steps to Reproduce

Here is the setup: https://github.com/Grandys/otel-collector-sqlqueryreceiver-setup, branch reporocess-ts-column

git clone git@github.com:Grandys/otel-collector-sqlqueryreceiver-setup.git
cd otel-collector-sqlqueryreceiver-setup/
git checkout reporocess-ts-column
# Build custom otel collector with OCB
make run_rebuild

Verify collector output or check local LGTM (http://localhost:3000 admin/admin).

For example, when the tracking value is 2024-09-15 16:45:16.222464 and the executed SQL is:

select 'Datapoint value for ' || log_type || ' is ' || datapoint as datapoint, log_type, created_at
from logs_data
where created_at > $$1 order by created_at

2024-09-15 16:45:16 is passed as the parameter (the .222464 fraction is dropped). As a result, the same rows are reprocessed.

Expected Result

For my setup, I'd expect 4 log entries to be generated in total.

Actual Result

Log signals are generated every 10 seconds for the same records, e.g.:

Timestamp Log body
2024-09-15 19:05:13.244 Datapoint value for db is 1
2024-09-15 19:05:13.244 Datapoint value for app is 4
2024-09-15 19:05:13.244 Datapoint value for web is 2
2024-09-15 19:05:03.244 Datapoint value for web is 2
2024-09-15 19:05:03.244 Datapoint value for app is 4
2024-09-15 19:05:03.244 Datapoint value for db is 1
2024-09-15 19:04:53.244 Datapoint value for web is 2
2024-09-15 19:04:53.244 Datapoint value for db is 1
2024-09-15 19:04:53.244 Datapoint value for app is 4
2024-09-15 19:04:43.244 Datapoint value for web is 2
2024-09-15 19:04:43.244 Datapoint value for app is 4
2024-09-15 19:04:43.244 Datapoint value for db is 1
2024-09-15 19:04:33.245 Datapoint value for web is 2
2024-09-15 19:04:33.245 Datapoint value for db is 1
2024-09-15 19:04:33.245 Datapoint value for app is 4
2024-09-15 19:04:23.245 Datapoint value for web is 2
2024-09-15 19:04:23.245 Datapoint value for app is 4
2024-09-15 19:04:23.245 Datapoint value for db is 1
2024-09-15 19:04:13.245 Datapoint value for db is 1
2024-09-15 19:04:13.245 Datapoint value for app is 4
2024-09-15 19:04:13.245 Datapoint value for web is 2
2024-09-15 19:04:03.246 Datapoint value for web is 2
2024-09-15 19:04:03.246 Datapoint value for app is 4
2024-09-15 19:04:03.246 Datapoint value for db is 1

Collector version

v0.109.0

Environment information

Environment

OS: macOS Monterey
Compiler: go1.22.6 darwin/amd64

OpenTelemetry Collector configuration

receivers:
  sqlquery:
    driver: postgres
    datasource: "host=localhost port=5432 user=otel password=otel database=otel sslmode=disable"
    queries:
      - sql: "select 'Datapoint value for ' || log_type || ' is ' || datapoint as datapoint, log_type, created_at from logs_data where created_at > $$1 order by created_at"
        tracking_column: created_at
        tracking_start_value: '2024-09-15 00:00:00'
        logs:
          - body_column: datapoint
            attribute_columns: [ "log_type" ]

exporters:
  debug:
    verbosity: detailed
  otlp/lgtm:
    endpoint: localhost:4317
    tls:
      insecure: true

processors:
  batch:
  resource:
    attributes:
      - key: service.name
        value: "sql-query-reader"
        action: insert

service:
  pipelines:
    metrics/sqlquery:
      receivers: [ sqlquery ]
      processors: [ resource, batch ]
      exporters: [ otlp/lgtm ]
    logs/sqlquery:
      receivers: [ sqlquery ]
      processors: [ resource, batch ]
      exporters: [ debug, otlp/lgtm ]

Log output

No response

Additional context

Table definition:

create table logs_data
(
    id         serial
        primary key,
    created_at timestamp,
    datapoint  integer,
    log_type   text
);
github-actions[bot] commented 2 months ago

Pinging code owners:

crobert-1 commented 1 month ago

This change makes sense to me, removing needs triage.