open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.05k stars 2.35k forks

Scope name parsing doesn't work. #23387

Closed bin-shi-mulesoft closed 7 months ago

bin-shi-mulesoft commented 1 year ago

Component(s)

pkg/stanza

What happened?

Description

The scope_name parser doesn't set the parsed scope name on the ScopeLogs.

Steps to Reproduce

  1. Prepare the example input log file: echo "com.example.Foo - some message" > /tmp/example.log
  2. Download the OTel Collector (contrib) v0.79.0, e.g. otelcol-contrib_0.79.0_darwin_amd64.
  3. Start the Collector with the config file provided below: otelcol-contrib --config otel.yaml

Actual Result

Note: the actual output is a single line; it is pretty-printed as JSON here for readability.

{
    "resourceLogs": [
        {
            "resource": {},
            "scopeLogs": [
                {
                    "scope": {},
                    "logRecords": [
                        {
                            "observedTimeUnixNano": "1686785112950071000",
                            "body": {
                                "stringValue": "com.example.Foo - some message"
                            },
                            "attributes": [
                                {
                                    "key": "scope_name_field",
                                    "value": {
                                        "stringValue": "com.example.Foo"
                                    }
                                },
                                {
                                    "key": "message",
                                    "value": {
                                        "stringValue": "some message"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        }
                    ]
                }
            ]
        }
    ]
}

Expected Result

The scope name is set to com.example.Foo:

{
    "resourceLogs": [
        {
            "resource": {},
            "scopeLogs": [
                {
                    "scope": {
                        "name": "com.example.Foo"
                    },
                    "logRecords": [
                        {
                            "observedTimeUnixNano": "1686785320986954000",
                            "body": {
                                "stringValue": "com.example.Foo - some message"
                            },
                            "attributes": [
                                {
                                    "key": "scope_name_field",
                                    "value": {
                                        "stringValue": "com.example.Foo"
                                    }
                                },
                                {
                                    "key": "message",
                                    "value": {
                                        "stringValue": "some message"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        }
                    ]
                }
            ]
        }
    ]
}

Collector version

v0.79.0

Environment information

Environment

OS: macOS Version 13.4

OpenTelemetry Collector configuration

receivers:
  filelog:
    include:
      - /tmp/example.log
    include_file_name: false
    start_at: beginning
    operators:
      - type: regex_parser
        regex: '^(?P<scope_name_field>\S*)\s-\s(?P<message>.*)'
        scope_name:
          parse_from: attributes.scope_name_field

processors:

exporters:
  file:
    path: /tmp/scope-name-parsing.jsonl

service:
  telemetry:
    logs:
      level: debug
  pipelines:
    logs:
      receivers: [filelog]
      processors: []
      exporters: [file]

Log output

2023-06-14T16:25:12.745-0700    info    service/telemetry.go:104        Setting up own telemetry...
2023-06-14T16:25:12.745-0700    info    service/telemetry.go:127        Serving Prometheus metrics      {"address": ":8888", "level": "Basic"}
2023-06-14T16:25:12.745-0700    debug   exporter@v0.79.0/exporter.go:273        Alpha component. May change in the future.      {"kind": "exporter", "data_type": "logs", "name": "file"}
2023-06-14T16:25:12.746-0700    debug   receiver@v0.79.0/receiver.go:294        Beta component. May change in the future.       {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:12.747-0700    info    service/service.go:131  Starting otelcol-contrib...     {"Version": "0.79.0", "NumCPU": 16}
2023-06-14T16:25:12.747-0700    info    extensions/extensions.go:30     Starting extensions...
2023-06-14T16:25:12.747-0700    info    adapter/receiver.go:45  Starting stanza receiver        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:12.747-0700    debug   pipeline/directed.go:59 Starting operator       {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:12.747-0700    debug   pipeline/directed.go:63 Started operator        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:12.747-0700    debug   pipeline/directed.go:59 Starting operator       {"kind": "receiver", "name": "filelog", "data_type": "logs", "operator_id": "regex_parser", "operator_type": "regex_parser"}
2023-06-14T16:25:12.747-0700    debug   pipeline/directed.go:63 Started operator        {"kind": "receiver", "name": "filelog", "data_type": "logs", "operator_id": "regex_parser", "operator_type": "regex_parser"}
2023-06-14T16:25:12.748-0700    debug   adapter/converter.go:109        Starting log converter  {"kind": "receiver", "name": "filelog", "data_type": "logs", "worker_count": 4}
2023-06-14T16:25:12.748-0700    info    service/service.go:148  Everything is ready. Begin running and processing data.
2023-06-14T16:25:12.948-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:12.948-0700    info    fileconsumer/file.go:192        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/tmp/example.log"}
2023-06-14T16:25:13.949-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:15.148-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:16.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:17.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:18.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:19.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:20.548-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:21.748-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:22.748-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:23.949-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:25.148-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:26.148-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:27.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:28.548-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:29.548-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:30.748-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:31.948-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:32.948-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:34.148-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:35.347-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:36.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:37.548-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:38.748-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
^C2023-06-14T16:25:39.253-0700  info    otelcol/collector.go:227        Received signal from OS {"signal": "interrupt"}
2023-06-14T16:25:39.253-0700    info    service/service.go:157  Starting shutdown...
2023-06-14T16:25:39.253-0700    info    adapter/receiver.go:139 Stopping stanza receiver        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   pipeline/directed.go:74 Stopping operator       {"kind": "receiver", "name": "filelog", "data_type": "logs", "operator_id": "file_input", "operator_type": "file_input"}
2023-06-14T16:25:39.253-0700    debug   fileconsumer/reader.go:161      Problem closing reader  {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/tmp/example.log", "error": "close /tmp/example.log: file already closed"}
2023-06-14T16:25:39.253-0700    debug   pipeline/directed.go:78 Stopped operator        {"kind": "receiver", "name": "filelog", "data_type": "logs", "operator_id": "file_input", "operator_type": "file_input"}
2023-06-14T16:25:39.253-0700    debug   pipeline/directed.go:74 Stopping operator       {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   pipeline/directed.go:78 Stopped operator        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   adapter/receiver.go:90  Receive loop stopped    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   adapter/receiver.go:120 Converter channel got closed    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   adapter/receiver.go:120 Converter channel got closed    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   adapter/receiver.go:115 Consumer loop stopped   {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    info    extensions/extensions.go:44     Stopping extensions...
2023-06-14T16:25:39.253-0700    info    service/service.go:171  Shutdown complete.

Additional context

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

djaglowski commented 1 year ago

I agree this is a problem. From what I can tell, the scope_name parser itself is working correctly, but we are not considering the scope when translating to plogs. We just assume a single scope here.
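The single-scope assumption can be illustrated with a stripped-down sketch (hypothetical stand-in types, not the real converter code): every record lands in one nameless scope, so the parsed scope name survives only as a log attribute.

```go
package main

import "fmt"

// Entry stands in for a stanza entry; ScopeName is what the
// scope_name parser produced. These types are illustrative only.
type Entry struct {
	ScopeName string
	Body      string
}

// convert mimics the described translation bug: one scope per
// resource, regardless of how many distinct scope names the
// entries carry.
func convert(entries []Entry) map[string][]string {
	scopeLogs := map[string][]string{}
	for _, e := range entries {
		// The bug being described: e.ScopeName is ignored here,
		// so every record is grouped under the empty scope "".
		scopeLogs[""] = append(scopeLogs[""], e.Body)
	}
	return scopeLogs
}

func main() {
	out := convert([]Entry{
		{"com.example.Foo", "some message"},
		{"com.example.Bar", "other message"},
	})
	// Both bodies end up under the "" scope even though two
	// distinct scope names were parsed.
	fmt.Println(len(out), out[""])
}
```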

@bin-shi-mulesoft, thanks for reporting this. Let me know if you want to take the issue and I'll assign it to you.

bin-shi-mulesoft commented 1 year ago

@djaglowski Sure. I can work on this.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

someshkoli commented 7 months ago

Facing the same issue. Not sure if the root cause described above is still valid (@djaglowski, can you confirm?). If yes, @bin-shi-mulesoft, should I go ahead and pick it up?

djaglowski commented 7 months ago

@someshkoli, my assessment of the cause is unchanged. I appreciate the help and will reassign this to you.

The biggest challenge here will be to preserve performance when scope is not present. Grouping by scope will potentially introduce some overhead, but we should be able to optimize for the case where scope is unused. Performance benchmarks will be helpful here. You may need to evaluate whether the existing benchmarks in the package are sufficient to demonstrate the difference.

someshkoli commented 7 months ago

> @someshkoli, my assessment of the cause is unchanged. I appreciate the help and will reassign this to you.
>
> The biggest challenge here will be to preserve performance when scope is not present. Grouping by scope will potentially introduce some overhead, but we should be able to optimize for the case where scope is unused. Performance benchmarks will be helpful here. You may need to evaluate whether the existing benchmarks in the package are sufficient to demonstrate the difference.

@djaglowski I have benchmark numbers with me:

  - main branch, BenchmarkEmitterToConsumer: ~3.12
  - my branch, BenchmarkEmitterToConsumer: ~3.26
  - my branch, BenchmarkEmitterToConsumerScopeGroupping (2 host variations, 2 scope variations): ~4.2s

Rough implementation:

// rl, sl are the plog ResourceLogs/ScopeLogs being appended to.
// resourceHashToIdx indexes resources; scopeIdxByResource indexes
// scopes within each resource, so records sharing a resource and
// scope reuse the same ScopeLogs.
resourceIdx, ok := resourceHashToIdx[resourceID]
if !ok {
  // First record for this resource: create the ResourceLogs
  // and its first ScopeLogs.
  resourceHashToIdx[resourceID] = pLogs.ResourceLogs().Len()

  rl = pLogs.ResourceLogs().AppendEmpty()
  upsertToMap(e.Resource, rl.Resource().Attributes())

  scopeIdxByResource[resourceID] = make(map[string]int)
  scopeIdxByResource[resourceID][scopeName] = rl.ScopeLogs().Len()
  sl = rl.ScopeLogs().AppendEmpty()
  sl.Scope().SetName(scopeName)
} else {
  rl = pLogs.ResourceLogs().At(resourceIdx)
  scopesInThisResource := scopeIdxByResource[resourceID]
  scopeIdx, ok := scopesInThisResource[scopeName]
  if !ok {
    // New scope within an existing resource.
    scopesInThisResource[scopeName] = rl.ScopeLogs().Len()
    sl = rl.ScopeLogs().AppendEmpty()
    sl.Scope().SetName(scopeName)
  } else {
    sl = rl.ScopeLogs().At(scopeIdx)
  }
}

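The two-level index-map pattern above can be exercised in isolation. A self-contained sketch with plain maps and slices standing in for the pdata types (all names here are illustrative, not the converter's actual code):

```go
package main

import "fmt"

// Record is a stand-in for a stanza entry with a parsed scope name.
type Record struct {
	Resource string // key derived from the resource attributes
	Scope    string // parsed scope name ("" when no scope_name parser ran)
	Body     string
}

// ScopeLogs and ResourceLogs mirror the plog grouping shape.
type ScopeLogs struct {
	Scope string
	Logs  []string
}
type ResourceLogs struct {
	Resource string
	Scopes   []ScopeLogs
}

// group buckets records by resource, then by scope, using the same
// index-map pattern as the snippet above so insertion order is
// preserved and each (resource, scope) pair is created exactly once.
func group(records []Record) []ResourceLogs {
	var out []ResourceLogs
	resourceIdx := map[string]int{}
	scopeIdxByResource := map[string]map[string]int{}
	for _, r := range records {
		ri, ok := resourceIdx[r.Resource]
		if !ok {
			ri = len(out)
			resourceIdx[r.Resource] = ri
			out = append(out, ResourceLogs{Resource: r.Resource})
			scopeIdxByResource[r.Resource] = map[string]int{}
		}
		scopes := scopeIdxByResource[r.Resource]
		si, ok := scopes[r.Scope]
		if !ok {
			si = len(out[ri].Scopes)
			scopes[r.Scope] = si
			out[ri].Scopes = append(out[ri].Scopes, ScopeLogs{Scope: r.Scope})
		}
		out[ri].Scopes[si].Logs = append(out[ri].Scopes[si].Logs, r.Body)
	}
	return out
}

func main() {
	fmt.Println(group([]Record{
		{"R1", "S1", "L1"}, {"R1", "S2", "L2"},
		{"R2", "S1", "L3"}, {"R2", "S2", "L4"},
	}))
}
```

Indexing into slices via the maps (rather than iterating a map at the end) keeps the output deterministic, which also makes benchmark comparisons more stable.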
djaglowski commented 7 months ago

@someshkoli, thanks, that seems reasonable.

someshkoli commented 7 months ago

This groups the records by resource, then by scope. With resources R1, R2 and scopes S1, S2, there are at most 4 log combinations (L1, L2, L3, L4), grouped as [R1 -> (S1 -> L1, S2 -> L2)], [R2 -> (S1 -> L3, S2 -> L4)].

mvillafuertem commented 1 month ago

Hi @djaglowski, I want to use the observed time as the timestamp. Is this possible? Any hints?

    - type: json_parser
      timestamp:
        parse_from: 'observedTimeUnixNano'
        layout_type: epoch
        layout: ns

otelcol-contrib[29061]: error decoding 'parse_from': unrecognized prefix