vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
18.12k stars 1.6k forks source link

file source component metadata not available when using log_namespace: true #21341

Closed cornebester closed 1 month ago

cornebester commented 1 month ago

A note for the community

Problem

When schema: log_namespace: true the file metadata is unavailable on % for remap. (Want to add the file path to the emitted event) Log files contains well structured json entries with timestamp key

Configuration

data_dir: "./"

schema:
  log_namespace: true

# Ingest
sources:
  test:
    type: "file"
    include: ["c:\\folder\\*.log"]

transforms:
  parse_logs:
    type: "remap"
    inputs: 
    - test
    source: |
      .meta_data = %

# Print parsed logs to stdout
sinks:
  print:
    type: "console"
    inputs: ["test"]
    encoding:
      codec: "json"
      json:
        pretty: true

Version

vector 0.41.1 (x86_64-pc-windows-msvc 745babd 2024-09-11 14:55:36.802851761)

Debug Output

024-09-23T09:26:03.935951Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=116
2024-09-23T09:26:03.936050Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=106
2024-09-23T09:26:03.936135Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=131
2024-09-23T09:26:03.936207Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=112
2024-09-23T09:26:03.936267Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=136
2024-09-23T09:26:03.936325Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=151
2024-09-23T09:26:03.936380Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=148
2024-09-23T09:26:03.936440Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=149
2024-09-23T09:26:03.936505Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=142
2024-09-23T09:26:03.936603Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=121
2024-09-23T09:26:03.936688Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=196
2024-09-23T09:26:03.936770Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=378
2024-09-23T09:26:03.936853Z TRACE source{component_kind="source" component_id=rt component_type=file}:file_server: file_source::file_server: Read bytes. path="c:\\redacted\\redacted\\redacted.log" bytes=143
2024-09-23T09:26:03.936999Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=116 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937114Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=118 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937227Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=106 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937298Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=108 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937368Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=131 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937435Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=133 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937501Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=112 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937591Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=114 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937676Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=136 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937762Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=138 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937853Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=151 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.937934Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=153 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938005Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=148 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938073Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=150 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938138Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=149 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938204Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=151 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938268Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=142 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938333Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=144 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938398Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=121 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938464Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=123 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938535Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=196 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938614Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=198 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938706Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=378 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938785Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=380 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938845Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Bytes received. byte_size=143 protocol="file" file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938905Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector::internal_events::file::source: Events received. count=1 byte_size=145 file=c:\redacted\redacted\redacted.log
2024-09-23T09:26:03.938983Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector_buffers::topology::channel::limited_queue: Sent item.
2024-09-23T09:26:03.939040Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector_common::internal_event::events_sent: Events sent. count=13 byte_size=2069 output=_default
2024-09-23T09:26:03.939124Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector_core::fanout: Processing control message outside of send: ControlMessage::Add(ComponentKey { id: "parse_logs" })
2024-09-23T09:26:03.939198Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector_core::fanout: Processing control message outside of send: ControlMessage::Add(ComponentKey { id: "print" })
2024-09-23T09:26:03.939302Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector_buffers::topology::channel::limited_queue: Sent item.
2024-09-23T09:26:03.939364Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector_buffers::topology::channel::limited_queue: Sent item.
2024-09-23T09:26:03.939393Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_received: Events received. count=13 byte_size=2069
2024-09-23T09:26:03.939439Z TRACE source{component_kind="source" component_id=rt component_type=file}: vector_core::fanout: Sent item to fanout.
"{\"timestamp\": \"2024-09-23T09:25:56.235Z\", \"level\": \"INFO\", \"name\": \"__main__\", \"message\": \"LOG level set to: INFO\"}\r"
2024-09-23T09:26:03.939533Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=118
2024-09-23T09:26:03.939652Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=136 protocol=console
2024-09-23T09:26:03.939560Z DEBUG transform{component_kind="transform" component_id=parse_logs component_type=remap}: vector::utilization: utilization=0.00007400706341165186
"{\"timestamp\": \"2024-09-23T09:25:56.236Z\", \"level\": \"INFO\", \"name\": \"__main__\", \"message\": \"App starting\"}\r"
2024-09-23T09:26:03.939714Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=108
2024-09-23T09:26:03.939853Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=126 protocol=console
2024-09-23T09:26:03.939782Z TRACE transform{component_kind="transform" component_id=parse_logs component_type=remap}: vector_common::internal_event::events_received: Events received. count=13 byte_size=2069
"{\"timestamp\": \"2024-09-23T09:25:56.236Z\", \"level\": \"INFO\", \"name\": \"root\", \"message\": \"Writing files to c:\\\\redacted\\\\redacted\\\\ \"}\r"
2024-09-23T09:26:03.939912Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=133
2024-09-23T09:26:03.940041Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=157 protocol=console
2024-09-23T09:26:03.940069Z TRACE transform{component_kind="transform" component_id=parse_logs component_type=remap}: vector_core::fanout: No senders present.
"{\"timestamp\": \"2024-09-23T09:25:56.237Z\", \"level\": \"INFO\", \"name\": \"__main__\", \"message\": \"Main pid is: 33636\"}\r"
2024-09-23T09:26:03.940122Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=114
2024-09-23T09:26:03.940259Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=132 protocol=console
2024-09-23T09:26:03.940186Z TRACE transform{component_kind="transform" component_id=parse_logs component_type=remap}: vector_common::internal_event::events_sent: Events sent. count=13 byte_size=4799 output=_default
"{\"timestamp\": \"2024-09-23T09:25:56.237Z\", \"level\": \"INFO\", \"name\": \"__main__\", \"message\": \"Thread name: MainThread | thread id: 29236\"}\r"
2024-09-23T09:26:03.940319Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=138
2024-09-23T09:26:03.940455Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=156 protocol=console
2024-09-23T09:26:03.940512Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=153
"{\"timestamp\": \"2024-09-23T09:25:56.239Z\", \"level\": \"INFO\", \"name\": \"__main__\", \"message\": \"Thread name: Thread-1 (mqtt_publisher) | thread id: 49500\"}\r"
2024-09-23T09:26:03.940562Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=171 protocol=console
2024-09-23T09:26:03.940650Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=150
"{\"timestamp\": \"2024-09-23T09:25:56.239Z\", \"level\": \"INFO\", \"name\": \"__main__\", \"message\": \"Thread name: Thread-2 (file_writer) | thread id: 40428\"}\r"
2024-09-23T09:26:03.940702Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=168 protocol=console
2024-09-23T09:26:03.940799Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=151
"{\"timestamp\": \"2024-09-23T09:25:56.240Z\", \"level\": \"INFO\", \"name\": \"__main__\", \"message\": \"Thread name: Thread-3 (th_log_serial) | thread id: 4516\"}\r"
2024-09-23T09:26:03.940851Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=169 protocol=console
2024-09-23T09:26:03.940948Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=144
"{\"timestamp\": \"2024-09-23T09:25:56.241Z\", \"level\": \"WARNING\", \"name\": \"__main__\", \"message\": \"Upload directory already exists, not creating\"}\r"
2024-09-23T09:26:03.941000Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=162 protocol=console
2024-09-23T09:26:03.941093Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=123
"{\"timestamp\": \"2024-09-23T09:25:56.243Z\", \"level\": \"INFO\", \"name\": \"__main__\", \"message\": \"Active capture file size: 0\"}\r"
2024-09-23T09:26:03.941144Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::bytes_sent: Bytes sent. byte_size=141 protocol=console
2024-09-23T09:26:03.941238Z TRACE sink{component_kind="sink" component_id=print component_type=console}: vector_common::internal_event::events_sent: Events sent. count=1 byte_size=198

Example Data

{"timestamp": "2024-09-23T09:25:56.235Z", "level": "INFO", "name": "main", "message": "LOG level set to: INFO"} {"timestamp": "2024-09-23T09:25:56.236Z", "level": "INFO", "name": "main", "message": "App starting"}

Additional Context

Windows machine

References

No response

jszwedko commented 1 month ago

@cornebester I think you are using the wrong input in your sink. It should be inputs: [parse_logs].

cornebester commented 1 month ago

@jszwedko great thanks. Sometimes one just need an extra pair of eyes! Sorry about this

jszwedko commented 1 month ago

@jszwedko great thanks. Sometimes one just need an extra pair of eyes! Sorry about this

No worries! I've frequently made the same mistaken 🙂