Open mikelsid opened 4 weeks ago
In case a message is written partially at first (without delimiter), the offset calculation of the line is incorrect.
The issue can be reproduced by using this script:
# fake-logger.sh file1=tests/tdata/fake-log-1.log echo -n "a" > $file1 sleep 3 echo "b" >> $file1 echo "c" >> $file1 file2=tests/tdata/fake-log-2.log echo "ab" > $file2 echo "c" >> $file2
Vector config:
# vector.yaml data_dir: tests/tdata sources: log: include: - tests/tdata/fake-log*.log read_from: beginning fingerprint: strategy: device_and_inode type: file offset_key: offset sinks: console: type: console inputs: - log encoding: codec: json
0.41
❯ cargo vdev run --debug tests/vector.yaml Compiling file-source v0.1.0 (/home/msidorov/vector/lib/file-source) Compiling vector v0.41.0 (/home/msidorov/vector) Compiling vector-lib v0.1.0 (/home/msidorov/vector/lib/vector-lib) Finished `dev` profile [unoptimized + debuginfo] target(s) in 32.37s Running `target/debug/vector --config tests/vector.yaml` 2024-08-15T20:24:44.847421Z INFO vector::app: Log level is enabled. level="info" 2024-08-15T20:24:44.849546Z INFO vector::app: Loading configs. paths=["tests/vector.yaml"] 2024-08-15T20:24:44.855881Z INFO vector::topology::running: Running healthchecks. 2024-08-15T20:24:44.856180Z INFO vector::topology::builder: Healthcheck passed. 2024-08-15T20:24:44.856259Z INFO vector: Vector has started. debug="true" version="0.41.0" arch="x86_64" revision="" 2024-08-15T20:24:44.856386Z INFO source{component_kind="source" component_id=log component_type=file}: vector::sources::file: Starting file server. include=["tests/tdata/fake-log*.log"] exclude=[] 2024-08-15T20:24:44.856831Z INFO source{component_kind="source" component_id=log component_type=file}:file_server: file_source::checkpointer: Attempting to read legacy checkpoint files. 2024-08-15T20:25:01.263737Z INFO source{component_kind="source" component_id=log component_type=file}:file_server: vector::internal_events::file::source: Found new file to watch. file=tests/tdata/fake-log-1.log 2024-08-15T20:25:03.313237Z INFO source{component_kind="source" component_id=log component_type=file}:file_server: vector::internal_events::file::source: Found new file to watch. file=tests/tdata/fake-log-2.log {"file":"tests/tdata/fake-log-1.log","host":"...","message":"ab","offset":1,"source_type":"file","timestamp":"2024-08-15T20:25:03.314009749Z"} {"file":"tests/tdata/fake-log-1.log","host":"...","message":"c","offset":3,"source_type":"file","timestamp":"2024-08-15T20:25:03.314240274Z"} {"file":"tests/tdata/fake-log-2.log","host":"...","message":"ab","offset":0,"source_type":"file","timestamp":"2024-08-15T20:25:03.314342546Z"} {"file":"tests/tdata/fake-log-2.log","host":"...","message":"c","offset":3,"source_type":"file","timestamp":"2024-08-15T20:25:03.314438288Z"}
As seen here, the offset of the first event in tests/tdata/fake-log-1.log is 1, which is unexpected
tests/tdata/fake-log-1.log
1
It seems like the offset was indeed set to the offset of the newly written part of the line
A note for the community
Problem
In case a message is written partially at first (without delimiter), the offset calculation of the line is incorrect.
Configuration
The issue can be reproduced by using this script:
Vector config:
Version
0.41
Debug Output
As seen here, the offset of the first event in
tests/tdata/fake-log-1.log
is1
, which is unexpected