vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Vector not processing real time data - Reading old files and old data for file source #18600

Closed amanbhaskar92 closed 1 year ago

amanbhaskar92 commented 1 year ago

Problem

Use case: My Java application is deployed on Kubernetes and runs as many pod instances. Each pod writes its logs to a shared mounted volume. Under the shared volume, a folder is created with the pod name, and the log files live under that folder: /shared/applogs/podname/files. Log rotation is also enabled. I am trying to read only the files ending in .log with Vector and send the events to Kafka.

  1. Vector keeps all the log files matching /shared/applogs/*/*.log open, even if they are older than the number of seconds specified in ignore_older_secs.
  2. It even reads from files that have not been modified in the last 10 minutes. I have set ignore_older_secs to 600.
  3. It also reads data from the files with a lot of delay. It is reading 6-hour-old data and sending it to Kafka that late. How can I make it real time?
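
To see which files the include/exclude globs select and how their modification times compare with the 600-second threshold, a small diagnostic along these lines may help. It is only a sketch: the function name `stale_candidates` is hypothetical, and the mtime comparison is an assumption about what "older than" means here, not Vector's actual implementation.

```python
import fnmatch
import glob
import os
import time


def stale_candidates(include, exclude, max_age_secs, now=None):
    """Return (path, is_stale) for files matching `include` but not `exclude`.

    A file counts as stale when its mtime is more than `max_age_secs`
    seconds before `now`. This mirrors the intent of the include, exclude
    and ignore_older settings in the config; it is not Vector's code.
    """
    now = time.time() if now is None else now
    result = []
    for path in sorted(glob.glob(include)):
        # fnmatch lets "*" cross "/" boundaries; the include glob has
        # already fixed the directory depth, so that does not matter here.
        if fnmatch.fnmatch(path, exclude):
            continue
        age = now - os.stat(path).st_mtime
        result.append((path, age > max_age_secs))
    return result


if __name__ == "__main__":
    for path, stale in stale_candidates(
        "/shared/applogs/*/*.log",
        "/shared/applogs/*/*_message_*.log",
        600,
    ):
        print("stale" if stale else "fresh", path)
```

Run inside the Vector pod, this prints one line per selected file, which can then be compared against the files Vector actually holds open.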

Configuration

# Default values for Vector
# See Vector helm documentation to learn more:
# https://vector.dev/docs/setup/installation/package-managers/helm/

# fullnameOverride -- Override the full name of resources.
fullnameOverride: "vector-glb-staging-omnihubapp-1"

role: "Stateless-Aggregator"
rollWorkload: true
image:
  # image.repository -- Override default registry and name for Vector's image.
  repository: registry.tools.3stripes.net/eft-omni-comsreautomic/vector
  # image.pullPolicy -- The [pullPolicy](https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy) for
  # Vector's image.
  pullPolicy: IfNotPresent

  tag: dev

replicas: 1

# podManagementPolicy -- Specify the [podManagementPolicy](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-management-policies)
# for the StatefulSet. Valid for the "Aggregator" role.

service:
  enabled: false

serviceHeadless:
  enabled: false

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 200m
    memory: 256Mi

customConfig:
  data_dir: /tmp/vector
  sources:
    app_logs_source:
      type: file
      include:
        - /shared/applogs/*/*.log
      exclude:
        - /shared/applogs/*/*_message_*.log
      read_from: end
      ignore_older: 600
      multiline:
        start_pattern: '(^\d{4}-\w{3}-\d{2})|(^\[\d{1,2}/\d{1,2}/\d{2})'
        mode: halt_before
        condition_pattern: '(^\d{4}-\w{3}-\d{2})|(^\[\d{1,2}/\d{1,2}/\d{2})'
        timeout_ms: 60000
      fingerprint:
        strategy: "checksum"
        ignored_header_bytes: 0
        lines: 20

  transforms:
    apps_logs_transform:
      type: remap
      inputs:
        - app_logs_source
      source: |-
        if exists(.message) {
          if contains(string!(.message), "ERROR") {
            .log = {"level": "ERROR"}
          } else if contains(string!(.message), "WARN") {
            .log = {"level": "WARN"}
          } else {
            .log = {"level": "INFO"}
          }
        }
    apps_logs_transform_filter:
      type: filter
      inputs:
        - apps_logs_transform
      condition: .log.level != "WARN"
    apps_logs_transform_remap:
      type: remap
      inputs:
        - apps_logs_transform_filter
      source: |-
        .labels = {"env":"stg"}
        .service = {"id":"7a748c9b-93f4-4150-b452-5bb6be7a64e8"}
        del(.host)
        del(.hostname)
        .host = {"hostname":"glb-omnihub-nonproduction"}
        .tags = "app_logs"
        .kubernetes.namespace_name = "global-staging-omnihubapp"
        .kubernetes.namespace_actualname = "glb-staging-omnihubapp-1"

  sinks:
    emit_syslog:
      type: kafka
      inputs:
        - apps_logs_transform_remap
      bootstrap_servers: 
      key_field: message_key
      topic: 
      tls:
        enabled: true
        ca_file: 
        crt_file: 
        key_file: 
        verify_certificate: true
      encoding:
        codec: json

    emit_syslog_console:
      inputs:
        - apps_logs_transform_remap
      target: stdout
      type: console
      encoding:
        codec: json

# defaultVolumes -- Default volumes that are mounted into pods. In most cases, these should not be changed.
# Use `extraVolumes`/`extraVolumeMounts` for additional custom volumes.
# @default -- See `values.yaml`
defaultVolumes:
  - name: var-log
    hostPath:
      path: "/var/log/"
  - name: var-lib
    hostPath:
      path: "/var/lib/"
  - name: procfs
    hostPath:
      path: "/proc"
  - name: sysfs
    hostPath:
      path: "/sys"

# defaultVolumeMounts -- Default volume mounts. Corresponds to `volumes`.
# @default -- See `values.yaml`
defaultVolumeMounts:
  - name: var-log
    mountPath: "/var/log/"
    readOnly: true
  - name: var-lib
    mountPath: "/var/lib"
    readOnly: true
  - name: procfs
    mountPath: "/host/proc"
    readOnly: true
  - name: sysfs
    mountPath: "/host/sys"
    readOnly: true

# extraVolumes -- Additional Volumes to use with Vector Pods.
extraVolumes:  
  - name: logs-volume
    persistentVolumeClaim:
      claimName: glbstg-oms-common-app    
  - name: vector-secret
    secret:
      secretName: vectorsecurestore

# extraVolumeMounts -- Additional Volume to mount into Vector Containers.
extraVolumeMounts:
  - name: logs-volume
    mountPath: "/shared/"
  - name: vector-secret
    mountPath: "/etc/vector-secret"

# initContainers -- Init Containers to be added to the Vector Pods.

Version

version="0.27.0" arch="x86_64" revision="5623d1e 2023-01-18"

dsmith3197 commented 1 year ago

Are you able to upgrade to the latest version of Vector and let us know if the problem still persists?

jszwedko commented 1 year ago

You could also try https://vector.dev/docs/reference/configuration/sources/file/#read_from to have Vector read from the latest rather than start from the beginning.

amanbhaskar92 commented 1 year ago

@jszwedko I am already using read_from: end. @dsmith3197 Can you please suggest a Docker image with the latest version? Can I use latest-alpine?

amanbhaskar92 commented 1 year ago

@dsmith3197 I am now using the latest-alpine image, which reports version="0.32.1" arch="x86_64" revision="9965884 2023-08-21 14:52:38.330227446". Is this version fine?

dsmith3197 commented 1 year ago

Yep, 0.32.1 is the latest version.

amanbhaskar92 commented 1 year ago

Thanks, I updated the version last night and things are looking good for now. I have a major production rollout in the coming days, so I would request that you keep this bug open for a few more days so that I can post my findings here and get your valuable advice.

amanbhaskar92 commented 1 year ago

One more thing I noticed is that Vector is keeping handles open for log-rotated files:

1 /usr/local/bin/vector 544 /shared/applogs/oms-agt-batch1-agt-glb-grp-2-bfc4b9d8f-dlfg9/oms-agt-batch1-agt-glb-grp-2-bfc4b9d8f-dlfg9.log (deleted)
1 /usr/local/bin/vector 547 /shared/applogs/oms-agt-batch1-agt-glb-exec-collection-576bb44f45-w97ld/oms-agt-batch1-agt-glb-exec-collection-576bb44f45-w97ld.log (deleted)
1 /usr/local/bin/vector 549 /shared/applogs/oms-agt-batch1-agt-glb-exec-collection-576bb44f45-w97ld/oms-agt-batch1-agt-glb-exec-collection-576bb44f45-w97ld.log (deleted)

Will this have any impact? Will it read from the deleted files?
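
A listing like the one above can also be produced without lsof by reading /proc directly. The following is a minimal sketch, assuming a Linux host; the helper name `deleted_open_files` is made up for illustration.

```python
import os


def deleted_open_files(pid="self"):
    """Return sorted (fd, path) pairs for descriptors whose file was unlinked.

    Linux-specific: once a file has no remaining directory entry, the
    kernel appends " (deleted)" to the link target of /proc/<pid>/fd/<n>.
    """
    marker = " (deleted)"
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.endswith(marker):
            found.append((int(name), target[: -len(marker)]))
    return sorted(found)
```

Calling `deleted_open_files(1)` inside the Vector container would inspect PID 1, which is where the listing above suggests Vector is running. That call needs permission to read the process's fd directory.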

amanbhaskar92 commented 1 year ago

I can confirm it is reading from deleted (log-rotated) files. Is this a bug? How can I fix this behavior? It should not read from the rotated files. I am already using the checksum strategy and ignore_older_secs: 600.

/shared/applogs/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb.log (deleted)
1 /usr/local/bin/vector 533 /shared/applogs/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb.log (deleted)
1 /usr/local/bin/vector 546 /shared/applogs/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb.log (deleted)
1 /usr/local/bin/vector 554 /shared/applogs/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb.log (deleted)
1 /usr/local/bin/vector 558 /shared/applogs/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb.log (deleted)
1 /usr/local/bin/vector 562 /shared/applogs/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb.log (deleted)
1 /usr/local/bin/vector 564 /shared/applogs/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb.log (deleted)
1 /usr/local/bin/vector 566 /shared/applogs/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb/oms-agt-batch1-agt-glb-grp-3-7c954df77d-ztcmb.log

amanbhaskar92 commented 1 year ago

@jszwedko @dsmith3197 Can you please suggest how to get rid of the file descriptors of rotated files?

jszwedko commented 1 year ago

> @jszwedko @dsmith3197 Can you please suggest how to get rid of the file descriptors of rotated files?

Currently there isn't a mechanism to have Vector ignore rotated files. It will continue trying to read them until it hits the end of the file.
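
For readers wondering how a deleted file can still be read at all: on Unix-like systems an open descriptor keeps the file's data alive after its directory entry is removed, so a reader that opened the file before rotation can continue to EOF. The sketch below demonstrates that general filesystem behavior with plain Python. It is not Vector's code, and the function and file names are made up for illustration.

```python
import os
import tempfile


def read_after_rotation():
    """Show that a reader opened before rotation still sees the old file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.log")
        with open(path, "w") as w:
            w.write("old line 1\nold line 2\n")

        reader = open(path)          # handle opened before rotation
        first = reader.readline()

        os.unlink(path)              # rotation removes the directory entry
        with open(path, "w") as w:   # the app starts a fresh file, same path
            w.write("new line\n")

        rest = reader.read()         # still reads the OLD, unlinked file
        reader.close()

        with open(path) as fresh:    # a new open() sees only the NEW file
            new = fresh.read()
        return first, rest, new
```

The old handle returns the remaining old content and never sees "new line", which matches the "(deleted)" entries in the listings above: the descriptors refer to the pre-rotation files until they are closed.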

neuronull commented 1 year ago

Related: https://github.com/vectordotdev/vector/issues/18864

We have an issue open to track the feature request for a config option to not read to EOF: https://github.com/vectordotdev/vector/issues/18863

Closing this issue in favor of that one.