vectordotdev / helm-charts

Helm charts for Vector.
https://vector.dev
Mozilla Public License 2.0

"/var/lib/vector/": Read-only file system (os error 30) #226

Open kbespalov opened 2 years ago

kbespalov commented 2 years ago

Vector agent cannot start due to Read-only file system error

2022-07-08T11:14:12.522784Z ERROR vector::topology: Configuration error. error=Source "billing_golang_logs": Could not create subdirectory "billing_golang_logs" inside of data dir "/var/lib/vector/": Read-only file system (os error 30)
2022-07-08T11:14:12.522829Z ERROR vector::topology: Configuration error. error=Source "billing_python_logs": Could not create subdirectory "billing_python_logs" inside of data dir "/var/lib/vector/": Read-only file system (os error 30)

So vector-agent is trying to create directories to store its checkpoint JSON files, but it cannot, because the /var/lib volume is mounted read-only (RO):

[screenshot]

spencergilbert commented 2 years ago

Hey @kbespalov can you share the configuration you're using for Vector?

kbespalov commented 2 years ago

@spencergilbert

Chart Version: 0.13.1

Here is the chart values configuration. Nothing exceptional: sinks/sources/transforms, almost default.

affinity: {}
args:
- --config-dir
- /etc/vector/
autoscaling:
  customMetric: {}
  enabled: false
  maxReplicas: 10
  minReplicas: 1
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: null
command: []
commonLabels: {}
containerPorts: []
customConfig:
  sinks:
    cloudwatch:
      # ... omitted
      type: aws_cloudwatch_logs
  sources:
    billing_golang_logs:
      # ... omitted
      type: kubernetes_logs
    billing_python_logs:
      # ... omitted
      type: kubernetes_logs
  transforms:
    formatted_golang_logs:
      # ... omitted
    formatted_python_logs:
      # ... omitted
      type: remap
    merged_python_logs:
      # ... omitted
      type: reduce
dataDir: ""
dnsConfig: {}
dnsPolicy: ClusterFirst
env: []
existingConfigMaps: []
extraVolumeMounts: []
extraVolumes: []
fullnameOverride: ""
haproxy:
  affinity: {}
  autoscaling:
    customMetric: {}
    enabled: false
    maxReplicas: 10
    minReplicas: 1
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: null
  containerPorts: []
  customConfig: ""
  enabled: false
  existingConfigMap: ""
  extraVolumeMounts: []
  extraVolumes: []
  image:
    pullPolicy: IfNotPresent
    pullSecrets: []
    repository: haproxytech/haproxy-alpine
    tag: 2.4.17
  initContainers: []
  livenessProbe:
    tcpSocket:
      port: 1024
  nodeSelector: {}
  podAnnotations: {}
  podLabels: {}
  podPriorityClassName: ""
  podSecurityContext: {}
  readinessProbe:
    tcpSocket:
      port: 1024
  replicas: 1
  resources: {}
  rollWorkload: true
  securityContext: {}
  service:
    annotations: {}
    ports: []
    topologyKeys: []
    type: ClusterIP
  serviceAccount:
    annotations: {}
    automountToken: true
    create: true
    name: null
  strategy: {}
  terminationGracePeriodSeconds: 60
  tolerations: []
image:
  pullPolicy: IfNotPresent
  pullSecrets: []
  repository: timberio/vector
  sha: ""
  tag: ""
ingress:
  annotations: {}
  className: ""
  enabled: false
  hosts: []
  tls: []
initContainers: []
livenessProbe: {}
nameOverride: ""
nodeSelector: {}
persistence:
  accessModes:
  - ReadWriteOnce
  enabled: false
  existingClaim: ""
  finalizers:
  - kubernetes.io/pvc-protection
  hostPath:
    path: /var/lib/vector
  selectors: {}
  size: 10Gi
podAnnotations: {}
podDisruptionBudget:
  enabled: false
  maxUnavailable: null
  minAvailable: 1
podLabels: {}
podManagementPolicy: OrderedReady
podMonitor:
  additionalLabels: {}
  enabled: false
  honorLabels: false
  honorTimestamps: true
  jobLabel: app.kubernetes.io/name
  metricRelabelings: []
  path: /metrics
  port: prom-exporter
  relabelings: []
podPriorityClassName: ""
podSecurityContext: {}
psp:
  create: false
  enabled: true
rbac:
  create: true
readinessProbe: {}
replicas: 1
resources: {}
role: Agent
rollWorkload: true
secrets:
  generic: {}
securityContext: {}
service:
  annotations: {}
  enabled: false
  ports: []
  topologyKeys: []
  type: ClusterIP
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: .....
  automountToken: true
  create: true
  name: vector-logging-agent
terminationGracePeriodSeconds: 60
tolerations: []
updateStrategy: {}

Maybe something is wrong with the pod security policy?

spencergilbert commented 2 years ago

Interesting. I thought I had included some logic around this. In your customConfig you can set a different data_dir: Vector defaults to /var/lib/vector, which (as you pointed out) is RO in the mounts. The default configs set this to /vector-data-dir to avoid the RO filesystem.

I think we can improve this by being more specific in what we mount from /var/lib, but updating the data_dir key in your config should unblock you for now.

kbespalov commented 2 years ago

Adding this parameter explicitly to the settings solved my problem. Thank you!

# values yaml
customConfig:
  data_dir: "/vector-data-dir"

spencergilbert commented 2 years ago

I'm going to keep this open as I think we can improve our defaults to be more specific and cause fewer issues 😄

tuananhnguyen-ct commented 2 years ago

Will just uncommenting data_dir: "/vector-data-dir" in customConfig fix this issue, or do you want to plan for further improvement?

spencergilbert commented 2 years ago

Will just uncommenting data_dir: "/vector-data-dir" in customConfig fix this issue, or do you want to plan for further improvement?

I'd like to tighten up the mount config so we don't mount the entirety of /var/lib to access kubernetes logs, which would resolve the default data dir issue. Today choosing a path that isn't under /var/lib avoids the issue.
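
As a rough illustration of the narrower mount idea (not the chart's actual template), a pod spec could mount only the log directories read-only and keep the data directory on its own writable volume. The volume names and the /var/lib/docker/containers path are assumptions made for this sketch; the right container log path depends on the node's container runtime.

```yaml
# Hypothetical pod spec fragment, not taken from the chart.
containers:
  - name: vector
    volumeMounts:
      # Writable checkpoint storage, kept outside any read-only mount.
      - name: data
        mountPath: /vector-data-dir
      # Pod log files, read-only.
      - name: var-log
        mountPath: /var/log
        readOnly: true
      # Only the container log directory rather than all of /var/lib
      # (assumed Docker path; other runtimes use different locations).
      - name: container-logs
        mountPath: /var/lib/docker/containers
        readOnly: true
volumes:
  - name: data
    hostPath:
      path: /var/lib/vector
  - name: var-log
    hostPath:
      path: /var/log
  - name: container-logs
    hostPath:
      path: /var/lib/docker/containers
```

With mounts like these, nothing read-only would sit on top of /var/lib/vector inside the container. Until the chart changes, setting data_dir explicitly remains the workaround described in this thread.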

anthr76 commented 1 year ago

This is still an issue :(