arve0 opened 2 years ago
Viewed from host, uid/gid seems correct:
❯ oc debug node/domstoltestocpin101
Starting pod/domstoltestocpin101-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.242.158.20
If you don't see a command prompt, try pressing enter.
sh-4.4# ls -ld /host/var/lib/vector
drwxrwxr-x. 2 3000 3000 6 Nov 7 13:43 /host/var/lib/vector
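Since the numeric owner looks right, it can also be worth checking the SELinux label on the directory from the same debug pod; on OpenShift, a wrong label can cause write failures even when uid/gid match:
sh-4.4# ls -ldZ /host/var/lib/vector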
In my case, the error message is as below.
2023-01-11T06:56:35.919540Z ERROR vector::topology: Configuration error. error=Source "task_log": Could not create subdirectory "task_log" inside of data dir "/var/lib/vector/": Read-only file system (os error 30)
This is because of a volumeMount error in the PodSpec. Check whether your volumeMount has readOnly set, or post your pod YAML.
Source code here
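For illustration, a hypothetical pod spec excerpt showing the problematic mount (the container and volume names are assumptions):
containers:
  - name: vector
    volumeMounts:
      - name: vector-data-dir
        mountPath: /var/lib/vector
        readOnly: true   # remove this, or set it to false, so vector can create subdirectories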
Trying to reproduce this today with the following config (updated for latest Helm and Vector versions):
role: Agent
service:
  enabled: false
serviceHeadless:
  enabled: false
customConfig:
  data_dir: "/vector-data-dir"
  sources:
    k8s_logs:
      type: kubernetes_logs
  sinks:
    opensearch:
      type: elasticsearch
      endpoint: https://opensearch:9200
      inputs:
        - k8s_logs
      mode: bulk
      bulk:
        index: "vector-%Y.%m.%d"
      compression: none
      auth:
        strategy: basic
        user: xxxxx
        password: xxxxx
      tls:
        verify_certificate: false
        verify_hostname: false
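For reference, values like the above would be applied along these lines (repo URL per the Vector docs; the namespace is an assumption):
❯ helm repo add vector https://helm.vector.dev
❯ helm upgrade --install vector vector/vector --namespace vector -f values.yaml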
I don't see any error when running locally on colima:
❯ kubectl logs pod/vector-6zh9r
2023-03-09T14:23:54.444835Z INFO vector::app: Internal log rate limit configured. internal_log_rate_secs=10
2023-03-09T14:23:54.448176Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-03-09T14:23:54.448602Z INFO vector::app: Loading configs. paths=["/etc/vector"]
2023-03-09T14:23:54.499656Z INFO source{component_kind="source" component_id=k8s_logs component_type=kubernetes_logs component_name=k8s_logs}: vector::sources::kubernetes_logs: Obtained Kubernetes Node name to collect logs for (self). self_node_name="colima"
2023-03-09T14:23:54.587269Z INFO source{component_kind="source" component_id=k8s_logs component_type=kubernetes_logs component_name=k8s_logs}: vector::sources::kubernetes_logs: Excluding matching files. exclude_paths=["**/*.gz", "**/*.tmp"]
2023-03-09T14:23:54.589787Z WARN vector::sinks::elasticsearch::common: DEPRECATION, use of deprecated option `endpoint`. Please use `endpoints` option instead.
2023-03-09T14:23:54.594123Z WARN vector_core::tls::settings: The `verify_certificate` option is DISABLED, this may lead to security vulnerabilities.
2023-03-09T14:23:54.594898Z WARN vector_core::tls::settings: The `verify_hostname` option is DISABLED, this may lead to security vulnerabilities.
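As an aside, per the deprecation warning in that log, the sink's endpoint option should migrate to endpoints, e.g.:
sinks:
  opensearch:
    type: elasticsearch
    endpoints:
      - https://opensearch:9200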
I suspect this is due to restrictions imposed by OpenShift. Could you confirm you're still seeing this issue after upgrading to latest?
I can confirm that. When adding a SecurityContextConstraint with correct permissions, it works.
Would you like me to contribute back the SecurityContextConstraint under a flag, say openshift: true?
That'd be great - I don't have too much experience with OpenShift, but if that's a normal/expected resource to create in OS clusters that seems good.
What was the fix? I tried with a custom privileged SCC and, for troubleshooting, set runAsUser to 0, but I still get the permission errors.
Edit: I had to set privileged: true in the container security context for it to work.
Correct. I set it in the chart values:
securityContext:
  privileged: true
Then added SCC, Role and RoleBinding on the side:
# vector needs privileged access to write to /var/lib/vector on the node.
# Only the initContainer uses privileged access; the vector container runs as uid/gid 3000.
---
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: privileged-and-hostpath
  annotations:
    kubernetes.io/description: |
      Copied from restricted. Additionally sets allowHostDirVolumePlugin=true,
      volumes: hostPath, and allowPrivilegedContainer=true.
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities: null
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups: []
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
  - KILL
  - MKNOD
  - SETUID
  - SETGID
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - hostPath
  - persistentVolumeClaim
  - projected
  - secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-privileged-and-hostpath
rules:
  - apiGroups:
      - security.openshift.io
    resources:
      - securitycontextconstraints
    verbs:
      - use
    resourceNames:
      - privileged-and-hostpath
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vector-can-use-privileged-and-hostpath
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: use-privileged-and-hostpath
subjects:
  - kind: ServiceAccount
    name: vector
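To sanity-check that the SCC is actually picked up: OpenShift annotates each admitted pod with the SCC that matched, so something like this should confirm it (the label selector is an assumption based on the chart's defaults):
❯ oc get pods -l app.kubernetes.io/name=vector -o yaml | grep 'openshift.io/scc'
# expected: openshift.io/scc: privileged-and-hostpath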
I tried using SecurityContextConstraints.allowedCapabilities without allowPrivilegedContainer, but never got that working. I found that openshift-logging also uses allowPrivilegedContainer, so I settled for that.
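For context, the variant that did not work was roughly this sketch (the capability list here is an assumption, not the exact config tried):
allowPrivilegedContainer: false
allowedCapabilities:
  - CHOWN
  - DAC_OVERRIDE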
Hi,
Try to avoid setting privileged: true, because it basically gives the vector pod root access to the underlying host. Configure your SCC like this instead and remove privileged: true:
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
Then add this in your daemonset:
- op: add
path: "/spec/template/spec/containers/0/securityContext"
value:
allowPrivilegeEscalation: false
capabilities:
add:
- CHOWN
drop:
- KILL
- DAC_OVERRIDE
- FOWNER
- NET_BIND_SERVICE
- FSETID
- SETGID
- SETUID
- SETPCAP
privileged: false
seLinuxOptions:
type: container_logwriter_t
seccompProfile:
type: RuntimeDefault
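The snippet above is a JSON-patch-style override; with Kustomize it could be wired to the DaemonSet roughly like this (the patch file name is an assumption):
# kustomization.yaml
patches:
  - path: daemonset-securitycontext-patch.yaml
    target:
      kind: DaemonSet
      name: vector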
I would also suggest applying this MachineConfig to the nodes where vector is running (in my case, all worker nodes):
variant: openshift
version: 4.14.0
metadata:
  name: 50-selinux-file-contexts-local
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/selinux/targeted/contexts/files/file_contexts.local
      mode: 0644
      overwrite: true
      contents:
        inline: |
          /var/lib/vector(/.*)? system_u:object_r:container_file_t:s0
systemd:
  units:
    - contents: |-
        [Unit]
        Description=Set local SELinux file context for vector
        [Service]
        ExecStart=/bin/bash -c '/usr/bin/mkdir -p /var/lib/vector;restorecon -Rv /var/lib/vector'
        RemainAfterExit=yes
        Type=oneshot
        [Install]
        WantedBy=multi-user.target
      enabled: true
      name: set-SELinux-context-local.service
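Note that the snippet above is Butane syntax (variant: openshift), so it has to be transpiled to a MachineConfig before applying, e.g. (file name assumed):
❯ butane 50-selinux-file-contexts-local.bu -o 50-selinux-file-contexts-local.yaml
❯ oc apply -f 50-selinux-file-contexts-local.yaml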
Hi! I get the error message on start:
I use the following setup:
I've tried adding an init container:
and using uid/gid/fsGroup 3000 in vector:
But it still fails. Debugging the container:
Any ideas?