wazuh / wazuh-kubernetes

Wazuh - Wazuh Kubernetes
https://wazuh.com/
GNU General Public License v2.0

wazuh-manager-master and worker pods are in crashloopbackoff state after following the local-env deployment #329

Open chasegame-alpha opened 1 year ago

chasegame-alpha commented 1 year ago

I cloned the wazuh-kubernetes repository, generated the certificates, and deployed it on my Kubernetes cluster. I changed the StorageClass to nfs-client because I have nfs-subdir-external-provisioner deployed in the namespace for dynamic provisioning of persistent volumes. After deploying, the Wazuh pods are in the CrashLoopBackOff state. When I comment out the volumeMounts named wazuh-manager-master and wazuh-manager-worker, the pods run and the API connects to the dashboard.

teddytpc1 commented 1 year ago

Hi @chasegame-alpha. The StorageClass provisioner must support dynamic provisioning in order to create the volumes. If it does not, you will need to manually create a PV and PVC using a custom manifest. Here is an example:

Make sure to use the correct provisioner.
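
The example manifest referred to above is not included in this thread. As a rough sketch only (not the maintainer's original example), a manually created PV and PVC for the master manager could look something like the following. The NFS server, export path, PV name, storage size, and the wazuh namespace are all placeholders or assumptions. The PVC name assumes the StatefulSet's volume claim template is called wazuh-manager-master, so that the claim follows the usual <template>-<pod> naming.

```yaml
# Sketch under stated assumptions: server, path, size, and names are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: wazuh-manager-master-pv            # hypothetical PV name
spec:
  capacity:
    storage: 1Gi                           # assumed size; must cover the claim's request
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: wazuh-storage          # must match the claim's storageClassName
  nfs:
    server: nfs.example.internal           # hypothetical NFS server
    path: /exports/wazuh-manager-master    # hypothetical export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # StatefulSet claims are named <volumeClaimTemplate>-<pod>; template name is assumed.
  name: wazuh-manager-master-wazuh-manager-master-0
  namespace: wazuh                         # assumed namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: wazuh-storage
  volumeName: wazuh-manager-master-pv      # bind explicitly to the PV above
  resources:
    requests:
      storage: 1Gi                         # assumed; should match the template's request
```

Setting volumeName pins the claim to the pre-created volume, so no dynamic provisioning is attempted for it. A similar PV/PVC pair would be needed for each pod that has a volume claim template.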

rst-ack commented 1 year ago

I'm having the same issue, also using nfs-subdir-external-provisioner as my storage provisioner.

I modified the envs/local-env/storage-class.yaml patch:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: wazuh-storage
provisioner: cluster.local/nfs-subdir-external-provisioner
mountOptions:
- nfsvers=3

I deployed using kubectl apply -k envs/local-env/, and the wazuh-indexer-0, wazuh-manager-master-0, and wazuh-manager-worker-0 pods end up in the CrashLoopBackOff state.

I'm a Kubernetes neophyte, so I'm not sure where to go from here.

> Hi @chasegame-alpha. The StorageClass provisioner should be compatible with dynamic provisioning in order to create the volumes. If it is not, you will need to manually create a PV and PVC using a custom manifest. Here is an example:

@teddytpc1 just so I understand correctly, are you suggesting that the nfs-subdir-external-provisioner is not compatible with dynamic provisioning?

rst-ack commented 1 year ago

Seems adding the mountOptions to the storage class patch allowed the pods to start, but now errors like these are showing in the logs for the wazuh-manager-master-0 pod, which eventually reverts to the CrashLoopBackOff state:

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 0-wazuh-init: executing...
/var/ossec/data_tmp/permanent/var/ossec/api/configuration/
The path /var/ossec/api/configuration is already mounted
/var/ossec/data_tmp/permanent/var/ossec/etc/
The path /var/ossec/etc is already mounted
/var/ossec/data_tmp/permanent/var/ossec/logs/
The path /var/ossec/logs is already mounted
/var/ossec/data_tmp/permanent/var/ossec/queue/
The path /var/ossec/queue is already mounted
/var/ossec/data_tmp/permanent/var/ossec/agentless/
The path /var/ossec/agentless is already mounted
/var/ossec/data_tmp/permanent/var/ossec/var/multigroups/
The path /var/ossec/var/multigroups is empty, skiped
/var/ossec/data_tmp/permanent/var/ossec/integrations/
The path /var/ossec/integrations is already mounted
/var/ossec/data_tmp/permanent/var/ossec/active-response/bin/
The path /var/ossec/active-response/bin is already mounted
/var/ossec/data_tmp/permanent/var/ossec/wodles/
The path /var/ossec/wodles is already mounted
/var/ossec/data_tmp/permanent/etc/filebeat/
The path /etc/filebeat is already mounted
Updating /var/ossec/etc/internal_options.conf
Updating /var/ossec/integrations/pagerduty
Updating /var/ossec/integrations/slack
Updating /var/ossec/integrations/slack.py
Updating /var/ossec/integrations/virustotal
Updating /var/ossec/integrations/virustotal.py
Updating /var/ossec/active-response/bin/default-firewall-drop
Updating /var/ossec/active-response/bin/disable-account
Updating /var/ossec/active-response/bin/firewalld-drop
Updating /var/ossec/active-response/bin/firewall-drop
Updating /var/ossec/active-response/bin/host-deny
Updating /var/ossec/active-response/bin/ip-customblock
Updating /var/ossec/active-response/bin/ipfw
Updating /var/ossec/active-response/bin/kaspersky.py
Updating /var/ossec/active-response/bin/kaspersky
Updating /var/ossec/active-response/bin/npf
Updating /var/ossec/active-response/bin/wazuh-slack
Updating /var/ossec/active-response/bin/pf
Updating /var/ossec/active-response/bin/restart-wazuh
Updating /var/ossec/active-response/bin/restart.sh
Updating /var/ossec/active-response/bin/route-null
Updating /var/ossec/agentless/sshlogin.exp
Updating /var/ossec/agentless/ssh_pixconfig_diff
Updating /var/ossec/agentless/ssh_asa-fwsmconfig_diff
Updating /var/ossec/agentless/ssh_integrity_check_bsd
Updating /var/ossec/agentless/main.exp
Updating /var/ossec/agentless/su.exp
Updating /var/ossec/agentless/ssh_integrity_check_linux
Updating /var/ossec/agentless/register_host.sh
Updating /var/ossec/agentless/ssh_generic_diff
Updating /var/ossec/agentless/ssh_foundry_diff
Updating /var/ossec/agentless/ssh_nopass.exp
Updating /var/ossec/agentless/ssh.exp
Updating /var/ossec/wodles/utils.py
Updating /var/ossec/wodles/aws/aws-s3
Updating /var/ossec/wodles/aws/aws-s3.py
Updating /var/ossec/wodles/azure/azure-logs
Updating /var/ossec/wodles/azure/azure-logs.py
Updating /var/ossec/wodles/docker/DockerListener
Updating /var/ossec/wodles/docker/DockerListener.py
Updating /var/ossec/wodles/gcloud/gcloud
Updating /var/ossec/wodles/gcloud/gcloud.py
Updating /var/ossec/wodles/gcloud/integration.py
Updating /var/ossec/wodles/gcloud/tools.py
find: '/proc/312/task/312/fd/5': No such file or directory
find: '/proc/312/task/312/fdinfo/5': No such file or directory
find: '/proc/312/fd/6': No such file or directory
find: '/proc/312/fdinfo/6': No such file or directory
find: '/proc/313/task/313/fd/5': No such file or directory
find: '/proc/313/task/313/fdinfo/5': No such file or directory
find: '/proc/313/fd/6': No such file or directory
find: '/proc/313/fdinfo/6': No such file or directory
Identified Wazuh configuration files to mount...
'/wazuh-config-mount/etc/ossec.conf' -> '/var/ossec/etc/ossec.conf'
'/wazuh-config-mount/etc/authd.pass' -> '/var/ossec/etc/authd.pass'
[cont-init.d] 0-wazuh-init: exited 0.
[cont-init.d] 1-config-filebeat: executing...
Customize Elasticsearch ouput IP
Configuring username.
Configuring password.
Configuring SSL verification mode.
Configuring Certificate Authorities.
Configuring SSL Certificate.
Configuring SSL Key.
[cont-init.d] 1-config-filebeat: exited 0.
[cont-init.d] 2-manager: executing...
Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1245, in _execute_context
    self.dialect.do_execute(
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 581, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: disk I/O error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/ossec/framework/scripts/create_user.py", line 72, in <module>
    create_rbac_db()
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.4.0-py3.9.egg/wazuh/rbac/orm.py", line 2454, in create_rbac_db
    _Base.metadata.create_all(_engine)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 4315, in create_all
    bind._run_visitor(
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2049, in _run_visitor
    conn._run_visitor(visitorcallable, element, **kwargs)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1618, in _run_visitor
    visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/sql/visitors.py", line 138, in traverse_single
    return meth(obj, **kw)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/sql/ddl.py", line 754, in visit_metadata
    [t for t in tables if self._can_create_table(t)]
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/sql/ddl.py", line 754, in <listcomp>
    [t for t in tables if self._can_create_table(t)]
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/sql/ddl.py", line 730, in _can_create_table
    return not self.checkfirst or not self.dialect.has_table(
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/dialects/sqlite/base.py", line 1598, in has_table
    info = self._get_table_pragma(
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/dialects/sqlite/base.py", line 2063, in _get_table_pragma
    cursor = connection.execute(statement)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 976, in execute
    return self._execute_text(object_, multiparams, params)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1143, in _execute_text
    ret = self._execute_context(
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
    self._handle_dbapi_exception(
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1476, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1245, in _execute_context
    self.dialect.do_execute(
  File "/var/ossec/framework/python/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 581, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) disk I/O error
[SQL: PRAGMA main.table_info("roles_rules")]
(Background on this error at: http://sqlalche.me/e/e3q8)

Unfortunately, the link on the last line doesn't offer any helpful or relevant information.

FWIW, there's nothing wrong that I can see with my NFS server; other workloads are using the nfs-subdir-external-provisioner without issue.

I tried adding ReadWriteMany to the access modes across the various manifests in case it would make a difference, but unfortunately it didn't help.

chronoglass commented 1 year ago

Perhaps somewhat related: I am using the SMB CSI driver and get the same error unless I mount with the nobrl option enabled. This might be relevant: https://stackoverflow.com/questions/7573301/sqlite3-nfs-mount-issue-with-locking-can-i-use-something-like-cifs-nobrl

That said, the underlying problem appears to be SQLite running on network storage, so there may need to be a better answer. You may or may not also end up at the next gate:

Started wazuh-authd...
wazuh-db did not start correctly.
[cont-init.d] 2-manager: exited 1.
[cont-init.d] done.
[services.d] starting services
starting Filebeat
[services.d] done.
2023/07/28 02:37:11 wazuh-csyslogd: INFO: Remote syslog server not configured. Clean exit.
2023/07/28 02:37:11 wazuh-dbd: INFO: Database not configured. Clean exit.
2023/07/28 02:37:11 wazuh-integratord: INFO: Remote integrations not configured. Clean exit.
2023/07/28 02:37:12 wazuh-agentlessd: INFO: Not configured. Exiting.
2023/07/28 02:37:12 wazuh-authd: INFO: Started (pid: 472).
2023/07/28 02:37:12 wazuh-authd: INFO: Accepting connections on port 1515. Using password specified on file: etc/authd.pass
2023/07/28 02:37:12 wazuh-authd: INFO: Setting network timeout to 1.000000 sec.
2023/07/28 02:37:13 wazuh-authd: ERROR: Unable to bind to socket 'queue/sockets/auth': 'Operation not permitted'. Closing local server.
2023/07/28 02:37:13 wazuh-db: INFO: Started (pid: 486).
2023/07/28 02:37:13 wazuh-db: CRITICAL: Unable to bind to socket 'queue/db/wdb': 'Operation not permitted'. Closing local server.
2023-07-28T02:37:23.535Z INFO instance/beat.go:645 Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2023-07-28T02:37:23.647Z INFO instance/beat.go:653 Beat ID: 928663aa-d423-46b7-880e-71303bab6676
2023-07-28T02:37:23.655Z INFO [seccomp] seccomp/seccomp.go:124 Syscall filter successfully installed
2023-07-28T02:37:23.655Z INFO [beat] instance/beat.go:981 Beat info {"system_info": {"beat": {"path": {"config": "/etc/filebeat", "data": "/var/lib/filebeat", "home": "/usr/share/filebeat", "logs": "/var/log/filebeat"}, "type": "filebeat", "uuid": "928663aa-d423-46b7-880e-71303bab6676"}}}
2023-07-28T02:37:23.656Z INFO [beat] instance/beat.go:990 Build info {"system_info": {"build": {"commit": "aacf9ecd9c494aa0908f61fbca82c906b16562a8", "libbeat": "7.10.2", "time": "2021-01-12T22:10:33.000Z", "version": "7.10.2"}}}
2023-07-28T02:37:23.656Z INFO [beat] instance/beat.go:993 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":4,"version":"go1.14.12"}}}
2023-07-28T02:37:23.657Z INFO [beat] instance/beat.go:997 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2023-05-06T20:40:55Z","containerized":false,"name":"wazuh-manager-master-0","ip":["127.0.0.1/8","::1/128","10.4.220.116/32","fe80::cca:a8ff:fe0e:d02c/64"],"kernel_version":"5.10.0-21-amd64","mac":["0e:ca:a8:0e:d0:2c"],"os":{"family":"debian","platform":"ubuntu","name":"Ubuntu","version":"20.04.6 LTS (Focal Fossa)","major":20,"minor":4,"patch":6,"codename":"focal"},"timezone":"UTC","timezone_offset_sec":0}}}
2023-07-28T02:37:23.709Z INFO [beat] instance/beat.go:1026 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"effective":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"bounding":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"ambient":null}, "cwd": "/run/s6/services/filebeat", "exe": "/usr/share/filebeat/bin/filebeat", "name": "filebeat", "pid": 552, "ppid": 548, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2023-07-28T02:37:23.210Z"}}}
2023-07-28T02:37:23.710Z INFO instance/beat.go:299 Setup Beat: filebeat; Version: 7.10.2
2023-07-28T02:37:23.711Z INFO eslegclient/connection.go:99 elasticsearch url: https://wazuh-indexer-0.wazuh-indexer:9200
2023-07-28T02:37:23.712Z INFO [publisher] pipeline/module.go:113 Beat name: wazuh-manager-master-0
2023-07-28T02:37:23.715Z INFO beater/filebeat.go:117 Enabled modules/filesets: wazuh (alerts), ()
2023-07-28T02:37:23.716Z INFO instance/beat.go:455 filebeat start running.
2023-07-28T02:37:23.884Z INFO memlog/store.go:119 Loading data file of '/var/lib/filebeat/registry/filebeat' succeeded. Active transaction id=0
2023-07-28T02:37:23.884Z INFO memlog/store.go:124 Finished loading transaction log file for '/var/lib/filebeat/registry/filebeat'. Active transaction id=0
2023-07-28T02:37:23.910Z INFO [registrar] registrar/registrar.go:109 States Loaded from registrar: 0
2023-07-28T02:37:23.910Z INFO [crawler] beater/crawler.go:71 Loading Inputs: 1
2023-07-28T02:37:23.911Z INFO log/input.go:157 Configured paths: [/var/ossec/logs/alerts/alerts.json]
2023-07-28T02:37:23.911Z INFO [crawler] beater/crawler.go:141 Starting input (ID: 9132358592892857476)
2023-07-28T02:37:23.911Z INFO [crawler] beater/crawler.go:108 Loading and starting Inputs completed. Enabled inputs: 1
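
For anyone wanting to try the nobrl workaround mentioned in this comment, a StorageClass for the SMB CSI driver might look roughly like the sketch below. It assumes the driver is csi-driver-smb (provisioner smb.csi.k8s.io). The share path, Secret name, and namespace are hypothetical placeholders, and a real setup may need further parameters such as provisioner credentials. It is only meant to show where the mount option goes.

```yaml
# Sketch only: share, Secret, and namespace are placeholders, not values from this thread.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: wazuh-storage
provisioner: smb.csi.k8s.io                     # assumes csi-driver-smb is installed
parameters:
  source: //smb.example.internal/wazuh          # hypothetical SMB share
  csi.storage.k8s.io/node-stage-secret-name: smb-credentials   # hypothetical Secret
  csi.storage.k8s.io/node-stage-secret-namespace: wazuh        # assumed namespace
mountOptions:
  - nobrl    # CIFS option: do not send byte-range lock requests to the server
```

With nobrl the client no longer asks the server for byte-range locks, which avoids the lock calls SQLite trips over, but it also means locks are not coordinated between different clients of the same share. That is probably acceptable only while a single pod writes to each volume.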