cdmikechen opened this issue 1 year ago
I also tried to change the pg directory permissions and rename the directory, but found that the permissions were changed again after restarting the pod. Does a script or program force the folder permissions to be updated?
Before restart:
drwxrwsr-x. 3 root pmm 4096 Apr 6 03:29 alerting
drwxrwsr-x. 4 pmm pmm 4096 Apr 6 03:29 alertmanager
drwxrwsr-x. 2 root pmm 4096 Apr 6 03:30 backup
drwxrwsr-x. 13 root pmm 4096 Apr 12 08:57 clickhouse
drwxrwsr-x. 6 grafana pmm 4096 Apr 12 08:57 grafana
drwxrwsr-x. 2 pmm pmm 4096 Apr 12 08:23 logs
drwxrws---. 2 root pmm 16384 Apr 6 03:29 lost+found
drwxrwsr-x. 2 root pmm 4096 Apr 6 03:29 nginx
-rw-rw-r--. 1 root pmm 7 Apr 6 03:29 pmm-distribution
drwx--S---. 20 postgres pmm 4096 Apr 12 00:00 postgres14-bak
drwxrwsr-x. 3 pmm pmm 4096 Apr 6 03:29 prometheus
drwxrwsr-x. 3 pmm pmm 4096 Apr 6 03:29 victoriametrics
After restarting the pod:
drwxrwsr-x. 3 root pmm 4096 Apr 6 03:29 alerting
drwxrwsr-x. 4 pmm pmm 4096 Apr 6 03:29 alertmanager
drwxrwsr-x. 2 root pmm 4096 Apr 6 03:30 backup
drwxrwsr-x. 13 root pmm 4096 Apr 12 09:05 clickhouse
drwxrwsr-x. 6 grafana pmm 4096 Apr 12 09:05 grafana
drwxrwsr-x. 2 pmm pmm 4096 Apr 12 08:23 logs
drwxrws---. 2 root pmm 16384 Apr 6 03:29 lost+found
drwxrwsr-x. 2 root pmm 4096 Apr 6 03:29 nginx
-rw-rw-r--. 1 root pmm 7 Apr 6 03:29 pmm-distribution
drwxrws---. 20 postgres pmm 4096 Apr 12 00:00 postgres14-bak
drwxrwsr-x. 3 pmm pmm 4096 Apr 6 03:29 prometheus
drwxrwsr-x. 3 pmm pmm 4096 Apr 6 03:29 victoriametrics
Hi @cdmikechen, what version of the helm chart (pmm chart version) and which repo do you use for PMM?
There are a couple of things that could change those permissions: an init container, the storage provisioner, or some update procedure.
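For illustration, an init container that resets ownership on the data volume would look roughly like this (purely a sketch; the names and IDs here are hypothetical, not necessarily what the chart ships):
initContainers:
- name: fix-permissions        # hypothetical name, for illustration only
  image: busybox
  # Recursively re-owning the mounted volume on every start would explain
  # permission changes like the ones shown above.
  command: ["sh", "-c", "chown -R 1000:1000 /srv"]
  volumeMounts:
  - name: pmm-storage          # hypothetical volume name
    mountPath: /srv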
As you said, you use OKD; we don't officially support OpenShift yet, as PMM requires root in the container.
Why was the pod restarted? Did you run some update procedure?
Thanks, Denys
@denisok
The reason for killing the pod was that I wanted to test whether pmm-server would still work after a restart.
I have solved this issue: the problem occurred because I had added an fsGroup to the container. After removing it, pmm-server started normally.
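For anyone hitting the same thing, the setting I removed was of roughly this shape (a sketch; the group ID here is only illustrative, not my exact value):
securityContext:
  # With fsGroup set, the kubelet re-applies group ownership and permissions
  # to the mounted /srv volume on pod start, which breaks the 0700 mode that
  # postgres requires on its data directory.
  fsGroup: 1001    # illustrative value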
However, there is another problem: pmm-client fails several times after every Percona pod restart, and the pod only starts working after a few error restarts. I don't understand what the reason for this is.
@cdmikechen
what version of a helm chart (pmm chart version) and repo do you use for PMM?
What do the logs and events show for that pod and all the containers in it?
@denisok The helm chart version is 1.2.1. Here are the pmm-client logs:
INFO[2023-04-21T17:37:15.410+08:00] Run setup: true Sidecar mode: true component=entrypoint
INFO[2023-04-21T17:37:15.410+08:00] Starting pmm-agent for liveness probe... component=entrypoint
INFO[2023-04-21T17:37:15.410+08:00] Starting 'pmm-admin setup'... component=entrypoint
INFO[2023-04-21T17:37:15.552+08:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/vmagent component=main
INFO[2023-04-21T17:37:15.553+08:00] Runner capacity set to 32. component=runner
INFO[2023-04-21T17:37:15.553+08:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=main
INFO[2023-04-21T17:37:15.553+08:00] Using /usr/local/percona/pmm2/exporters/vmagent component=main
INFO[2023-04-21T17:37:15.554+08:00] Window check connection time is 1.00 hour(s)
INFO[2023-04-21T17:37:15.554+08:00] Starting... component=client
ERRO[2023-04-21T17:37:15.554+08:00] Agent ID is not provided, halting. component=client
INFO[2023-04-21T17:37:15.554+08:00] Starting local API server on http://0.0.0.0:7777/ ... component=local-server/JSON
INFO[2023-04-21T17:37:15.556+08:00] Started. component=local-server/JSON
INFO[2023-04-21T17:37:15.559+08:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. component=setup
INFO[2023-04-21T17:37:15.559+08:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=setup
INFO[2023-04-21T17:37:15.559+08:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=setup
INFO[2023-04-21T17:37:15.559+08:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=setup
INFO[2023-04-21T17:37:15.559+08:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=setup
INFO[2023-04-21T17:37:15.559+08:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=setup
INFO[2023-04-21T17:37:15.559+08:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=setup
INFO[2023-04-21T17:37:15.559+08:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=setup
INFO[2023-04-21T17:37:15.559+08:00] Using /usr/local/percona/pmm2/exporters/vmagent component=setup
Checking local pmm-agent status...
pmm-agent is running.
Registering pmm-agent on PMM Server...
Registered.
Configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml updated.
Reloading pmm-agent configuration...
INFO[2023-04-21T17:37:15.887+08:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. component=local-server
INFO[2023-04-21T17:37:15.888+08:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=local-server
INFO[2023-04-21T17:37:15.888+08:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=local-server
INFO[2023-04-21T17:37:15.888+08:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=local-server
INFO[2023-04-21T17:37:15.888+08:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=local-server
INFO[2023-04-21T17:37:15.888+08:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=local-server
INFO[2023-04-21T17:37:15.888+08:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=local-server
INFO[2023-04-21T17:37:15.888+08:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=local-server
INFO[2023-04-21T17:37:15.888+08:00] Using /usr/local/percona/pmm2/exporters/vmagent component=local-server
INFO[2023-04-21T17:37:15.888+08:00] Stopped. component=local-server/JSON
INFO[2023-04-21T17:37:15.890+08:00] Done. component=local-server
INFO[2023-04-21T17:37:15.890+08:00] Done. component=supervisor
INFO[2023-04-21T17:37:15.890+08:00] Done. component=main
Checking local pmm-agent status...
pmm-agent is not running.
INFO[2023-04-21T17:37:20.901+08:00] 'pmm-admin setup' exited with 0 component=entrypoint
INFO[2023-04-21T17:37:20.901+08:00] Stopping pmm-agent... component=entrypoint
FATA[2023-04-21T17:37:20.901+08:00] Failed to kill pmm-agent: os: process already finished component=entrypoint
Hi. I think the pmm-client failure is very similar to this issue that I've created: https://jira.percona.com/browse/PMM-11893
I ran into the same issue with pmm-server, using helm chart version 1.2.5 and pmm-server 2.39.0. I did not set any security context in the helm chart values, and the deployed StatefulSet had them empty.
I then learned that our k8s cluster applies a default security context at both the pod and container level; here is the pod security context:
securityContext:
  fsGroup: 1
  seccompProfile:
    type: RuntimeDefault
  supplementalGroups:
  - 1
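Since fsGroupChangePolicy isn't set, Kubernetes' default recursive behaviour applies; as far as I can tell this is equivalent to:
securityContext:
  fsGroup: 1
  # Default when the field is omitted: the kubelet recursively changes group
  # ownership and permissions of the mounted volume on every pod start, which
  # is what rewrites /srv below.
  fsGroupChangePolicy: Always
  seccompProfile:
    type: RuntimeDefault
  supplementalGroups:
  - 1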
After a restart, this is what the /srv permissions would look like:
[root@ads-pmm-stage-0-0 opt] # ls -alh /srv
total 72K
drwxrwsr-x. 13 root bin 4.0K Aug 22 04:47 .
dr-xr-xr-x. 1 root root 4.0K Aug 22 04:54 ..
drwxrwsr-x. 3 root bin 4.0K Aug 22 04:47 alerting
drwxrwsr-x. 4 pmm bin 4.0K Aug 22 04:47 alertmanager
drwxrwsr-x. 2 root bin 4.0K Aug 22 04:47 backup
drwxrwsr-x. 13 root bin 4.0K Aug 22 04:54 clickhouse
drwxrwsr-x. 6 grafana bin 4.0K Aug 22 04:54 grafana
drwxrwsr-x. 2 pmm bin 4.0K Aug 22 04:46 logs
drwxrws---. 2 root bin 16K Aug 22 04:46 lost+found
drwxrwsr-x. 2 root bin 4.0K Aug 22 04:46 nginx
-rw-rw-r--. 1 root bin 7 Aug 22 04:46 pmm-distribution
drwxrws---. 20 postgres bin 4.0K Aug 22 04:52 postgres14
drwxrwsr-x. 3 pmm bin 4.0K Aug 22 04:46 prometheus
drwxrwsr-x. 3 pmm bin 4.0K Aug 22 04:46 victoriametrics
After some trial and error, I found that this helm chart value allowed pmm to survive restarts:
podSecurityContext:
  fsGroupChangePolicy: OnRootMismatch
The effective pod security context:
securityContext:
  fsGroup: 1
  fsGroupChangePolicy: OnRootMismatch
  seccompProfile:
    type: RuntimeDefault
  supplementalGroups:
  - 1
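If I understand the semantics correctly, this works because OnRootMismatch makes the kubelet skip the recursive ownership/permission change whenever the volume root already matches the fsGroup, so /srv/postgres14 keeps the 0700 mode postgres requires. In values.yaml terms it is just the snippet above (a sketch, assuming the chart passes podSecurityContext straight through to the StatefulSet pod spec):
# values.yaml (sketch)
podSecurityContext:
  # Skip the recursive chgrp/chmod when the root of the volume already
  # matches the fsGroup; only the root of /srv is checked on mount.
  fsGroupChangePolicy: OnRootMismatch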
Starting fresh, this is what /srv looked like on first boot:
[root@ads-pmm-stage-0-0 opt] # ls -alh /srv
total 72K
drwxrwsr-x. 13 root bin 4.0K Aug 22 19:48 .
dr-xr-xr-x. 1 root root 4.0K Aug 22 19:47 ..
drwxr-sr-x. 3 root bin 4.0K Aug 22 19:47 alerting
drwxrwxr-x. 4 pmm pmm 4.0K Aug 22 19:47 alertmanager
drwxr-sr-x. 2 root bin 4.0K Aug 22 19:48 backup
drwxr-sr-x. 13 root bin 4.0K Aug 22 19:47 clickhouse
drwxr-sr-x. 6 grafana render 4.0K Aug 22 19:48 grafana
drwxr-sr-x. 2 pmm pmm 4.0K Aug 22 19:47 logs
drwxrws---. 2 root bin 16K Aug 22 19:47 lost+found
drwxr-sr-x. 2 root bin 4.0K Aug 22 19:47 nginx
-rw-r--r--. 1 root bin 7 Aug 22 19:47 pmm-distribution
drwx------. 20 postgres postgres 4.0K Aug 22 19:47 postgres14
drwxr-sr-x. 3 pmm pmm 4.0K Aug 22 19:47 prometheus
drwxrwxr-x. 3 pmm pmm 4.0K Aug 22 19:47 victoriametrics
and after a reboot:
[root@ads-pmm-stage-0-0 opt] # ls -alh /srv
total 72K
drwxrwsr-x. 13 root bin 4.0K Aug 22 19:48 .
dr-xr-xr-x. 1 root root 4.0K Aug 22 19:53 ..
drwxr-sr-x. 3 root bin 4.0K Aug 22 19:47 alerting
drwxrwxr-x. 4 pmm pmm 4.0K Aug 22 19:47 alertmanager
drwxr-sr-x. 2 root bin 4.0K Aug 22 19:48 backup
drwxr-sr-x. 13 root bin 4.0K Aug 22 19:54 clickhouse
drwxr-sr-x. 6 grafana render 4.0K Aug 22 19:53 grafana
drwxr-sr-x. 2 pmm pmm 4.0K Aug 22 19:47 logs
drwxrws---. 2 root bin 16K Aug 22 19:47 lost+found
drwxr-sr-x. 2 root bin 4.0K Aug 22 19:47 nginx
-rw-r--r--. 1 root bin 7 Aug 22 19:47 pmm-distribution
drwx------. 20 postgres postgres 4.0K Aug 22 19:53 postgres14
drwxr-sr-x. 3 pmm pmm 4.0K Aug 22 19:47 prometheus
drwxrwxr-x. 3 pmm pmm 4.0K Aug 22 19:47 victoriametrics
I hope there are plans to support running without root
Description
I installed pxc-operator and pmm-server using helm chart 1.12.1. When pmm was first deployed, it started correctly, but after the pod restarted I found that the pg service kept failing.
I checked the pg logs in /srv/logs and found that the pg directory permissions were not correct. I used the following commands to change the pg directory permissions and start pg. Pg started after the first change, but after I restarted the pod, the directory permissions were forcibly changed again by an unknown script or program. This repetition caused the exception described above.
Expected Results
The postgres directory permissions should not change; having the correct permissions is a mandatory requirement for pg to start.
Actual Results
The pg directory permissions are changed after every pod restart, so pg fails to start.
Version
pmm-server and pmm-client 2.36; OKD 4.11
Steps to reproduce
No response
Relevant logs
I checked the /srv permissions and found the following: