Closed maratsal closed 1 year ago
Thanks @maratsal
root squash could very much be the root cause, our db init containers are a mess of trying to fix pvc permissions using root group (gid 0) and hacks to overcome postgres hugepages issue.
I wonder if you manually went to that pvc filesystem as root (or as current owner uid 99) and did a chown -R 10001:0 <db-volume-path>
- would the init container run ok?
@dannyzaken @jackyalbo and @baum would probably know how to fix this problem.
cannot change ownership with any ID (tried 99 and 10001), but have no problems with creating files and folder with any ID (please see Manual permissions change test below).
So maybe if PVC is from NFS storage we should not be running that permissions change in this code and having additional check like below:
prepare_postgres_pv() {
local dir="/var/lib/pgsql"
local dir_id=$(stat -c '%u' ${dir})
+ local dir_type=$(stat -f -c %T ${dir})
local current_id=$(id -u)
# change ownership and permissions of mongo db path
- if [ "${dir_id}" != "${current_id}" ]
+ if [ "${dir_id}" != "${current_id}" ] && [ "${dir_type}" != "nfs" ]
then
echo "uid change has been identified - will change from uid: ${dir_id} to new uid: ${current_id}"
time ${KUBE_PV_CHOWN} postgres ${current_id}
fi
}
Manual permissions change test:
[myuser@my-linux-host ~]$ oc rsh busybox-bbc4dc98-8gfnc # user id 99
~ $ id
uid=99(99) gid=0(root)
~ $ ls -la
total 20
dr-xr-xr-x 1 root root 62 Aug 1 22:58 .
dr-xr-xr-x 1 root root 62 Aug 1 22:58 ..
drwxr-xr-x 2 root root 12288 Jul 29 01:32 bin
drwxr-xr-x 5 root root 360 Aug 1 22:58 dev
drwxr-xr-x 1 root root 66 Aug 1 22:58 etc
drwxr-xr-x 2 nobody nobody 6 Jul 29 01:32 home
dr-xr-xr-x 309 root root 0 Aug 1 22:58 proc
drwx------ 2 root root 6 Jul 29 01:32 root
drwxr-xr-x 3 root root 42 Aug 1 22:58 run
dr-xr-xr-x 13 root root 0 Jul 29 14:12 sys
drwxrwxrwx 2 99 99 4096 Aug 1 22:58 tmp
drwxr-xr-x 3 root root 18 Jul 29 01:32 usr
drwxr-xr-x 1 root root 17 Aug 1 22:58 var
~ $ df -h /tmp
Filesystem Size Used Available Use% Mounted on
10.32.10.245:/trident_qtree_pool_trident_GXYUPUKEEO/trident_pvc_bc950669_6230_400a_9c97_0150658c8450
50.0G 0 50.0G 0% /tmp
~ $ chown 10001 /tmp
chown: /tmp: Operation not permitted
~ $ mkdir /tmp/folder-id-99
~ $ ls -la /tmp/
total 8
drwxrwxrwx 3 99 99 4096 Aug 1 22:59 .
dr-xr-xr-x 1 root root 62 Aug 1 22:58 ..
drwxr-xr-x 2 99 99 4096 Aug 1 22:59 folder-id-99
~ $ chown 10001 /tmp/folder-id-99/
chown: /tmp/folder-id-99/: Operation not permitted
~ $
command terminated with exit code 1
[myuser@my-linux-host ~]$ oc edit deploy busybox # change user id to 10001
deployment.apps/busybox edited
[myuser@my-linux-host ~]$ oc get pods | grep busybox
busybox-69fcd8b488-mpvsc 1/1 Running 0 12s
busybox-bbc4dc98-8gfnc 1/1 Terminating 0 2m2s
[myuser@my-linux-host ~]$ oc rsh busybox-69fcd8b488-mpvsc
~ $ id
uid=10001(10001) gid=0(root)
~ $ cd /tmp
/tmp $ ls -la
total 8
drwxrwxrwx 3 99 99 4096 Aug 1 22:59 .
dr-xr-xr-x 1 root root 62 Aug 1 23:00 ..
drwxr-xr-x 2 99 99 4096 Aug 1 22:59 folder-id-99
/tmp $ mkdir folder-id-10001
/tmp $ touch file-id-10001
/tmp $ ls -la
total 12
drwxrwxrwx 4 99 99 4096 Aug 1 23:01 .
dr-xr-xr-x 1 root root 62 Aug 1 23:00 ..
-rw-r--r-- 1 10001 99 0 Aug 1 23:01 file-id-10001
drwxr-xr-x 2 10001 99 4096 Aug 1 23:01 folder-id-10001
drwxr-xr-x 2 99 99 4096 Aug 1 22:59 folder-id-99
/tmp $ rm -rf *
/tmp $ ls -la
total 4
drwxrwxrwx 2 99 99 4096 Aug 1 23:01 .
dr-xr-xr-x 1 root root 62 Aug 1 23:00 ..
/tmp $
[myuser@my-linux-host ~]$
So maybe if PVC is from NFS storage we should not be running that permissions change
I am not sure if not running it will work, because we need to have permissions to write to all those files to run the database...
here is more context on this issue (notice answers 2 and 3 as well) -
I just tested it and it could start with no issues with my suggested changes (mounted modified script from the configmap):
[myuser@my-linux-host ~]$ oc logs -f --all-containers noobaa-db-pg-0
+ set -x
+ export PGDATA=/var/lib/pgsql/data/userdata
+ PGDATA=/var/lib/pgsql/data/userdata
+ '[' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
+ p=/opt/rh/rh-postgresql12/root/usr/bin/postgres
+ '[' '!' -x /opt/rh/rh-postgresql12/root/usr/bin/postgres ']'
+ p=/usr/bin/postgres
+ mv /usr/bin/postgres /usr/bin/postgres.orig
+ echo exec /usr/bin/postgres.orig '"$@"' -c huge_pages=off
+ chmod 755 /usr/bin/postgres
+ sed -i -e 's/^\(postgres:[^:]\):[0-9]*:[0-9]*:/\1:10001:0:/' /etc/passwd
+ sed -i -e 's/^exec.*$/exit 0/' -e 's/^pg_ctl\sstart.*/pg_ctl start || true/' /usr/bin/run-postgresql
+ su postgres -c 'bash -x /usr/bin/run-postgresql'
+ export ENABLE_REPLICATION=false
+ ENABLE_REPLICATION=false
+ set -eu
++ cgroup-limits
+ export_vars='MAX_MEMORY_LIMIT_IN_BYTES=9223372036854775807
MEMORY_LIMIT_IN_BYTES=524288000
NUMBER_OF_CORES=4'
+ export MAX_MEMORY_LIMIT_IN_BYTES=9223372036854775807 MEMORY_LIMIT_IN_BYTES=524288000 NUMBER_OF_CORES=4
+ MAX_MEMORY_LIMIT_IN_BYTES=9223372036854775807
+ MEMORY_LIMIT_IN_BYTES=524288000
+ NUMBER_OF_CORES=4
+ source /usr/share/container-scripts/postgresql/common.sh
++ export POSTGRESQL_MAX_CONNECTIONS=100
++ POSTGRESQL_MAX_CONNECTIONS=100
++ export POSTGRESQL_MAX_PREPARED_TRANSACTIONS=0
++ POSTGRESQL_MAX_PREPARED_TRANSACTIONS=0
++ [[ '' == \t\r\u\e ]]
++ [[ -z 524288000 ]]
++ shared_buffers_computed=125MB
++ effective_cache=250MB
++ export POSTGRESQL_SHARED_BUFFERS=125MB
++ POSTGRESQL_SHARED_BUFFERS=125MB
++ export POSTGRESQL_EFFECTIVE_CACHE_SIZE=250MB
++ POSTGRESQL_EFFECTIVE_CACHE_SIZE=250MB
++ export POSTGRESQL_RECOVERY_FILE=/var/lib/pgsql/openshift-custom-recovery.conf
++ POSTGRESQL_RECOVERY_FILE=/var/lib/pgsql/openshift-custom-recovery.conf
++ export POSTGRESQL_CONFIG_FILE=/var/lib/pgsql/openshift-custom-postgresql.conf
++ POSTGRESQL_CONFIG_FILE=/var/lib/pgsql/openshift-custom-postgresql.conf
++ postinitdb_actions=
++ shopt -s dotglob
++ shopt -s extglob
+ set_pgdata
+ export PGDATA=/var/lib/pgsql/data/userdata
+ PGDATA=/var/lib/pgsql/data/userdata
+ mkdir -p /var/lib/pgsql/data/userdata
+ '[' -e /var/lib/pgsql/data/PG_VERSION ']'
+ chmod 700 /var/lib/pgsql/data/userdata
+ process_extending_files /opt/app-root/src/postgresql-pre-start /usr/share/container-scripts/postgresql/pre-start
+ local filename dir
++ get_matched_files '*.sh' /opt/app-root/src/postgresql-pre-start /usr/share/container-scripts/postgresql/pre-start
++ sort -u
++ local 'pattern=*.sh' dir
++ shift
++ for dir in "$@"
++ test -d /opt/app-root/src/postgresql-pre-start
++ continue
++ for dir in "$@"
++ test -d /usr/share/container-scripts/postgresql/pre-start
++ continue
+ read filename
+ for dir in "$@"
+ local file=/opt/app-root/src/postgresql-pre-start/
+ test -f /opt/app-root/src/postgresql-pre-start/
+ for dir in "$@"
+ local file=/usr/share/container-scripts/postgresql/pre-start/
+ test -f /usr/share/container-scripts/postgresql/pre-start/
+ read filename
+ check_env_vars
+ [[ -v POSTGRESQL_USER ]]
+ [[ -v POSTGRESQL_USER ]]
+ [[ -v POSTGRESQL_PASSWORD ]]
+ [[ -v POSTGRESQL_DATABASE ]]
+ '[' 6 -le 63 ']'
+ '[' 6 -le 63 ']'
+ postinitdb_actions+=,simple_db
+ '[' -v POSTGRESQL_ADMIN_PASSWORD ']'
+ '[' -v POSTGRESQL_MIGRATION_REMOTE_HOST -a -v POSTGRESQL_MIGRATION_ADMIN_PASSWORD ']'
+ case "$postinitdb_actions" in
+ generate_passwd_file
++ id -u
+ export USER_ID=10001
+ USER_ID=10001
++ id -g
+ export GROUP_ID=0
+ GROUP_ID=0
+ grep -v -e '^postgres' -e '^10001' /etc/passwd
+ echo 'postgres:x:10001:0:PostgreSQL Server:/var/lib/pgsql:/bin/bash'
+ export LD_PRELOAD=libnss_wrapper.so
+ LD_PRELOAD=libnss_wrapper.so
+ export NSS_WRAPPER_PASSWD=/var/lib/pgsql/passwd
+ NSS_WRAPPER_PASSWD=/var/lib/pgsql/passwd
+ export NSS_WRAPPER_GROUP=/etc/group
+ NSS_WRAPPER_GROUP=/etc/group
+ generate_postgresql_config
+ envsubst
+ '[' false == true ']'
+ should_hack_data_sync_retry
++ uname -p
+ '[' x86_64 == x86_64 ']'
+ return 1
+ shopt -s nullglob
+ PG_INITIALIZED=false
+ '[' '!' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
+ initialize_database
+ initdb_wrapper initdb
+ LANG=en_US.utf8
+ initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locales
COLLATE: C
CTYPE: en_US.utf8
MESSAGES: en_US.utf8
MONETARY: en_US.utf8
NUMERIC: en_US.utf8
TIME: en_US.utf8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/pgsql/data/userdata -l logfile start
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
+ cat
+ cat
+ PG_INITIALIZED=:
+ pg_ctl start
waiting for server to start....2022-08-02 14:18:52.849 UTC [39] LOG: starting PostgreSQL 12.11 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-08-02 14:18:52.849 UTC [39] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-08-02 14:18:52.849 UTC [39] LOG: listening on IPv6 address "::", port 5432
2022-08-02 14:18:52.856 UTC [39] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-08-02 14:18:52.862 UTC [39] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-08-02 14:18:52.882 UTC [39] LOG: redirecting log output to logging collector process
2022-08-02 14:18:52.882 UTC [39] HINT: Future log output will appear in directory "log".
done
server started
+ pg_isready
/var/run/postgresql:5432 - accepting connections
+ :
+ process_extending_files /opt/app-root/src/postgresql-init /usr/share/container-scripts/postgresql/init
+ local filename dir
++ get_matched_files '*.sh' /opt/app-root/src/postgresql-init /usr/share/container-scripts/postgresql/init
++ local 'pattern=*.sh' dir
++ shift
++ for dir in "$@"
++ test -d /opt/app-root/src/postgresql-init
++ continue
++ for dir in "$@"
++ test -d /usr/share/container-scripts/postgresql/init
++ continue
++ sort -u
+ read filename
+ for dir in "$@"
+ local file=/opt/app-root/src/postgresql-init/
+ test -f /opt/app-root/src/postgresql-init/
+ for dir in "$@"
+ local file=/usr/share/container-scripts/postgresql/init/
+ test -f /usr/share/container-scripts/postgresql/init/
+ read filename
+ migrate_db
+ test ,simple_db = ,migration
+ return 0
+ create_users
+ [[ ,,simple_db, = *,simple_db,* ]]
+ createuser noobaa
+ createdb --owner=noobaa nbcore
+ '[' -v POSTGRESQL_MASTER_USER ']'
+ process_extending_files /opt/app-root/src/postgresql-start /usr/share/container-scripts/postgresql/start
+ local filename dir
++ get_matched_files '*.sh' /opt/app-root/src/postgresql-start /usr/share/container-scripts/postgresql/start
++ local 'pattern=*.sh' dir
++ shift
++ for dir in "$@"
++ test -d /opt/app-root/src/postgresql-start
++ continue
waiting for server to start....2022-08-02 14:18:56.186 UTC [22] LOG: starting PostgreSQL 12.11 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-08-02 14:18:56.189 UTC [22] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
++ for dir in "$@"
++ test -d /usr/share/container-scripts/postgresql/start
2022-08-02 14:18:56.196 UTC [22] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-08-02 14:18:56.282 UTC [22] LOG: redirecting log output to logging collector process
2022-08-02 14:18:56.282 UTC [22] HINT: Future log output will appear in directory "log".
done
++ find -L /usr/share/container-scripts/postgresql/start -maxdepth 1 -type f -name '*.sh' -printf '%f\n'
++ sort -u
=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
+ read filename
server started
/var/run/postgresql:5432 - accepting connections
+ for dir in "$@"
+ local file=/opt/app-root/src/postgresql-start/set_passwords.sh
+ test -f /opt/app-root/src/postgresql-start/set_passwords.sh
+ for dir in "$@"
=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
+ local file=/usr/share/container-scripts/postgresql/start/set_passwords.sh
+ test -f /usr/share/container-scripts/postgresql/start/set_passwords.sh
+ echo '=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...'
+ source /usr/share/container-scripts/postgresql/start/set_passwords.sh
ALTER ROLE
++ [[ ,,simple_db, = *,simple_db,* ]]
waiting for server to shut down.... done
++ _psql --set=username=noobaa --set=password=SeX263+iGcHjJg==
++ psql --set ON_ERROR_STOP=1 --set=username=noobaa --set=password=SeX263+iGcHjJg==
server stopped
Starting server...
2022-08-02 14:18:56.553 UTC [1] LOG: starting PostgreSQL 12.11 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-08-02 14:18:56.553 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
ALTER ROLE
++ '[' -v POSTGRESQL_MASTER_USER ']'
++ '[' -v POSTGRESQL_ADMIN_PASSWORD ']'
2022-08-02 14:18:56.553 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-08-02 14:18:56.558 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-08-02 14:18:56.562 UTC [1] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
+ set -e
+ break
+ read filename
+ pg_ctl stop
waiting for server to shut down.... done
2022-08-02 14:18:56.629 UTC [1] LOG: redirecting log output to logging collector process
2022-08-02 14:18:56.629 UTC [1] HINT: Future log output will appear in directory "log".
server stopped
Starting server...
+ unset_env_vars
+ unset POSTGRESQL_DATABASE POSTGRESQL_USER POSTGRESQL_PASSWORD POSTGRESQL_ADMIN_PASSWORD
+ echo 'Starting server...'
+ exit 0
^C
[myuser@my-linux-host ~]$ oc get pods
NAME READY STATUS RESTARTS AGE
noobaa-core-0 1/1 Running 0 18h
noobaa-db-pg-0 1/1 Running 0 56s
noobaa-operator-698c99d7fc-qjvj2 1/1 Running 0 3d20h
ocs-metrics-exporter-5594fd8456-v6xvh 1/1 Running 0 3d23h
ocs-operator-74f7b6dcf-m7swl 1/1 Running 0 3d23h
odf-console-dc88447b5-xnf4t 1/1 Running 0 3d23h
odf-operator-controller-manager-5b66b7969f-kvfp8 2/2 Running 0 3d23h
rook-ceph-operator-5f68dc6b44-j2sf7 1/1 Running 0 3d23h
[myuser@my-linux-host ~]$
I now see the same problem on podman machine + kind clsuter.
$ kubectl logs noobaa-db-pg-0 -c init
uid change has been identified - will change from uid: 0 to new uid: 10001
setting permissions of /var/lib/pgsql for user 10001
Error:got error when changing ownership of /var/lib/pgsql Error: Operation not permitted
@dannyzaken what is our direction here??
I feel like my proposed code should work just fine, because we have permissions on group level and that's why we don't have problems with writing to the NFS share with any user as I posted above.
here is recommendations around it from Red Hat:
Adapting Containerfiles for OpenShift When you write or change a Containerfile that builds an image to run on an OpenShift cluster, you need to address the following:
- Directories and files that are read from or written to by processes in the container should be owned by the root group and have group read or group write permission.
- Files that are executable should have group execute permissions.
- The processes running in the container must not listen on privileged ports (that is, ports below 1024), because they are not running as privileged users. Adding the following RUN instruction to your Containerfile sets the directory and file permissions to allow users in the root group to access them in the container:
RUN chgrp -R 0 directory && \ chmod -R g=u directory
The user account that runs the container is always a member of the root group, hence the container can read and write to this directory. The root group does not have any special permissions (unlike the root user) which minimizes security risks with this configuration. The g=u argument from the chmod command makes the group permissions equal to the owner user permissions, which by default are read and write. You can use the g+rwX argument with the same results.
@dannyzaken will you assign this?
If I am not mistaken, I think we can get rid of all the init containers hacks and use fsGroup
and fsGroupChangePolicy
https://kubernetes.io/blog/2020/12/14/kubernetes-release-1.20-fsgroupchangepolicy-fsgrouppolicy/
@dannyzaken, @guymguym I recently picked up this issue. I am not sure if we can replace our init container hacks with fsUser
, fsGroup
and fsGroupChangePolicy
.
Kubernetes will also do exactly what we are doing (SetVolumeOwnership) which implies that it might not really help us in this particular scenario where NFS mount is exported with root_squash
hence NFS is performing operations with nobody:nogroup
- when root (AFAIK) - resulting in operation getting denied.
Maybe, instead what we can do is check in our init container if we can read and write to the directory, if we can - skip any ownership changes or else change ownership as usual.
PS: If we can't change the ownership AND we can't read/write to the FS then I think that's a good enough reason to deny pod start (which is not the case here).
Environment info
Actual behavior
real 0m0.005s user 0m0.003s sys 0m0.001s Error from server (BadRequest): container "initialize-database" in pod "noobaa-db-pg-0" is waiting to start: PodInitializing