noobaa / noobaa-core

High-performance S3 application gateway to any backend - file / s3-compatible / multi-clouds / caching / replication ...
https://www.noobaa.io
Apache License 2.0

Path is not writable (/opt/app-root/src): DAS backup script is not able to create noobaa_db.backup at /opt/app-root/src in the pod #7132

Closed nigamshaurya14 closed 3 months ago

nigamshaurya14 commented 1 year ago

Environment info

ODF Version:


[root@hpo-app11 ~]# oc get csv -n openshift-storage
NAME                              DISPLAY                       VERSION   REPLACES                          PHASE
mcg-operator.v4.11.4              NooBaa Operator               4.11.4    mcg-operator.v4.11.3              Succeeded
ocs-operator.v4.11.4              OpenShift Container Storage   4.11.4    ocs-operator.v4.11.3              Succeeded
odf-csi-addons-operator.v4.11.4   CSI Addons                    4.11.4    odf-csi-addons-operator.v4.11.3   Succeeded
odf-operator.v4.11.4              OpenShift Data Foundation     4.11.4    odf-operator.v4.11.3              Succeeded
[root@hpo-app11 ~]#

Relevant part of the DAS backup script:

# backup noobaa db
BACKUP_DB_FILE=noobaa_db.backup
CMD="oc exec -n openshift-storage -it noobaa-db-pg-0 -- pg_dump nbcore -f $BACKUP_DB_FILE -F custom"

In the command above, BACKUP_DB_FILE is a relative path, so pg_dump tries to write it to the pod's working directory, /opt/app-root/src, which is not writable. With a writable path such as /tmp/noobaa_db.backup, the command works. Without write access, the script cannot create noobaa_db.backup in /opt/app-root/src in the pod.
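As a minimal sketch of the workaround described above, the snippet below builds the same pg_dump invocation with an absolute output path. The names BACKUP_DIR and build_backup_cmd are hypothetical and are not taken from the DAS backup script; /tmp is used only because it is the path that worked in this report.

```shell
#!/usr/bin/env bash
# Sketch only: BACKUP_DIR and build_backup_cmd are hypothetical names,
# not taken from the DAS backup script.
BACKUP_DIR=/tmp    # a directory the DB container can write to
BACKUP_DB_FILE="$BACKUP_DIR/noobaa_db.backup"

# Print the pg_dump invocation from this report, with the output path as argument.
build_backup_cmd() {
    printf 'oc -c db exec -n openshift-storage -it noobaa-db-pg-0 -- pg_dump nbcore -f %s -F custom\n' "$1"
}

CMD="$(build_backup_cmd "$BACKUP_DB_FILE")"
echo "$CMD"
```

Because the path is absolute, the result no longer depends on the working directory of the DB container.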

Command execution with current path:


[root@hpo-app11 das-db-backup]# oc -c db exec -n openshift-storage -it noobaa-db-pg-0 -- pg_dump nbcore -f noobaa_db.backup -F custom
pg_dump: error: could not open output file "noobaa_db.backup": Permission denied
command terminated with exit code 1

Execution with new writable path:
[root@hpo-app11 das-db-backup]# oc -c db exec -n openshift-storage -it noobaa-db-pg-0 -- pg_dump nbcore -f /tmp/noobaa_db.backup -F custom
[root@hpo-app11 das-db-backup]# oc -c db exec -n openshift-storage -it noobaa-db-pg-0 -- ls -lZ /tmp
total 224
-rwx------. 1 root  root system_u:object_r:container_file_t:s0:c111,c234    291 Nov  1 04:36 ks-script-3bicx5f2
-rwx------. 1 root  root system_u:object_r:container_file_t:s0:c111,c234    701 Nov  1 04:36 ks-script-johuwdtx
-rw-r--r--. 1 10001 root system_u:object_r:container_file_t:s0:c111,c234 110524 Dec 13 11:16 noobaa_db.backup
-rw-r--r--. 1 10001 root system_u:object_r:container_file_t:s0:c111,c234 110247 Dec 13 11:00 test.db
[root@hpo-app11 das-db-backup]#

Current path:

[root@hpo-app11 das-db-backup]# oc -c db exec -n openshift-storage -it noobaa-db-pg-0 -- pwd
/opt/app-root/src

romayalon commented 1 year ago

As discussed on Slack, no NooBaa change could have caused this; we suspect it was due to a DB image change in the downstream build. Our suggestion is to change the path to /var/lib/pgsql, as @baum suggested, which is writable by the DB by design.
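One way to avoid hard-coding a single directory is to probe a list of candidates and use the first writable one. This is only a sketch: first_writable_dir is a hypothetical helper, and the candidate paths are simply the ones named in this thread. It runs locally as written; inside the pod the same check would have to be executed through oc exec.

```shell
#!/usr/bin/env bash
# Sketch only: first_writable_dir is a hypothetical helper.
# Print the first argument that is an existing, writable directory.
first_writable_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Local demonstration; the candidates are the paths named in this thread.
first_writable_dir /var/lib/pgsql /tmp || echo "no writable directory found" >&2
```

A quick check inside the DB container could look like `oc -c db exec -n openshift-storage noobaa-db-pg-0 -- sh -c '[ -w /var/lib/pgsql ] && echo writable'`.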

nimrod-becker commented 1 year ago

@liranmauda @dannyzaken Was this solved as part of ODF builds?

liranmauda commented 1 year ago

@nimrod-becker I think this should be solved in the DAS operator, and the path should be different. From talking to @romayalon, it seems that downstream is using a new Postgres image.

rkomandu commented 1 year ago

@nimrod-becker, as per our last interlock discussion, you were going to check the Postgres image in ODF 4.12 and the reason for this change.

Please let us know in a day or two, since we would then need to make this change in our DAS code base.

nimrod-becker commented 1 year ago

It seems 4.12 doesn't have these issues anymore (with no code changes). Can you please verify with a new deployment? If this still occurs, we need to go with Liran's suggestion.

rkomandu commented 1 year ago

We will check once the system is deployed with the ODF downstream build. I asked one person on the team to check this but didn't get any response.

nigamshaurya14 commented 1 year ago

The issue exists on ODF 4.12 as well. We verified it and were able to reproduce it on the ODF 4.12.0-rc.6 build, following the steps below:

[root@api.sps1.cp.fyre.ibm.com backup-folder]# mkdir -p das/scripts

[root@api.sps1.cp.fyre.ibm.com backup-folder]# oc cp ibm-spectrum-scale-das/$(oc -n ibm-spectrum-scale-das get pods -l app=das-endpoint -o=jsonpath='{.items[0].metadata.name}'):scripts/ /tmp/das/scripts

[root@api.sps1.cp.fyre.ibm.com backup-folder]# chmod +x /tmp/das/scripts/*

[root@api.sps1.cp.fyre.ibm.com backup-folder]# ls -ltr /tmp/das/scripts
total 12
-rwxr-xr-x 1 root root 8138 Jan 17 02:03 dasS3Restore.sh
-rwxr-xr-x 1 root root 3953 Jan 17 02:03 dasS3Backup.sh

[root@api.sps1.cp.fyre.ibm.com backup-folder]# mkdir /tmp/das/backup

ERROR :

[root@api.sps1.cp.fyre.ibm.com backup-folder]# /tmp/das/scripts/dasS3Backup.sh /tmp/das/backup
2023-01-17T02:04:38 ERROR: Failed to run pg_dump in the noobaa-db-pg-0 pod
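The error above does not show why pg_dump failed, although the underlying message (Permission denied) was visible when the command was run by hand. A wrapper along the following lines could surface it in the script's log. This is a sketch under assumptions: run_or_log is a hypothetical helper and is not part of dasS3Backup.sh, and the failing command is a stand-in rather than a real pg_dump call.

```shell
#!/usr/bin/env bash
# Sketch only: run_or_log is a hypothetical helper, not part of dasS3Backup.sh.
# Run a command; on failure, log a timestamped message plus the command's stderr.
run_or_log() {
    local msg="$1"; shift
    local err_file rc
    err_file="$(mktemp)"
    if "$@" 2>"$err_file"; then
        rc=0
    else
        rc=$?
        echo "$(date +%Y-%m-%dT%H:%M:%S) ERROR: $msg (exit code $rc): $(cat "$err_file")"
    fi
    rm -f "$err_file"
    return "$rc"
}

# Demonstration with a stand-in command that fails the way pg_dump did.
run_or_log "Failed to run pg_dump in the noobaa-db-pg-0 pod" \
    sh -c 'echo "could not open output file: Permission denied" >&2; exit 1' || true
```

The helper returns the wrapped command's exit code, so the caller can still decide whether to abort the backup.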

ODF Version:

[root@api.sps1.cp.fyre.ibm.com backup-folder]# oc get csv -n openshift-storage
NAME                                         DISPLAY                       VERSION               REPLACES   PHASE
mcg-operator.v4.12.0-152.stable              NooBaa Operator               4.12.0-152.stable                Succeeded
metallb-operator.4.12.0-202301042354         MetalLB Operator              4.12.0-202301042354              Succeeded
ocs-operator.v4.12.0-152.stable              OpenShift Container Storage   4.12.0-152.stable                Succeeded
odf-csi-addons-operator.v4.12.0-152.stable   CSI Addons                    4.12.0-152.stable                Succeeded
odf-operator.v4.12.0-152.stable              OpenShift Data Foundation     4.12.0-152.stable                Succeeded

[root@api.sps1.cp.fyre.ibm.com backup-folder]# oc get subscription -n openshift-storage
NAME                                                                          PACKAGE                   SOURCE              CHANNEL
mcg-operator-stable-4.12-ocs-catalogsource-openshift-marketplace              mcg-operator              ocs-catalogsource   stable-4.12
ocs-operator-stable-4.12-ocs-catalogsource-openshift-marketplace              ocs-operator              ocs-catalogsource   stable-4.12
odf-csi-addons-operator-stable-4.12-ocs-catalogsource-openshift-marketplace   odf-csi-addons-operator   ocs-catalogsource   stable-4.12
odf-operator                                                                  odf-operator              ocs-catalogsource   stable-4.12

OCP Version:

[root@api.sps1.cp.fyre.ibm.com backup-folder]# oc version
Client Version: 4.11.9
Kustomize Version: v4.5.4
Server Version: 4.12.0-rc.6
Kubernetes Version: v1.25.4+77bec7a

Images :

        - name: ROOK_CEPH_IMAGE
          value: quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:60f1ae2a2a28802fceca9a75252cec045755ca2fc0679f9693b185188561d86e
        - name: CEPH_IMAGE
          value: quay.io/rhceph-dev/rhceph@sha256:c6fe7e71ad1b13281d1d2399ceb98d3d6927df40e5d442a15fa0dee2976ccbcf
        - name: NOOBAA_CORE_IMAGE
          value: quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:b495b59219d78ab468d1e1faedacfda59cb4b9fe13b253157897ff6899811de5
        - name: NOOBAA_DB_IMAGE
          value: quay.io/rhceph-dev/rhel8-postgresql-12@sha256:f4d8f5f165da493568802b4115f5e68af7cc11a3f14769e495de4a3f61a58238
        - name: PROVIDER_API_SERVER_IMAGE
          value: quay.io/rhceph-dev/odf4-ocs-rhel8-operator@sha256:c4e3463ccb0cf38f7feb71b1cfcd55de006e598d4b8fa3c9eb9175c8083fe0ce
        - name: OPERATOR_CONDITION_NAME
          value: ocs-operator.v4.12.0-152.stable
        image: quay.io/rhceph-dev/odf4-ocs-rhel8-operator@sha256:c4e3463ccb0cf38f7feb71b1cfcd55de006e598d4b8fa3c9eb9175c8083fe0ce
        imagePullPolicy: Always

nimrod-becker commented 1 year ago

Something differs in the way this deployment is done, since we don't see it at all during 4.12 runs.

rkomandu commented 1 year ago

@nimrod-becker, the downstream build from Quay.io is being used to test this (the procedure is similar to what was followed earlier). Any other suggestions? Is the Postgres image the same across the 4.12 builds tried in your env and in ours?

nimrod-becker commented 1 year ago

If it's the same build, it's the same images. In any case, a suggestion was already made 2 weeks ago, and it should solve the issue; it's also not a big change (see Liran's comment).

rkomandu commented 1 year ago

@nimrod-becker, we will change this in the DAS code. However, what did you mean by the deployment being different? I am not sure what was tried in your ODF env, or how.

rkomandu commented 1 year ago

@nimrod-becker, update: this was tried at the ODF 4.12 GA code level with the Postgres image below.

oc get csv -n openshift-storage -o yaml |grep -i full
full_version: 4.12.0-173
full_version: 4.12.0-173
full_version: 4.12.0-173

Image: registry.redhat.io/rhel8/postgresql-12@sha256:3d805540d777b09b4da6df99e7cddf9598d5ece4af9f6851721a9961df40f5a1

We need to change our scripts to ensure that the backup is created, so we will address this with Liran's proposed change in our upcoming release.
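Since part of this discussion was about whether two environments run the same build, a small filter can reduce the full_version lines quoted above to the distinct values. This is a sketch only: unique_full_versions is a hypothetical helper, and the sample input merely imitates the shape of the output in this comment.

```shell
#!/usr/bin/env bash
# Sketch only: unique_full_versions is a hypothetical helper.
# Read CSV YAML on stdin and print each distinct full_version value once.
unique_full_versions() {
    grep -i 'full_version:' | awk '{print $2}' | sort -u
}

# Demonstration on sample input shaped like the output quoted in this comment.
printf '    full_version: 4.12.0-173\n    full_version: 4.12.0-173\n' | unique_full_versions
```

On a cluster it could be fed with `oc get csv -n openshift-storage -o yaml | unique_full_versions`; more than one line of output would indicate mixed builds.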

github-actions[bot] commented 4 months ago

This issue had no activity for too long - it will now be labeled stale. Update it to prevent it from getting closed.

github-actions[bot] commented 3 months ago

This issue is stale and had no activity for too long - it will now be closed.