Closed dejwsz closed 4 years ago
Oops, OK, sorry - I will take a look there, thanks
I tried the thing with "pod_environment_configmap" and setting PATRONI_KUBERNETES_USE_ENDPOINTS to false. No luck with this.
Read the next comment down from that and you'll see some further information like:
correct env name for spilo is currently KUBERNETES_USE_CONFIGMAPS
Indeed. But spilo-role is not assigned so "pod_role_label" is not working. Any trick here?
I saw this warning in the log:
2020-03-03 10:18:45,606 - bootstrapping - WARNING - could not parse kubernetes labels as a JSON: Expecting value: line 1 column 1 (char 0), reverting to the default: {"application": "spilo"}
and two CRIT
2020-03-03 10:18:46,711 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2020-03-03 10:18:46,721 CRIT Server 'unix_http_server' running without any HTTP authentication checking
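About that "could not parse kubernetes labels as a JSON" warning: the bootstrapping script apparently reads the labels from an environment variable and tries `json.loads` on it, falling back to a default when parsing fails. A small sketch (illustrative names, not Spilo's actual code) shows why a plain `key=value` string triggers exactly that "Expecting value: line 1 column 1" message:

```python
import json

DEFAULT_LABELS = {"application": "spilo"}

def parse_labels(raw):
    """Roughly mimic the bootstrapping behaviour: try JSON, revert to default."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError) as exc:
        print(f"could not parse kubernetes labels as a JSON: {exc}, "
              f"reverting to the default: {json.dumps(DEFAULT_LABELS)}")
        return DEFAULT_LABELS

parse_labels('application=spilo')         # not JSON -> warning, default used
parse_labels('{"application": "spilo"}')  # valid JSON -> parsed as a dict
```

So the labels value has to be a JSON object like `{"application": "spilo"}`, not a `key=value` pair.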
but in general it looks like it's starting OK - it just does not label the pods as it should, "spilo-role" is not assigned at all.
I can see KUBERNETES_ROLE_LABEL='spilo-role' is set in the cluster pods' environments, but the role never gets assigned to the master or replica pods. Some kind of bug?
Is there any way to enforce DEBUG level for Spilo? I tried to set "debug_logging: true" in the operator config but it didn't help. With debug mode in Spilo I could see more detailed messages and check whether the "Changing the pod's role to" message appears, because then the spilo-role label should also be set - but that does not happen.
OK, I see - it is enough to add DEBUG=true to the config map pointed to by "pod_environment_configmap".
I can see this while bootstrapping:
2020-03-03 15:05:09,044 - bootstrapping - DEBUG - b"Can't load /root/.rnd into RNG\n139974210200000:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:88:Filename=/root/.rnd\nGenerating a RSA private key\n.+++++\n..................................................................................................................................................................................................+++++\nwriting new private key to '/home/postgres/server.key'\n-----\n"
but later there is no "Changing the pod's role to" message. So it never tries to assign the master or replica role at all, and it does not work even in privileged mode under OpenShift.
So one pod keeps showing this:
2020-03-03 15:07:26,126 INFO: waiting for leader to bootstrap
2020-03-03 15:07:36,125 INFO: Lock owner: None; I am test-minimal-cluster-1
and the second this:
2020-03-03 10:19:58,259 INFO: waiting for leader to bootstrap
2020-03-03 10:20:08,259 INFO: Lock owner: None; I am test-minimal-cluster-0
Any idea how to run it?
Still no luck, but I got a new error:
| File "/scripts/callback_endpoint.py", line 9, in
Interestingly, after reinstalling everything and using this combination of operator version and Spilo image: registry.opensource.zalan.do/acid/postgres-operator:v1.3.1 + registry.opensource.zalan.do/acid/spilo-cdp-12:1.6-p16, I now have properly assigned labels (spilo-role=master and spilo-role=replica are in place).
Interesting - I saw the master service was broken (no selectors). So I removed the cluster and later created it once again, and fixed the master service by adding the missing selectors. And surprise - this time the labels were not assigned to the pods again, so it didn't work because of this.
After another try (cleanup and adding the test cluster again) I had all labels in place, the master service fixed, and finally the cluster in Running state:

NAME                   TEAM   VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE   STATUS
test-minimal-cluster   TEST   11        2      1Gi                                     8m    Running

So it works for me in privileged mode on OpenShift 3.11, ufff. I will now try to do it in restricted mode.
My steps to run a simple Postgres cluster on OpenShift 3.11 in privileged mode. I installed OLM "0.14.1" and postgres-operator in version "1.3.0" (the image was replaced later with version "1.3.1").
Create "postgresql-operator-default-configuration" in olm namespace.
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-default-configuration
configuration:
  docker_image: registry.opensource.zalan.do/acid/spilo-cdp-12:1.6-p16
  max_instances: 3
  min_instances: 1
  resync_period: 30m
  repair_period: 5m
  workers: 4
  users:
    replication_username: standby
    super_username: postgres
  kubernetes:
    cluster_domain: cluster.local
    cluster_labels:
      application: spilo
    cluster_name_label: cluster-name
    cluster_history_entries: "1000"
    enable_init_containers: true
    enable_pod_antiaffinity: true
    enable_pod_disruption_budget: false
    enable_sidecars: true
    enable_shm_volume: true
    inherited_labels:
    - application
    - environment
    pdb_name_format: "postgres-{cluster}-pdb"
    pod_antiaffinity_topology_key: "failure-domain.beta.kubernetes.io/zone"
    pod_management_policy: ordered_ready
    pod_role_label: spilo-role
    pod_terminate_grace_period: 5m
    secret_name_template: "{username}.{cluster}.credentials.{tprkind}.{tprgroup}"
    toleration: {}
    spilo_privileged: true
    watched_namespace: "olm"
    pod_environment_configmap: "pod-env-cfg"
  postgres_pod_resources:
    default_cpu_limit: "2"
    default_cpu_request: "250m"
    default_memory_limit: "2Gi"
    default_memory_request: "250Mi"
  timeouts:
    pod_label_wait_timeout: 10m
    pod_deletion_wait_timeout: 10m
    ready_wait_interval: 5s
    ready_wait_timeout: 30s
    resource_check_interval: 5s
    resource_check_timeout: 10m
  load_balancer:
    enable_master_load_balancer: false
    enable_replica_load_balancer: false
    master_dns_name_format: "{cluster}.{team}.{hostedzone}"
    replica_dns_name_format: "{cluster}-repl.{team}.{hostedzone}"
  aws_or_gcp:
    aws_region: my-region
  logical_backup:
    logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
    logical_backup_s3_access_key_id: "my-access-key"
    logical_backup_s3_bucket: "spilo-backup"
    logical_backup_s3_endpoint: "my-endpoint"
    logical_backup_s3_secret_access_key: "my-secret"
    logical_backup_s3_sse: "AES256"
    logical_backup_schedule: "*/5 * * * *"
  debug:
    debug_logging: true
    enable_database_access: true
  teams_api:
    enable_team_superuser: false
    enable_teams_api: false
    pam_role_name: teamapipostgres
    protected_role_names:
    - admin
    team_admin_role: admin
    team_api_role_configuration:
      log_statement: all
  logging_rest_api:
    api_port: 8008
    cluster_history_entries: 1000
    ring_log_lines: 100
Create config map "pod-env-cfg" in olm with:
DEBUG: "true"
KUBERNETES_USE_CONFIGMAPS: "true"
PATRONI_KUBERNETES_ROLE_LABEL: spilo-role
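For reference, the complete manifest looks like this (the name and namespace match the "pod_environment_configmap" setting in the operator configuration above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-env-cfg
  namespace: olm
data:
  DEBUG: "true"
  KUBERNETES_USE_CONFIGMAPS: "true"
  PATRONI_KUBERNETES_ROLE_LABEL: spilo-role
```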
Add the privileged SCC to the service account:
oc adm policy add-scc-to-user privileged -n olm -z operator
Edit the role postgres-operator.v1.3.0-XXXX in olm and change the configmaps permissions:
- apiGroups:
  - ''
  resources:
  - configmaps
  verbs:
  - get
  - list
  - create
  - patch
  - update
  - watch
I also added 'update' to endpoints and pods.
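For completeness, a sketch of how the endpoints/pods rule ends up looking after that change (I assume the same verb list as for configmaps; the rule your CSV generated may differ slightly):

```yaml
- apiGroups:
  - ''
  resources:
  - endpoints
  - pods
  verbs:
  - get
  - list
  - create
  - patch
  - update  # the verb I added
  - watch
```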
Edit the postgres-operator CSV and change/add this:
spec:
  containers:
  - env:
    - name: POSTGRES_OPERATOR_CONFIGURATION_OBJECT
      value: postgresql-operator-default-configuration
    image: 'registry.opensource.zalan.do/acid/postgres-operator:v1.3.1'
Create a new postgres cluster:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: test-minimal-cluster
spec:
  teamId: "TEST"
  volume:
    size: 1Gi
  numberOfInstances: 2
  users:
    # database owner
    appadmin:
    - superuser
    - createdb
    # role for application foo
    appuser: []
  # databases: name -> owner
  databases:
    appdb: appuser
  postgresql:
    version: "11"
(Optional step - only if needed!) Right after that, fix the master service by adding selectors like:
selector:
  application: spilo
  cluster-name: test-minimal-cluster
  spilo-role: master
Wait until the cluster is Running (the spilo-role label must be assigned to both pods).
I tried the same with the latest operator version, 1.4.0, and the Spilo image registry.opensource.zalan.do/acid/spilo-cdp-12:1.6-p2, and it does not work well: labels are not assigned.
The operator in version 1.3.1 with the Spilo image registry.opensource.zalan.do/acid/spilo-cdp-12:1.6-p2 works fine in privileged mode too.
@dejwsz -> can you confirm that in operator 1.3.1 the master service gets the "selector" part populated, and it's only 1.4.0 with this issue?
I switched to other things and a different operator (Crunchy), so I do not know if I will find time for this soon. If I do, I will give feedback.
For running in OpenShift (including non-root mode):
The operator image should be at least: registry.opensource.zalan.do/acid/postgres-operator:v1.4.0-21-g1249626-dirty
The operator should be configured with these values:
kubernetes_use_configmaps: "true"
docker_image: registry.opensource.zalan.do/acid/spilo-cdp-12:1.6-p114 # or newer
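If you use the OperatorConfiguration CRD as in the earlier steps, those values would sit roughly here (a sketch based on the config shown above; the exact key placement may vary between operator versions):

```yaml
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-default-configuration
configuration:
  docker_image: registry.opensource.zalan.do/acid/spilo-cdp-12:1.6-p114  # or newer
  kubernetes:
    # store Patroni's DCS state in config maps instead of endpoints
    kubernetes_use_configmaps: true
```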
To get the latest image versions of both the operator and Spilo, query:
https://registry.opensource.zalan.do/v2/acid/postgres-operator/tags/list
https://registry.opensource.zalan.do/v2/acid/spilo-cdp-12/tags/list
Thanks @ReSearchITEng for providing info about rootless Spilo and the new operator options. Closing it now.
I ran the operator in privileged mode just to check if it works - version "1.3.1". I used the same serviceaccount for the pods originally created by the CSV, and added the privileged SCC to the account just to be sure all permissions were there. My test cluster was finally created, but this error was shown in the logs and the cluster ended up with SyncFailed status.
... 2020-03-02 14:23:44,915 INFO: Lock owner: test-minimal-cluster-0; I am test-minimal-cluster-0
2020-03-02 14:23:44,915 INFO: establishing a new patroni connection to the postgres cluster
2020-03-02 14:23:44,927 ERROR: Permission denied
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 61, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 282, in patch_or_create
    return self.retry(func, self._namespace, body) if retry else func(self._namespace, body)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 114, in retry
    return self._retry.copy()(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/utils.py", line 313, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 50, in wrapper
    return getattr(self._api, func)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/core_v1_api.py", line 15602, in patch_namespaced_endpoints
    (data) = self.patch_namespaced_endpoints_with_http_info(name, namespace, body, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/core_v1_api.py", line 15698, in patch_namespaced_endpoints_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 335, in call_api
    _preload_content, _request_timeout)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 148, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 409, in request
    body=body)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 307, in PATCH
    body=body)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 240, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-store', 'Content-Type': 'application/json', 'Date': 'Mon, 02 Mar 2020 14:23:44 GMT', 'Content-Length': '267'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \"test-minimal-cluster\" is forbidden: endpoint address 10.128.3.34 is not allowed","reason":"Forbidden","details":{"name":"test-minimal-cluster","kind":"endpoints"},"code":403}

2020-03-02 14:23:44,927 ERROR: failed to update leader lock
2020-03-02 14:23:44,962 INFO: not promoting because failed to update leader lock in DCS
2020-03-02 14:23:54,917 INFO: Lock owner: test-minimal-cluster-0; I am test-minimal-cluster-0
2020-03-02 14:23:54,921 ERROR: Permission denied
(the same traceback and 403 Forbidden response repeat here)
....