I have solved the problem myself. I think deleting the PVCs and changing values.yaml to enable persistence solved this issue, but I'm not sure of the reason.
Delete the PVCs:

```bash
kubectl delete pvc --all
```

and change values.yaml to this:
```yaml
global:
  dev: true
  hostname: localhost

# configuration for fence helm chart. You can add it for all our services.
fence:
  # Fence config overrides
  FENCE_CONFIG:
    OPENID_CONNECT:
      google:
        client_id: ""
        client_secret: ""
    AWS_CREDENTIALS:
      'fence-bot':
        aws_access_key_id: ''
        aws_secret_access_key: ''
    S3_BUCKETS:
      # Name of the actual s3 bucket
      jq-helm-testing:
        cred: 'fence-bot'
        region: us-east-1
    # This is important for data upload.
    DATA_UPLOAD_BUCKET: 'jq-helm-testing'

portal:
  image:
    repository: quay.io/cdis/data-portal-prebuilt
    tag: brh.data-commons.org-feat-pr_comment
  resources:
    requests:
      cpu: 0.2
      memory: 500Mi

postgresql:
  primary:
    persistence:
      enabled: true
```
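For anyone hitting the same thing, here is the full sequence that corresponds to the fix above, as a sketch (it assumes the release is called gen3 in the default namespace, matching the commands later in this thread; note that deleting the PVCs wipes all data):

```bash
# Tear down the release and the leftover volumes so the fresh install
# regenerates credentials against an empty database.
helm uninstall gen3
kubectl delete pvc --all

# Reinstall with the persistence setting above and check the volume binds.
helm upgrade --install gen3 gen3/gen3 -f ./values.yaml
kubectl get pvc
```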
Hello @scymz1
Thanks for reporting this issue.
I'll try to take a look at this today.
If deleting the PVCs helped, the issue might have been that your postgresql pod still had gen3 users created by a previous deployment; the new deployment then generates new credentials and you run into a password mismatch. You should see that in the logs of the failing pods.
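One way to confirm that theory, as a sketch (the pod name is taken from the `kubectl get pods` output further down; connecting with psql may additionally require the chart's postgres password):

```bash
# Authentication failures show up as FATAL lines in the postgres logs.
kubectl logs gen3-postgresql-0 | grep FATAL

# Compare against the roles that actually exist in the persisted database.
kubectl exec -it gen3-postgresql-0 -- psql -U postgres -c '\du'
```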
Hi @jawadqur, weird things happen: when I uninstall with helm and try to reinstall, the problem occurs again and I cannot solve it. I delete the PVCs before reinstalling, but it still does not work.
My current values.yaml file:
```yaml
global:
  dev: true
  hostname: localhost

# configuration for fence helm chart. You can add it for all our services.
fence:
  # Fence config overrides
  FENCE_CONFIG:
    OPENID_CONNECT:
      google:
        client_id: ""
        client_secret: ""
    AWS_CREDENTIALS:
      'fence-bot':
        aws_access_key_id: ''
        aws_secret_access_key: ''
    S3_BUCKETS:
      # Name of the actual s3 bucket
      jq-helm-testing:
        cred: 'fence-bot'
        region: us-east-1
    # This is important for data upload.
    DATA_UPLOAD_BUCKET: 'jq-helm-testing'

portal:
  image:
    repository: quay.io/cdis/data-portal-prebuilt
    tag: brh.data-commons.org-feat-pr_comment
  resources:
    requests:
      cpu: 0.2
      memory: 500Mi

arborist:
  postgres:
    dbCreate: true
    username: gen3_arborist
    password: gen3_arborist
```
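If you hard-code credentials like the arborist block above, they have to agree with what the charts actually provisioned; one way to cross-check, as a sketch (secret names vary by chart version, hence the grep):

```bash
# Find the database-credential secrets the charts created...
kubectl get secrets | grep -i -e db -e arborist

# ...then decode one and compare with the values.yaml override.
# <secret-name> is a placeholder for a name from the previous output.
kubectl get secret <secret-name> -o jsonpath='{.data}'
```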
Here are my logs. Output of `kubectl get pods`:
```
NAME READY STATUS RESTARTS AGE
portal-deployment-c9d6d9776-lb78f 0/1 Pending 0 20m
sower-85597bddbf-fdsm6 0/1 Pending 0 20m
fence-deployment-6f8489fb88-lxgt4 0/1 Pending 0 20m
hatchery-deployment-7fcb68fb65-zcg76 0/1 Pending 0 20m
revproxy-deployment-9764957cd-djnh7 1/1 Running 0 20m
gen3-postgresql-0 1/1 Running 0 20m
wts-oidc-job-qhwbd 0/2 Init:0/1 0 20m
manifestservice-deployment-6c74479448-jmznd 1/1 Running 0 20m
peregrine-dbcreate-rswdw 0/1 Completed 0 20m
arborist-dbcreate-78jvw 0/1 Completed 0 20m
metadata-dbcreate-zc8b7 0/1 Completed 0 20m
sheepdog-dbcreate-p55hq 0/1 Completed 0 20m
wts-dbcreate-2kvvs 0/1 Completed 0 20m
audit-dbcreate-g7l6q 0/1 Completed 0 20m
fence-dbcreate-rmbrw 0/1 Completed 0 20m
indexd-dbcreate-mfjmz 0/1 Completed 0 20m
gen3-elasticsearch-master-0 1/1 Running 0 20m
arborist-deployment-77645d555-pzl8v 1/1 Running 0 20m
ambassador-deployment-6cd65d48d6-xm8db 1/1 Running 0 20m
wts-deployment-57ff756898-55d4b 0/1 CreateContainerConfigError 0 20m
indexd-deployment-b845d4565-cjcr9 1/1 Running 0 20m
metadata-deployment-559bbdd459-gm88s 1/1 Running 0 20m
audit-deployment-54c96847c8-bv8g4 1/1 Running 0 20m
indexd-userdb-jtxx9 0/1 Completed 0 20m
presigned-url-fence-deployment-6d657f9cfd-bb45d 1/1 Running 0 20m
argo-wrapper-deployment-85f5d4b756-ft7cb 0/1 ImagePullBackOff 0 20m
useryaml-qm2qn 0/1 CrashLoopBackOff 7 (2m37s ago) 20m
sheepdog-deployment-746959d756-fk2pz 0/1 CrashLoopBackOff 8 (114s ago) 20m
pidgin-deployment-b9c7c5b7d-rt86d 0/1 Running 5 (82s ago) 20m
peregrine-deployment-6d9b6b584b-jnsc6 0/1 Running 5 (22s ago) 20m
```
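Several of these pods are Pending, which on a local cluster like Rancher Desktop often just means the node is out of CPU or memory; a quick triage sketch, using pod names from the output above:

```bash
# The Events section explains why the scheduler cannot place the pod
# (often "Insufficient cpu" / "Insufficient memory" on small local clusters).
kubectl describe pod portal-deployment-c9d6d9776-lb78f

# Recent events across the namespace, newest last, for a broader view.
kubectl get events --sort-by=.lastTimestamp
```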
For pod useryaml-qm2qn:
```
[notice] A new release of pip is available: 23.3.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
[2024-04-02 15:22:25,537][gen3config.config][ INFO] Opening default configuration...
[2024-04-02 15:22:26,196][gen3config.config][ INFO] Applying configuration: /var/www/fence/fence-config.yaml
[2024-04-02 15:22:26,606][gen3config.config][WARNING] Did not provide key(s) dict_keys(['DB', 'DB_MIGRATION_POSTGRES_LOCK_KEY', 'SESSION_COOKIE_DOMAIN', 'GA4GH_DRS_POSTED_PASSPORT_FIELD', 'PRIVACY_POLICY_URL',
[2024-04-02 15:22:26,978][fence.config][ INFO] Found environment variable 'DB': overriding 'DB' field from config file
[2024-04-02 15:22:26,978][fence.config][ INFO] Found environment variable 'INDEXD_PASSWORD': overriding 'INDEXD_PASSWORD' field from config file
WARNING:flask_cors.core:Unknown option passed to Flask-CORS: headers
WARNING:flask_cors.core:Unknown option passed to Flask-CORS: headers
[2024-04-02 15:22:31,637][gen3config.config][ INFO] Opening default configuration...
[2024-04-02 15:22:32,292][gen3config.config][ INFO] Applying configuration: /var/www/fence/fence-config.yaml
[2024-04-02 15:22:32,705][gen3config.config][WARNING] Did not provide key(s) dict_keys(['DB', 'DB_MIGRATION_POSTGRES_LOCK_KEY', 'SESSION_COOKIE_DOMAIN', 'GA4GH_DRS_POSTED_PASSPORT_FIELD', 'PRIVACY_POLICY_URL',
[2024-04-02 15:22:33,077][fence.config][ INFO] Found environment variable 'DB': overriding 'DB' field from config file
[2024-04-02 15:22:33,077][fence.config][ INFO] Found environment variable 'INDEXD_PASSWORD': overriding 'INDEXD_PASSWORD' field from config file
ERROR:gen3users:Permission 'reader' in role 'reader' in policy 'open_data_reader' has 'service = *'. This is unsecure because policy 'open_data_reader' is granted to public group 'anonymous_policies'. Fix sugge
ERROR:gen3users:Permission 'storage_reader' in role 'storage_reader' in policy 'open_data_reader' has 'service = *'. This is unsecure because policy 'open_data_reader' is granted to public group 'anonymous_poli
ERROR:gen3users:Permission 'reader' in role 'reader' in policy 'full_open_access' has 'service = *'. This is unsecure because policy 'full_open_access' is granted to public group 'anonymous_policies'. Fix sugge
ERROR:gen3users:Permission 'storage_reader' in role 'storage_reader' in policy 'full_open_access' has 'service = *'. This is unsecure because policy 'full_open_access' is granted to public group 'anonymous_poli
ERROR:gen3users:Permission 'reader' in role 'reader' in policy 'open_data_reader' has 'service = *'. This is unsecure because policy 'open_data_reader' is granted to public group 'all_users_policies'. Fix sugge
ERROR:gen3users:Permission 'storage_reader' in role 'storage_reader' in policy 'open_data_reader' has 'service = *'. This is unsecure because policy 'open_data_reader' is granted to public group 'all_users_poli
ERROR:gen3users:Permission 'reader' in role 'reader' in policy 'authn_open_access' has 'service = *'. This is unsecure because policy 'authn_open_access' is granted to public group 'all_users_policies'. Fix sug
ERROR:gen3users:Permission 'storage_reader' in role 'storage_reader' in policy 'authn_open_access' has 'service = *'. This is unsecure because policy 'authn_open_access' is granted to public group 'all_users_po
[2024-04-02 15:22:33,610][user_syncer][ ERROR] user.yaml validation failed. See errors in previous logs.
[2024-04-02 15:22:33,610][user_syncer][ ERROR] aborting early
Traceback (most recent call last):
  File "/usr/local/bin/fence-create", line 6, in <module>
    sys.exit(main())
  File "/fence/bin/fence_create.py", line 502, in main
    sync_users(
  File "/fence/fence/scripting/fence_create.py", line 510, in sync_users
    syncer.sync()
  File "/fence/fence/sync/sync_users.py", line 1515, in sync
    self._sync(s)
  File "/fence/fence/sync/sync_users.py", line 1577, in _sync
    user_yaml = UserYAML.from_file(
  File "/fence/fence/sync/sync_users.py", line 172, in from_file
    validate_user_yaml(file_contents)  # run user.yaml validation tests
  File "/usr/local/lib/python3.9/site-packages/gen3users/validation.py", line 58, in validate_user_yaml
    raise AssertionError(
AssertionError: user.yaml validation failed. See errors in previous logs.
```
For pod gen3-postgresql-0:
```
postgresql 15:19:06.43 INFO ==> ** Starting PostgreSQL **
2024-04-02 15:19:06.491 GMT [1] LOG: pgaudit extension initialized
2024-04-02 15:19:06.602 GMT [1] LOG: starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-04-02 15:19:06.608 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-04-02 15:19:06.608 GMT [1] LOG: listening on IPv6 address "::", port 5432
2024-04-02 15:19:06.611 GMT [1] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-04-02 15:19:06.623 GMT [367] LOG: database system was shut down at 2024-04-02 15:19:06 GMT
2024-04-02 15:19:06.635 GMT [1] LOG: database system is ready to accept connections
2024-04-02 15:19:21.054 GMT [397] FATAL: password authentication failed for user "peregrine_gen3"
2024-04-02 15:19:21.054 GMT [397] DETAIL: Role "peregrine_gen3" does not exist.
  Connection matched pg_hba.conf line 1: "host all all 0.0.0.0/0 md5"
2024-04-02 15:19:21.703 GMT [405] FATAL: password authentication failed for user "sheepdog_gen3"
2024-04-02 15:19:21.703 GMT [405] DETAIL: Role "sheepdog_gen3" does not exist.
  Connection matched pg_hba.conf line 1: "host all all 0.0.0.0/0 md5"
2024-04-02 15:19:22.081 GMT [407] FATAL: password authentication failed for user "metadata_gen3"
2024-04-02 15:19:22.081 GMT [407] DETAIL: Role "metadata_gen3" does not exist.
  Connection matched pg_hba.conf line 1: "host all all 0.0.0.0/0 md5"
2024-04-02 15:19:22.654 GMT [413] FATAL: password authentication failed for user "gen3_arborist"
2024-04-02 15:19:22.654 GMT [413] DETAIL: Role "gen3_arborist" does not exist.
  Connection matched pg_hba.conf line 1: "host all all 0.0.0.0/0 md5"
2024-04-02 15:19:22.969 GMT [417] FATAL: password authentication failed for user "wts_gen3"
2024-04-02 15:19:22.969 GMT [417] DETAIL: Role "wts_gen3" does not exist.
  Connection matched pg_hba.conf line 1: "host all all 0.0.0.0/0 md5"
2024-04-02 15:19:32.283 GMT [501] FATAL: password authentication failed for user "audit_gen3"
2024-04-02 15:19:32.283 GMT [501] DETAIL: Role "audit_gen3" does not exist.
  Connection matched pg_hba.conf line 1: "host all all 0.0.0.0/0 md5"
2024-04-02 15:19:32.327 GMT [503] FATAL: password authentication failed for user "indexd_gen3"
2024-04-02 15:19:32.327 GMT [503] DETAIL: Role "indexd_gen3" does not exist.
  Connection matched pg_hba.conf line 1: "host all all 0.0.0.0/0 md5"
2024-04-02 15:19:32.618 GMT [505] FATAL: password authentication failed for user "fence_gen3"
2024-04-02 15:19:32.618 GMT [505] DETAIL: Role "fence_gen3" does not exist.
  Connection matched pg_hba.conf line 1: "host all all 0.0.0.0/0 md5"
2024-04-02 15:19:41.918 GMT [575] ERROR: relation "db_version" does not exist at character 21
2024-04-02 15:19:41.918 GMT [575] STATEMENT: select version from db_version
```
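The repeated `Role "..." does not exist` lines mean the services are connecting before (or without) their roles being created. One way to inspect this, as a sketch using job and pod names from the `kubectl get pods` output above (the psql superuser may additionally need the chart's postgres password):

```bash
# Did the job that provisions the fence role actually succeed?
kubectl logs fence-dbcreate-rmbrw

# List the roles and databases that really exist inside the postgres pod.
kubectl exec -it gen3-postgresql-0 -- psql -U postgres -c '\du' -c '\l'
```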
For pod peregrine-deployment-6d9b6b584b-jnsc6:
```
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x4000172420
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 45 seconds
mapped 304776 bytes (297 KB) for 2 cores
*** Operational MODE: preforking ***
added /var/www/peregrine/ to pythonpath.
added /peregrine/ to pythonpath.
added /usr/local/lib/python3.6/site-packages/ to pythonpath.
failed to open python file /var/www/peregrine/wsgi.py
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. going in full dynamic mode ***
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 2013)
spawned uWSGI worker 1 (pid: 2019, cores: 1)
spawned uWSGI worker 2 (pid: 2021, cores: 1)
--- no python application found, check your startup logs for errors ---
--- no python application found, check your startup logs for errors ---
--- no python application found, check your startup logs for errors ---
--- no python application found, check your startup logs for errors ---
```
For pod wts-deployment-57ff756898-55d4b:

```
stream logs failed container "wts" in pod "wts-deployment-57ff756898-55d4b" is waiting to start: CreateContainerConfigError for default/wts-deployment-57ff756898-55d4b (wts)
```
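`CreateContainerConfigError` generally means the pod references a ConfigMap or Secret that does not exist yet; the exact missing object shows up in the pod's events. A sketch:

```bash
# The Events section names the missing ConfigMap or Secret.
kubectl describe pod wts-deployment-57ff756898-55d4b

# Cross-check which wts-related config objects actually exist.
kubectl get configmaps,secrets | grep -i wts
```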
I merged a fix for the sheepdog / peregrine issues: https://github.com/uc-cdis/gen3-helm/pull/164
For the useryaml job, you need to specify a valid user.yaml in your fence configuration.
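For example, a minimal sketch of that override, assuming the fence chart exposes a `USER_YAML` value (the key name may differ across chart versions). The validation errors in the useryaml logs above complain about `service = *` in policies granted to public groups, so any policy in `anonymous_policies` / `all_users_policies` should name concrete services:

```yaml
fence:
  USER_YAML: |
    authz:
      resources:
        - name: open
      policies:
        - id: open_data_reader
          role_ids: [reader]
          resource_paths: [/open]
      roles:
        - id: reader
          permissions:
            - id: reader
              action:
                service: peregrine  # a concrete service, not '*'
                method: read
      anonymous_policies: [open_data_reader]
    users: {}
```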
I am new to gen3 and followed the instructions to deploy the most basic setup using helm. I installed Rancher Desktop and postgresql@13, but still cannot reach the website at https://localhost/. Here is my values.yaml:
And when I run

```bash
helm upgrade --install gen3 gen3/gen3 -f ./values.yaml
```

and check the status of the pods, it seems like some pods are not working or are stuck pending. And here are some logs for the crashed pods:
Any suggestions on this?
And when I reinstall gen3, I get logs from "default/portal-deployment-6c7f86d4f8-brqjn" like this: