uc-cdis / gen3-helm

Helm charts for Gen3 Deployments
Apache License 2.0

Followed basic instructions from README but still not able to deploy #163

Closed: scymz1 closed this issue 3 months ago

scymz1 commented 6 months ago

I am new to Gen3 and followed the instructions to deploy the most basic setup using Helm. I installed Rancher Desktop and postgresql@13, but I still cannot reach the website at https://localhost/. Here is my values.yaml:

global:
  hostname: localhost
  # hostname: example-commons.com

fence:
  FENCE_CONFIG:
    # Any fence-config overrides here.

arborist:
  postgres:
    dbCreate: true
    username: gen3_arborist
    password: gen3_arborist

And when I run helm upgrade --install gen3 gen3/gen3 -f ./values.yaml and check the status of the pods, several of them are stuck in Pending or failing:

kubectl get pod
NAME                                              READY   STATUS             RESTARTS          AGE
wts-deployment-57ff756898-hc6tz                   0/1     Pending            0                 2d17h
fence-deployment-6f8489fb88-v4xlp                 0/1     Pending            0                 2d17h
portal-deployment-6c7f86d4f8-dkvbn                0/1     Pending            0                 2d17h
hatchery-deployment-7fcb68fb65-phpf7              1/1     Running            0                 2d17h
wts-oidc-job-52gzp                                0/2     Init:0/1           0                 2d17h
revproxy-deployment-9764957cd-cl5n5               1/1     Running            0                 2d17h
gen3-postgresql-0                                 1/1     Running            0                 2d17h
sower-85597bddbf-j5s86                            1/1     Running            0                 2d17h
manifestservice-deployment-6c74479448-tnrbs       1/1     Running            0                 2d17h
indexd-dbcreate-d86wl                             0/1     Completed          0                 2d17h
metadata-dbcreate-qvb88                           0/1     Completed          0                 2d17h
audit-dbcreate-k6gkf                              0/1     Completed          0                 2d17h
peregrine-dbcreate-rj5xf                          0/1     Completed          0                 2d17h
wts-dbcreate-twg8k                                0/1     Completed          0                 2d17h
arborist-dbcreate-ghqzv                           0/1     Completed          0                 2d17h
sheepdog-dbcreate-m6v8g                           0/1     Completed          0                 2d17h
fence-dbcreate-95kgz                              0/1     Completed          0                 2d17h
gen3-elasticsearch-master-0                       1/1     Running            0                 2d17h
arborist-deployment-77645d555-ch4tv               1/1     Running            0                 2d17h
indexd-deployment-b845d4565-flk9p                 1/1     Running            0                 2d17h
metadata-deployment-559bbdd459-86ptb              1/1     Running            0                 2d17h
audit-deployment-54c96847c8-696x8                 1/1     Running            0                 2d17h
indexd-userdb-4lnc7                               0/1     Completed          0                 2d17h
presigned-url-fence-deployment-6d657f9cfd-kbrsf   1/1     Running            0                 2d17h
ambassador-deployment-6cd65d48d6-dljsz            0/1     Running            52 (26h ago)      2d17h
sheepdog-deployment-746959d756-l27vz              0/1     CrashLoopBackOff   337 (3m32s ago)   2d17h
argo-wrapper-deployment-85f5d4b756-mv2l5          0/1     ImagePullBackOff   0                 2d17h
peregrine-deployment-6d9b6b584b-9dgnn             0/1     CrashLoopBackOff   222 (22s ago)     2d17h
pidgin-deployment-b9c7c5b7d-rdgzs                 0/1     CrashLoopBackOff   219 (21s ago)     2d17h

And here are some logs from the crashed pods:

(base) FDSIT-7000-M7:config minghao.zhou$ kubectl logs pidgin-deployment-b9c7c5b7d-rdgzs
Got configuration:
GEN3_DEBUG=False
GEN3_UWSGI_TIMEOUT=45s
GEN3_DRYRUN=False
Running update-ca-certificates
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Running mkdir -p /var/run/gen3
Running nginx -g daemon off;
Running uwsgi --ini /etc/uwsgi/uwsgi.ini
[uWSGI] getting INI configuration from /app/uwsgi.ini
[uWSGI] getting INI configuration from /etc/uwsgi/uwsgi.ini
open("./python3_plugin.so"): No such file or directory [core/utils.c line 3732]
!!! UNABLE to load uWSGI plugin: ./python3_plugin.so: cannot open shared object file: No such file or directory !!!
*** Starting uWSGI 2.0.20 (64bit) on [Mon Apr  1 16:29:58 2024] ***
compiled with version: 8.3.0 on 14 March 2022 16:22:45
os: Linux-6.6.14-0-virt #1-Alpine SMP Fri, 26 Jan 2024 11:08:07 +0000
nodename: pidgin-deployment-b9c7c5b7d-rdgzs
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /var/www/pidgin
detected binary path: /usr/local/bin/uwsgi
chdir() to /pidgin/
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
!!! it looks like your kernel does not support pthread robust mutexes !!!
!!! falling back to standard pthread mutexes !!!
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to UNIX address /var/run/gen3/uwsgi.sock fd 3
setgid() to 102
set additional group 101 (ssh)
setuid() to 102
Python version: 3.9.10 (main, Mar  2 2022, 04:40:14)  [GCC 8.3.0]
2024/04/01 16:29:58 [notice] 2011#2011: using the "epoll" event method
2024/04/01 16:29:58 [notice] 2011#2011: nginx/1.21.1
2024/04/01 16:29:58 [notice] 2011#2011: built by gcc 8.3.0 (Debian 8.3.0-6) 
2024/04/01 16:29:58 [notice] 2011#2011: OS: Linux 6.6.14-0-virt
2024/04/01 16:29:58 [notice] 2011#2011: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2024/04/01 16:29:58 [notice] 2011#2011: start worker processes
2024/04/01 16:29:58 [notice] 2011#2011: start worker process 2017
2024/04/01 16:29:58 [emerg] 2017#2017: io_setup() failed (38: Function not implemented)
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x4000171d60
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 304776 bytes (297 KB) for 2 cores
*** Operational MODE: preforking ***
added /usr/local/lib/python3.9/site-packages/ to pythonpath.
WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x4000171d60 pid: 2013 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 2013)
spawned uWSGI worker 1 (pid: 2019, cores: 1)
spawned uWSGI worker 2 (pid: 2021, cores: 1)
[2024-04-01 16:30:08,527][pidgin.app][  ERROR] Peregrine not available; returning unhealthy
[2024-04-01 16:30:18,526][pidgin.app][  ERROR] Peregrine not available; returning unhealthy
(base) FDSIT-7000-M7:config minghao.zhou$ kubectl logs peregrine-deployment-6d9b6b584b-9dgnn
Got configuration:
GEN3_DEBUG=False
GEN3_UWSGI_TIMEOUT=600
GEN3_DRYRUN=False
Running update-ca-certificates
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Running mkdir -p /var/run/gen3
Running nginx -g daemon off;
Running uwsgi --ini /etc/uwsgi/uwsgi.ini
[uWSGI] getting INI configuration from /app/uwsgi.ini
[uWSGI] getting INI configuration from /etc/uwsgi/uwsgi.ini
open("./python3_plugin.so"): No such file or directory [core/utils.c line 3732]
!!! UNABLE to load uWSGI plugin: ./python3_plugin.so: cannot open shared object file: No such file or directory !!!
*** Starting uWSGI 2.0.20 (64bit) on [Mon Apr  1 16:29:47 2024] ***
compiled with version: 8.3.0 on 11 November 2021 18:25:57
os: Linux-6.6.14-0-virt #1-Alpine SMP Fri, 26 Jan 2024 11:08:07 +0000
nodename: peregrine-deployment-6d9b6b584b-9dgnn
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /var/www/peregrine
detected binary path: /usr/local/bin/uwsgi
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
!!! it looks like your kernel does not support pthread robust mutexes !!!
!!! falling back to standard pthread mutexes !!!
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to UNIX address /var/run/gen3/uwsgi.sock fd 3
setgid() to 102
set additional group 101 (ssh)
setuid() to 102
Python version: 3.6.15 (default, Oct 13 2021, 09:49:57)  [GCC 8.3.0]
2024/04/01 16:29:47 [notice] 2011#2011: using the "epoll" event method
2024/04/01 16:29:47 [notice] 2011#2011: nginx/1.21.1
2024/04/01 16:29:47 [notice] 2011#2011: built by gcc 8.3.0 (Debian 8.3.0-6) 
2024/04/01 16:29:47 [notice] 2011#2011: OS: Linux 6.6.14-0-virt
2024/04/01 16:29:47 [notice] 2011#2011: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2024/04/01 16:29:47 [notice] 2011#2011: start worker processes
2024/04/01 16:29:47 [notice] 2011#2011: start worker process 2017
2024/04/01 16:29:47 [emerg] 2017#2017: io_setup() failed (38: Function not implemented)
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x4000172420
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 45 seconds
mapped 304776 bytes (297 KB) for 2 cores
*** Operational MODE: preforking ***
added /var/www/peregrine/ to pythonpath.
added /peregrine/ to pythonpath.
added /usr/local/lib/python3.6/site-packages/ to pythonpath.
failed to open python file /var/www/peregrine/wsgi.py
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. going in full dynamic mode ***
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 2013)
spawned uWSGI worker 1 (pid: 2019, cores: 1)
spawned uWSGI worker 2 (pid: 2021, cores: 1)
--- no python application found, check your startup logs for errors ---
--- no python application found, check your startup logs for errors ---
--- no python application found, check your startup logs for errors ---
(base) FDSIT-7000-M7:config minghao.zhou$ kubectl logs argo-wrapper-deployment-85f5d4b756-mv2l5
Error from server (BadRequest): container "argo-wrapper" in pod "argo-wrapper-deployment-85f5d4b756-mv2l5" is waiting to start: trying and failing to pull image
(base) FDSIT-7000-M7:config minghao.zhou$ kubectl logs sheepdog-deployment-746959d756-l27vz 
Defaulted container "sheepdog" out of: sheepdog, sheepdog-init (init)

Any suggestions on this?


And when I reinstalled gen3, the logs from default/portal-deployment-6c7f86d4f8-brqjn looked like this:

> cloud_portal@0.1.0 schema
> node ./data/getSchema

Fetching http://revproxy-service/api/v0/submission/getschema
Fetching http://revproxy-service/api/v0/submission/_dictionary/_all
failed fetch - non-200 from server: 502, sleeping 2214 then retry http://revproxy-service/api/v0/submission/getschema
failed fetch - non-200 from server: 502, sleeping 2922 then retry http://revproxy-service/api/v0/submission/_dictionary/_all
Retrying http://revproxy-service/api/v0/submission/getschema after sleep - 1
Re-fetching http://revproxy-service/api/v0/submission/getschema - retry no 1
failed fetch - non-200 from server: 502, sleeping 4652 then retry http://revproxy-service/api/v0/submission/getschema
Retrying http://revproxy-service/api/v0/submission/_dictionary/_all after sleep - 1
Re-fetching http://revproxy-service/api/v0/submission/_dictionary/_all - retry no 1
failed fetch - non-200 from server: 502, sleeping 4409 then retry http://revproxy-service/api/v0/submission/_dictionary/_all
Retrying http://revproxy-service/api/v0/submission/getschema after sleep - 2
Re-fetching http://revproxy-service/api/v0/submission/getschema - retry no 2
failed fetch - non-200 from server: 502, sleeping 9668 then retry http://revproxy-service/api/v0/submission/getschema
Retrying http://revproxy-service/api/v0/submission/_dictionary/_all after sleep - 2
Re-fetching http://revproxy-service/api/v0/submission/_dictionary/_all - retry no 2
failed fetch - non-200 from server: 502, sleeping 8286 then retry http://revproxy-service/api/v0/submission/_dictionary/_all
Retrying http://revproxy-service/api/v0/submission/_dictionary/_all after sleep - 3
Re-fetching http://revproxy-service/api/v0/submission/_dictionary/_all - retry no 3
failed fetch - non-200 from server: 502, sleeping 16252 then retry http://revproxy-service/api/v0/submission/_dictionary/_all
Retrying http://revproxy-service/api/v0/submission/getschema after sleep - 3
Re-fetching http://revproxy-service/api/v0/submission/getschema - retry no 3
failed fetch - non-200 from server: 502, sleeping 17910 then retry http://revproxy-service/api/v0/submission/getschema
Retrying http://revproxy-service/api/v0/submission/_dictionary/_all after sleep - 4
Re-fetching http://revproxy-service/api/v0/submission/_dictionary/_all - retry no 4
failed fetch - non-200 from server: 502, sleeping 16509 then retry http://revproxy-service/api/v0/submission/_dictionary/_all
Retrying http://revproxy-service/api/v0/submission/getschema after sleep - 4
Re-fetching http://revproxy-service/api/v0/submission/getschema - retry no 4
failed fetch - non-200 from server: 502, sleeping 17043 then retry http://revproxy-service/api/v0/submission/getschema
Retrying http://revproxy-service/api/v0/submission/_dictionary/_all after sleep - 5
Re-fetching http://revproxy-service/api/v0/submission/_dictionary/_all - retry no 5
Error:  failed fetch non-200 from server: 502, max retries 4 exceeded for http://revproxy-service/api/v0/submission/_dictionary/_all
Stream closed EOF for default/portal-deployment-6c7f86d4f8-brqjn (portal)
scymz1 commented 6 months ago

I have solved the problem myself. I think deleting the PVCs and changing values.yaml to enable persistence solved the issue, though I'm not sure of the reason for it. To delete the PVCs I ran kubectl delete pvc --all.
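
Roughly, the full cycle looked like this (a sketch; it assumes the release is named gen3 and runs in the default namespace):

# Remove the release, then drop the PVCs so no stale database state survives:
helm uninstall gen3
kubectl delete pvc --all

# Reinstall with the updated values.yaml:
helm upgrade --install gen3 gen3/gen3 -f ./values.yaml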

And I changed values.yaml to this:

global:
  dev: true
  hostname: localhost

# configuration for fence helm chart. You can add it for all our services.
fence:
  # Fence config overrides 
  FENCE_CONFIG:
    OPENID_CONNECT:
      google:
        client_id: ""
        client_secret: ""

    AWS_CREDENTIALS:
      'fence-bot':
        aws_access_key_id: ''
        aws_secret_access_key: ''

    S3_BUCKETS:
      # Name of the actual s3 bucket
      jq-helm-testing:
        cred: 'fence-bot'
        region: us-east-1

    # This is important for data upload.
    DATA_UPLOAD_BUCKET: 'jq-helm-testing'

portal:
  image: 
    repository: quay.io/cdis/data-portal-prebuilt 
    tag: brh.data-commons.org-feat-pr_comment
  resources:
    requests:
      cpu: 0.2
      memory: 500Mi

postgresql:
  primary:
    persistence:
      enabled: true
jawadqur commented 6 months ago

Hello @scymz1

Thanks for reporting this issue.

I'll try to take a look into this today.

If deleting the PVCs helped, the issue might have been that your postgresql pod still had Gen3 database users created by a previous deployment; the new deployment tries to generate new credentials, so you end up with a password mismatch. You should see that in the logs of the failing pods.
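
One way to check for this (a sketch; substitute your actual pod names from kubectl get pods):

# Stale credentials show up as authentication failures in the database logs:
kubectl logs gen3-postgresql-0 | grep -i "authentication failed"

# PVCs survive a helm uninstall and carry old database state into the next install:
kubectl get pvc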

scymz1 commented 6 months ago

Hi @jawadqur, something weird is happening: when I uninstall the deployment using Helm and try to reinstall it, the problem occurs again and I cannot solve it. I deleted the PVCs before reinstalling, but it is still not working.

My current values.yaml file:

global:
  dev: true
  hostname: localhost

# configuration for fence helm chart. You can add it for all our services.
fence:
  # Fence config overrides 
  FENCE_CONFIG:
    OPENID_CONNECT:
      google:
        client_id: ""
        client_secret: ""

    AWS_CREDENTIALS:
      'fence-bot':
        aws_access_key_id: ''
        aws_secret_access_key: ''

    S3_BUCKETS:
      # Name of the actual s3 bucket
      jq-helm-testing:
        cred: 'fence-bot'
        region: us-east-1

    # This is important for data upload.
    DATA_UPLOAD_BUCKET: 'jq-helm-testing'

portal:
  image: 
    repository: quay.io/cdis/data-portal-prebuilt 
    tag: brh.data-commons.org-feat-pr_comment
  resources:
    requests:
      cpu: 0.2
      memory: 500Mi

arborist:
  postgres:
    dbCreate: true
    username: gen3_arborist
    password: gen3_arborist

Here are my logs:

get pods
NAME                                              READY   STATUS                       RESTARTS        AGE
portal-deployment-c9d6d9776-lb78f                 0/1     Pending                      0               20m
sower-85597bddbf-fdsm6                            0/1     Pending                      0               20m
fence-deployment-6f8489fb88-lxgt4                 0/1     Pending                      0               20m
hatchery-deployment-7fcb68fb65-zcg76              0/1     Pending                      0               20m
revproxy-deployment-9764957cd-djnh7               1/1     Running                      0               20m
gen3-postgresql-0                                 1/1     Running                      0               20m
wts-oidc-job-qhwbd                                0/2     Init:0/1                     0               20m
manifestservice-deployment-6c74479448-jmznd       1/1     Running                      0               20m
peregrine-dbcreate-rswdw                          0/1     Completed                    0               20m
arborist-dbcreate-78jvw                           0/1     Completed                    0               20m
metadata-dbcreate-zc8b7                           0/1     Completed                    0               20m
sheepdog-dbcreate-p55hq                           0/1     Completed                    0               20m
wts-dbcreate-2kvvs                                0/1     Completed                    0               20m
audit-dbcreate-g7l6q                              0/1     Completed                    0               20m
fence-dbcreate-rmbrw                              0/1     Completed                    0               20m
indexd-dbcreate-mfjmz                             0/1     Completed                    0               20m
gen3-elasticsearch-master-0                       1/1     Running                      0               20m
arborist-deployment-77645d555-pzl8v               1/1     Running                      0               20m
ambassador-deployment-6cd65d48d6-xm8db            1/1     Running                      0               20m
wts-deployment-57ff756898-55d4b                   0/1     CreateContainerConfigError   0               20m
indexd-deployment-b845d4565-cjcr9                 1/1     Running                      0               20m
metadata-deployment-559bbdd459-gm88s              1/1     Running                      0               20m
audit-deployment-54c96847c8-bv8g4                 1/1     Running                      0               20m
indexd-userdb-jtxx9                               0/1     Completed                    0               20m
presigned-url-fence-deployment-6d657f9cfd-bb45d   1/1     Running                      0               20m
argo-wrapper-deployment-85f5d4b756-ft7cb          0/1     ImagePullBackOff             0               20m
useryaml-qm2qn                                    0/1     CrashLoopBackOff             7 (2m37s ago)   20m
sheepdog-deployment-746959d756-fk2pz              0/1     CrashLoopBackOff             8 (114s ago)    20m
pidgin-deployment-b9c7c5b7d-rt86d                 0/1     Running                      5 (82s ago)     20m
peregrine-deployment-6d9b6b584b-jnsc6             0/1     Running                      5 (22s ago)     20m

For pod useryaml-qm2qn:

[notice] A new release of pip is available: 23.3.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
[2024-04-02 15:22:25,537][gen3config.config][   INFO] Opening default configuration...
[2024-04-02 15:22:26,196][gen3config.config][   INFO] Applying configuration: /var/www/fence/fence-config.yaml
[2024-04-02 15:22:26,606][gen3config.config][WARNING] Did not provide key(s) dict_keys(['DB', 'DB_MIGRATION_POSTGRES_LOCK_KEY', 'SESSION_COOKIE_DOMAIN', 'GA4GH_DRS_POSTED_PASSPORT_FIELD', 'PRIVACY_POLICY_URL',
[2024-04-02 15:22:26,978][fence.config][   INFO] Found environment variable 'DB': overriding 'DB' field from config file
[2024-04-02 15:22:26,978][fence.config][   INFO] Found environment variable 'INDEXD_PASSWORD': overriding 'INDEXD_PASSWORD' field from config file
WARNING:flask_cors.core:Unknown option passed to Flask-CORS: headers
WARNING:flask_cors.core:Unknown option passed to Flask-CORS: headers
[2024-04-02 15:22:31,637][gen3config.config][   INFO] Opening default configuration...
[2024-04-02 15:22:32,292][gen3config.config][   INFO] Applying configuration: /var/www/fence/fence-config.yaml
[2024-04-02 15:22:32,705][gen3config.config][WARNING] Did not provide key(s) dict_keys(['DB', 'DB_MIGRATION_POSTGRES_LOCK_KEY', 'SESSION_COOKIE_DOMAIN', 'GA4GH_DRS_POSTED_PASSPORT_FIELD', 'PRIVACY_POLICY_URL',
[2024-04-02 15:22:33,077][fence.config][   INFO] Found environment variable 'DB': overriding 'DB' field from config file
[2024-04-02 15:22:33,077][fence.config][   INFO] Found environment variable 'INDEXD_PASSWORD': overriding 'INDEXD_PASSWORD' field from config file
ERROR:gen3users:Permission 'reader' in role 'reader' in policy 'open_data_reader' has 'service = *'. This is unsecure because policy 'open_data_reader' is granted to public group 'anonymous_policies'. Fix sugge
ERROR:gen3users:Permission 'storage_reader' in role 'storage_reader' in policy 'open_data_reader' has 'service = *'. This is unsecure because policy 'open_data_reader' is granted to public group 'anonymous_poli
ERROR:gen3users:Permission 'reader' in role 'reader' in policy 'full_open_access' has 'service = *'. This is unsecure because policy 'full_open_access' is granted to public group 'anonymous_policies'. Fix sugge
ERROR:gen3users:Permission 'storage_reader' in role 'storage_reader' in policy 'full_open_access' has 'service = *'. This is unsecure because policy 'full_open_access' is granted to public group 'anonymous_poli
ERROR:gen3users:Permission 'reader' in role 'reader' in policy 'open_data_reader' has 'service = *'. This is unsecure because policy 'open_data_reader' is granted to public group 'all_users_policies'. Fix sugge
ERROR:gen3users:Permission 'storage_reader' in role 'storage_reader' in policy 'open_data_reader' has 'service = *'. This is unsecure because policy 'open_data_reader' is granted to public group 'all_users_poli
ERROR:gen3users:Permission 'reader' in role 'reader' in policy 'authn_open_access' has 'service = *'. This is unsecure because policy 'authn_open_access' is granted to public group 'all_users_policies'. Fix sug
ERROR:gen3users:Permission 'storage_reader' in role 'storage_reader' in policy 'authn_open_access' has 'service = *'. This is unsecure because policy 'authn_open_access' is granted to public group 'all_users_po
[2024-04-02 15:22:33,610][user_syncer][  ERROR] user.yaml validation failed. See errors in previous logs.
[2024-04-02 15:22:33,610][user_syncer][  ERROR] aborting early
Traceback (most recent call last):
  File "/usr/local/bin/fence-create", line 6, in <module>
    sys.exit(main())
  File "/fence/bin/fence_create.py", line 502, in main
    sync_users(
  File "/fence/fence/scripting/fence_create.py", line 510, in sync_users
    syncer.sync()
  File "/fence/fence/sync/sync_users.py", line 1515, in sync
    self._sync(s)
  File "/fence/fence/sync/sync_users.py", line 1577, in _sync
    user_yaml = UserYAML.from_file(
  File "/fence/fence/sync/sync_users.py", line 172, in from_file
    validate_user_yaml(file_contents)  # run user.yaml validation tests
  File "/usr/local/lib/python3.9/site-packages/gen3users/validation.py", line 58, in validate_user_yaml
    raise AssertionError(
AssertionError: user.yaml validation failed. See errors in previous logs.

For pod gen3-postgresql-0:

postgresql 15:19:06.43 INFO  ==> ** Starting PostgreSQL **
2024-04-02 15:19:06.491 GMT [1] LOG:  pgaudit extension initialized
2024-04-02 15:19:06.602 GMT [1] LOG:  starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-04-02 15:19:06.608 GMT [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-04-02 15:19:06.608 GMT [1] LOG:  listening on IPv6 address "::", port 5432
2024-04-02 15:19:06.611 GMT [1] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-04-02 15:19:06.623 GMT [367] LOG:  database system was shut down at 2024-04-02 15:19:06 GMT
2024-04-02 15:19:06.635 GMT [1] LOG:  database system is ready to accept connections
2024-04-02 15:19:21.054 GMT [397] FATAL:  password authentication failed for user "peregrine_gen3"
2024-04-02 15:19:21.054 GMT [397] DETAIL:  Role "peregrine_gen3" does not exist.
    Connection matched pg_hba.conf line 1: "host     all             all             0.0.0.0/0               md5"
2024-04-02 15:19:21.703 GMT [405] FATAL:  password authentication failed for user "sheepdog_gen3"
2024-04-02 15:19:21.703 GMT [405] DETAIL:  Role "sheepdog_gen3" does not exist.
    Connection matched pg_hba.conf line 1: "host     all             all             0.0.0.0/0               md5"
2024-04-02 15:19:22.081 GMT [407] FATAL:  password authentication failed for user "metadata_gen3"
2024-04-02 15:19:22.081 GMT [407] DETAIL:  Role "metadata_gen3" does not exist.
    Connection matched pg_hba.conf line 1: "host     all             all             0.0.0.0/0               md5"
2024-04-02 15:19:22.654 GMT [413] FATAL:  password authentication failed for user "gen3_arborist"
2024-04-02 15:19:22.654 GMT [413] DETAIL:  Role "gen3_arborist" does not exist.
    Connection matched pg_hba.conf line 1: "host     all             all             0.0.0.0/0               md5"
2024-04-02 15:19:22.969 GMT [417] FATAL:  password authentication failed for user "wts_gen3"
2024-04-02 15:19:22.969 GMT [417] DETAIL:  Role "wts_gen3" does not exist.
    Connection matched pg_hba.conf line 1: "host     all             all             0.0.0.0/0               md5"
2024-04-02 15:19:32.283 GMT [501] FATAL:  password authentication failed for user "audit_gen3"
2024-04-02 15:19:32.283 GMT [501] DETAIL:  Role "audit_gen3" does not exist.
    Connection matched pg_hba.conf line 1: "host     all             all             0.0.0.0/0               md5"
2024-04-02 15:19:32.327 GMT [503] FATAL:  password authentication failed for user "indexd_gen3"
2024-04-02 15:19:32.327 GMT [503] DETAIL:  Role "indexd_gen3" does not exist.
    Connection matched pg_hba.conf line 1: "host     all             all             0.0.0.0/0               md5"
2024-04-02 15:19:32.618 GMT [505] FATAL:  password authentication failed for user "fence_gen3"
2024-04-02 15:19:32.618 GMT [505] DETAIL:  Role "fence_gen3" does not exist.
    Connection matched pg_hba.conf line 1: "host     all             all             0.0.0.0/0               md5"
2024-04-02 15:19:41.918 GMT [575] ERROR:  relation "db_version" does not exist at character 21
2024-04-02 15:19:41.918 GMT [575] STATEMENT:  select version from db_version

For pod peregrine-deployment-6d9b6b584b-jnsc6:

*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x4000172420
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 45 seconds
mapped 304776 bytes (297 KB) for 2 cores
*** Operational MODE: preforking ***
added /var/www/peregrine/ to pythonpath.
added /peregrine/ to pythonpath.
added /usr/local/lib/python3.6/site-packages/ to pythonpath.
failed to open python file /var/www/peregrine/wsgi.py
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. going in full dynamic mode ***
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 2013)
spawned uWSGI worker 1 (pid: 2019, cores: 1)
spawned uWSGI worker 2 (pid: 2021, cores: 1)
--- no python application found, check your startup logs for errors ---
--- no python application found, check your startup logs for errors ---
--- no python application found, check your startup logs for errors ---
--- no python application found, check your startup logs for errors ---

For wts-deployment-57ff756898-55d4b:

stream logs failed container "wts" in pod "wts-deployment-57ff756898-55d4b" is waiting to start: CreateContainerConfigError for default/wts-deployment-57ff756898-55d4b (wts) 
jawadqur commented 6 months ago

https://github.com/uc-cdis/gen3-helm/pull/164

I merged a fix for the sheepdog / peregrine issues.

For the useryaml job, you need to specify a valid user.yaml in your fence configuration.
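
For reference, a minimal sketch of such an override in values.yaml, assuming the fence chart exposes a USER_YAML value (check the chart's own values.yaml for the exact key) and using a hypothetical user and resource. It avoids the service: '*' grants that the useryaml validation above flagged for public groups:

fence:
  USER_YAML: |
    authz:
      resources:
        - name: open                 # hypothetical resource
      roles:
        - id: reader
          permissions:
            - id: reader
              action:
                service: peregrine   # a specific service, not '*'
                method: read
      policies:
        - id: open_reader
          role_ids:
            - reader
          resource_paths:
            - /open
      # Keep public groups empty (or free of 'service: *' policies)
      # so the user.yaml validation passes.
      anonymous_policies: []
      all_users_policies: []
    users:
      user@example.com:              # hypothetical user
        policies:
          - open_reader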