uc-cdis / gen3-helm

Helm charts for Gen3 Deployments
Apache License 2.0

pod portal-deployment error 0/1 in k8s cluster #202

Open zhengyansheng opened 5 days ago

zhengyansheng commented 5 days ago

I installed Gen3 in a k8s cluster using gen3-helm.


➜  ~ cat values.yaml
global:
  hostname: gen3.xxx.cn

fence:
  FENCE_CONFIG:
    # if true, will bypass OIDC login, and login a user with username "test"
    # WARNING: DO NOT ENABLE IN PRODUCTION (for testing purposes only)
    #MOCK_AUTH: true

    OPENID_CONNECT:
      google:
        client_id: "xxxxxx"
        client_secret: "xxxxx"

➜  ~ helm upgrade --install gen3 gen3/gen3 -f ./values.yaml 

➜  ~ kubectl get pods -n gen3
NAME                                              READY   STATUS      RESTARTS   AGE
ambassador-deployment-89c5b974d-t7964             1/1     Running     0          61m
arborist-dbcreate-7fld6                           0/1     Completed   0          61m
arborist-deployment-9c498b6f6-8f2tb               1/1     Running     0          61m
audit-dbcreate-mwl87                              0/1     Completed   0          61m
audit-deployment-75c8b5d847-rkzdn                 1/1     Running     0          61m
fence-dbcreate-vt58l                              0/1     Completed   0          61m
fence-deployment-7d8c4f8f67-xnkb2                 1/1     Running     0          61m
gen3-elasticsearch-master-0                       1/1     Running     0          61m
gen3-postgresql-0                                 1/1     Running     0          61m
hatchery-deployment-6586f6f4-p8qsn                1/1     Running     0          61m
indexd-dbcreate-qlscw                             0/1     Completed   0          61m
indexd-deployment-677fb9885d-4mw46                1/1     Running     0          61m
indexd-userdb-scgn4                               0/1     Completed   0          61m
manifestservice-deployment-6b6c8895f9-7nbz2       1/1     Running     0          61m
metadata-dbcreate-4kknx                           0/1     Completed   0          61m
metadata-deployment-5f6787d5dd-krrrx              1/1     Running     0          61m
peregrine-dbcreate-wkzmq                          0/1     Completed   0          61m
peregrine-deployment-859d7c49c9-7mrw2             1/1     Running     0          61m
pidgin-deployment-c59bc687-d6w28                  1/1     Running     0          61m
portal-deployment-65b6468fb7-kwk64                0/1     Running     0          2m54s
presigned-url-fence-deployment-6559c5b9f9-n754p   1/1     Running     0          61m
revproxy-deployment-6c7ff7748d-tv5hx              1/1     Running     0          61m
sheepdog-dbcreate-8fsl5                           0/1     Completed   0          61m
sheepdog-deployment-d64468579-fmg9n               1/1     Running     0          61m
sower-599bdcbdc5-hhh5w                            1/1     Running     0          61m
useryaml-6mhrj                                    1/1     Running     0          2m54s
wts-dbcreate-hsghs                                0/1     Completed   0          61m
wts-deployment-55749bdc9c-5m9ft                   1/1     Running     0          61m
wts-oidc-job-t2xwt                                0/2     Completed   0          61m

➜  ~ kubectl get pods -n gen3 |grep "Running" |grep "0/1"
portal-deployment-65b6468fb7-kwk64                0/1     Running     0             4m1s
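The chained `grep` calls above work for this case, but they match raw text and would miss a multi-container pod stuck at, say, `1/2`. Here is a minimal sketch of a stricter filter, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, ...). The function name `not_ready_pods` is made up for illustration.

```shell
# Print the names of pods that are Running but not fully ready.
# Reads `kubectl get pods` output on stdin; the header line is skipped
# automatically because its third column is "STATUS", not "Running".
not_ready_pods() {
  awk '$3 == "Running" {
    split($2, ready, "/")              # READY column looks like "0/1"
    if (ready[1] + 0 < ready[2] + 0)   # fewer ready containers than total
      print $1
  }'
}

# Usage (needs cluster access):
#   kubectl get pods -n gen3 | not_ready_pods
```

Completed jobs such as `arborist-dbcreate-7fld6` also show `0/1`, which is why the filter checks the STATUS column first.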

➜  ~ kubectl logs -f portal-deployment-65b6468fb7-kwk64
......

Unchanged: 7 files
INFO: Generating parameters.json

> cloud_portal@0.1.0 params
> node ./data/getTexts > src/params.js

INFO: Running sanity-check for non-workspace bundle

> cloud_portal@0.1.0 sanity-check
> node ./sanity-check

INFO: Ready for webpack
npx webpack build
Unknown GA tag, skipping GA setup...

# Browser access failed (tried Chrome and Firefox; neither works)

The connection has timed out

An error occurred during a connection to gen3.wcrcnet.cn.

    The site could be temporarily unavailable or too busy. Try again in a few moments.
    If you are unable to load any pages, check your computer’s network connection.
    If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the web.

# log
➜  ~ kubectl logs -f revproxy-deployment-6c7ff7748d-tv5hx
{"gen3log": "nginx", "date_access": "2024-10-21T03:08:20+00:00", "user_id": "uid:null,unknown@unknown", "request_id": "029b469cd5a4e0ea5892c27136c7a24e", "session_id": "029b469cd5a4e0ea5892c27136c7a24e", "visitor_id": "029b469cd5a4e0ea5892c27136c7a24e", "network_client_ip": "172.27.16.9", "network_bytes_write": 0, "response_secs": 1.999, "http_status_code": 499, "http_request": "/", "http_verb": "HEAD", "http_referer": "-", "http_useragent": "clb-healthcheck", "http_upstream": "http://portal-service.gen3.svc.cluster.local", "proxy_service": "portal", "message": "HEAD / HTTP/1.1" }
{"gen3log": "nginx", "date_access": "2024-10-21T03:08:21+00:00", "user_id": "uid:null,unknown@unknown", "request_id": "5a93fdecd6c731b61410cb85a7cd4544", "session_id": "5a93fdecd6c731b61410cb85a7cd4544", "visitor_id": "5a93fdecd6c731b61410cb85a7cd4544", "network_client_ip": "172.27.16.38", "network_bytes_write": 0, "response_secs": 2.000, "http_status_code": 499, "http_request": "/", "http_verb": "HEAD", "http_referer": "-", "http_useragent": "clb-healthcheck", "http_upstream": "http://portal-service.gen3.svc.cluster.local", "proxy_service": "portal", "message": "HEAD / HTTP/1.1" }
2024/10/21 03:08:21 [error] 10#10: *1023 connect() failed (111: Connection refused) while connecting to upstream, client: 172.27.16.9, server: , request: "HEAD / HTTP/1.1", upstream: "http://192.168.1.68:80/", host: "gen3.xxx.cn"
{"gen3log": "nginx", "date_access": "2024-10-21T03:08:21+00:00", "user_id": "uid:null,unknown@unknown", "request_id": "9e807844375adb819a3221e24e4b85bf", "session_id": "9e807844375adb819a3221e24e4b85bf", "visitor_id": "9e807844375adb819a3221e24e4b85bf", "network_client_ip": "172.27.16.9", "network_bytes_write": 0, "response_secs": 1.031, "http_status_code": 502, "http_request": "/", "http_verb": "HEAD", "http_referer": "-", "http_useragent": "clb-healthcheck", "http_upstream": "http://portal-service.gen3.svc.cluster.local", "proxy_service": "portal", "message": "HEAD / HTTP/1.1" }
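A note on reading these lines: 499 is nginx's non-standard code for a client (here the `clb-healthcheck` probe) closing the connection before a response was sent, and 502 together with `connect() failed (111: Connection refused)` means revproxy reached the portal pod's address but nothing was accepting connections on port 80 yet. That is consistent with the portal pod still showing `0/1`. A minimal sketch for tallying status codes in this log format, assuming the JSON field layout shown above. The function name `status_counts` is hypothetical.

```shell
# Count requests per HTTP status code in revproxy's JSON access log.
# Reads log lines on stdin; non-JSON lines (nginx error lines) are ignored
# because they do not contain the "http_status_code" field.
status_counts() {
  grep -o '"http_status_code": [0-9]*' |
    awk '{print $2}' |
    sort |
    uniq -c |
    awk '{print $2, $1}'   # print "<code> <count>"
}

# Usage (needs cluster access):
#   kubectl logs revproxy-deployment-6c7ff7748d-tv5hx | status_counts
```

For the three access-log lines pasted above, this would print `499 2` and `502 1`.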
zhengyansheng commented 5 days ago

Update:

The portal-deployment pod is now ready, but the browser still cannot reach the site.

➜  ~ k get pods |grep portal-deployment
portal-deployment-65b6468fb7-6x5sf                1/1     Running     1 (4h31m ago)   4h32m
portal-deployment-65b6468fb7-kwk64                1/1     Running     0               9h

jawadqur commented 5 days ago

Hi @zhengyansheng

Sometimes portal takes a lot of resources to build, since it runs webpack at runtime to accommodate all the different configurations.

https://github.com/uc-cdis/gen3-helm/blob/master/docs/portal/prebuild-portal.md

Can you try prebuilding your portal image instead and see if that helps?

If this helps, then we most likely need to verify that your portal pods have the connectivity they need to gather all configurations, and that you do not get these errors:

# Browser access failed (tried Chrome and Firefox; neither works)

The connection has timed out

An error occurred during a connection to gen3.wcrcnet.cn.

    The site could be temporarily unavailable or too busy. Try again in a few moments.
    If you are unable to load any pages, check your computer’s network connection.
    If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the web.

Also make sure it has enough resources to run the webpack build. We've sometimes seen that adding more CPU and memory helps.

I recommend the prebuilt-portal route if you can get it to work. It dramatically reduces the resources portal needs, because the pod is then just an nginx container serving static files.
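The two options above could be expressed as override files along these lines. This is a sketch under assumptions, not the chart's documented configuration: it assumes the portal subchart follows the usual Helm convention of `image.repository`, `image.tag`, and `resources` keys under `portal:`, which is worth confirming with `helm show values gen3/gen3`. The file names, the image name, and the CPU and memory figures are all placeholders.

```shell
# Option 1: give the runtime webpack build more room.
# The numbers are illustrative only, not measured requirements.
cat > portal-more-resources.yaml <<'EOF'
portal:
  resources:
    requests:
      cpu: "1"
      memory: 4Gi
    limits:
      memory: 6Gi
EOF

# Option 2: point the chart at a prebuilt portal image.
# registry.example.com/my-portal and the tag are hypothetical.
cat > portal-prebuilt.yaml <<'EOF'
portal:
  image:
    repository: registry.example.com/my-portal
    tag: "prebuilt"
EOF

# Apply one of them on top of the existing values (needs cluster access):
#   helm upgrade --install gen3 gen3/gen3 -f ./values.yaml -f ./portal-more-resources.yaml
```

Helm merges multiple `-f` files left to right, so the override file only needs the keys that change.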