medic / cht-user-management

Medic-hosted Togo User Management Tool #110

Closed: kennsippell closed this issue 6 months ago

kennsippell commented 6 months ago

Based on prior effort for KE and UG, steps should include:

mrjones-plip commented 6 months ago

Created a CNAME record for users-chis-tg.app to point to k8s-prodchtalb-dcc00345ac-1792311525.eu-west-2.elb.amazonaws.com on this AWS page:

(screenshot: the new CNAME record in the AWS console)

There are now three users-chis-*.app DNS entries pointing to the same ELB:

(screenshot: the three users-chis-*.app DNS entries)
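Once the record propagates, it can be sanity-checked from anywhere with dig (expected output is the ELB hostname above):

    dig +short users-chis-tg.app.medicmobile.org CNAME
    k8s-prodchtalb-dcc00345ac-1792311525.eu-west-2.elb.amazonaws.com.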

mrjones-plip commented 6 months ago

@kennsippell and @jdndiaye - I note that in the pending PR we refer to the config as chis-tg and not chis-togo. Since we refer to ke and ug rather than kenya and uganda, I'm changing the domain name to match, so that domain <> country code <> config name all line up.

It's now: users-chis-tg.app.medicmobile.org (note togo -> tg)

I'll update the above screenshots when I redo the DNS entry.

mrjones-plip commented 6 months ago

@nydr or @Hareet - if you have a sec, can you see what I'm doing wrong here? I'm getting a 404 on the URL https://users-chis-tg.app.medicmobile.org/ when I'm expecting the user management app to show up after deploying it with helm.

so far I have:

  1. created a CNAME DNS entry per above
  2. created a new values yaml file - all steps below use the togo-deploy-values-readme-update branch
  3. done the initial install with:

     helm install --kube-context arn:aws:eks:eu-west-2:720541322708:cluster/prod-cht-eks \
       --namespace users-chis-prod --values values/users-chis-tg.yaml \
       users-chis-tg /home/mrjones/Documents/MedicMobile/helm-charts/charts/cht-user-management/

  4. upgraded a couple times to be sure:

     helm upgrade --kube-context arn:aws:eks:eu-west-2:720541322708:cluster/prod-cht-eks \
       --namespace users-chis-prod --values values/users-chis-tg.yaml \
       users-chis-tg /home/mrjones/Documents/MedicMobile/helm-charts/charts/cht-user-management/
  5. waited 10+ minutes (last time it took 5+ min for the first install and I wasn't patient enough ;)
  6. checked history, which looks OK compared to the ug and ke instances:
    REVISION        UPDATED                         STATUS          CHART                           APP VERSION     DESCRIPTION     
    1               Wed Mar 20 21:53:32 2024        superseded      cht-user-management-0.2.0                       Install complete
    2               Wed Mar 20 21:53:49 2024        superseded      cht-user-management-0.2.0                       Upgrade complete
    3               Wed Mar 20 21:55:08 2024        superseded      cht-user-management-0.2.0                       Upgrade complete
    4               Wed Mar 20 22:03:47 2024        deployed        cht-user-management-0.2.0                       Upgrade complete
  7. checked logs and didn't see any obvious errors (though they don't look super verbose :thinking: ) - see the rollout check sketched below
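A quicker way to see whether a rollout actually finished, instead of waiting and guessing (the deployment name is inferred from the release name above, so adjust if the chart names it differently):

    kubectl --namespace users-chis-prod rollout status \
      deployment/users-chis-tg-cht-user-management --timeout=5m

If the new pods never go ready, this hangs and then exits with an error instead of printing "successfully rolled out" - a much louder signal than a silent 404.
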
nydr commented 6 months ago

I suspect you need to distribute the app config too (see step 2 at the top of this issue). From kubectl logs:

Error: Failed to start: Cannot find configuration "CHIS-TG". Configurations available are ["CHIS-KE","CHIS-UG"]
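To pull logs from every pod behind the release at once, a label selector helps (the label here follows common Helm chart conventions and is a guess for this chart, so verify with kubectl get pods --show-labels):

    kubectl --namespace users-chis-prod logs \
      -l app.kubernetes.io/instance=users-chis-tg --prefix --tail=50

The --prefix flag tags each line with its pod name, which makes it obvious when two pods are telling different stories.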

mrjones-plip commented 6 months ago

thanks @nydr!

How did you get that error log entry with kubectl? When I tried kubectl --namespace users-chis-prod logs users-chis-tg-cht-user-management-6c99c59686-q872g I didn't see that entry anywhere (nor did grep ;)

To test that this wasn't the issue, I changed values/users-chis-tg.yaml to have chis-ug as its config name and deployed it. Here's me checking the values:

helm --namespace users-chis-prod get values users-chis-tg|grep CONFIG             
    CONFIG_NAME: chis-ug

I waited 10 minutes to see if the 404 went away, but it persisted. I've since reverted it back to chis-tg.
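(For the revert, re-running helm upgrade with the original values works, as does rolling back to a known-good revision; the revision number below is from the history output above and purely illustrative:)

    helm --namespace users-chis-prod rollback users-chis-tg 4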

Is there maybe something else I'm missing?

nydr commented 6 months ago

How did you get that error log entry

There are two tg pods; that log is from the other one.

The url is working for me, but showing the ug configuration:

❯ curl -i https://users-chis-tg.app.medicmobile.org
HTTP/2 302
date: Mon, 25 Mar 2024 13:08:02 GMT
content-length: 0
location: /login

Looking at the deployment, you can see that there are two replica sets active:

❯ describe deploy users-chis-tg-cht-user-management
OldReplicaSets:  users-chis-tg-cht-user-management-6c99c59686 (1/1 replicas created)
NewReplicaSet:   users-chis-tg-cht-user-management-6ccc4b5c4c (1/1 replicas created)
❯ get rs
NAME                                           DESIRED   CURRENT   READY   AGE
users-chis-tg-cht-user-management-6c99c59686   1         1         1       4d5h
users-chis-tg-cht-user-management-6ccc4b5c4c   1         1         0       4d5h

The old replica set has the ug config, and the new replica set is not considered ready:

❯ get po
NAME                                                 READY   STATUS             RESTARTS           AGE
users-chis-tg-cht-user-management-6c99c59686-q872g   1/1     Running            0                  4d8h
users-chis-tg-cht-user-management-6ccc4b5c4c-zt4t9   0/1     CrashLoopBackOff   1042 (2m53s ago)   3d16h

Looking at the logs of the crashing pod (which belongs to the "not ready" replica set), you see the log entry in question:

❯ logs users-chis-tg-cht-user-management-6ccc4b5c4c-zt4t9

> cht-user-management@1.1.6 start
> node dist/index.js

Using configuration: chis-tg
/app/dist/config/config-factory.js:18
        throw Error(`Failed to start: Cannot find configuration "${usingKey}". Configurations available are ${available}`);
              ^

Error: Failed to start: Cannot find configuration "CHIS-TG". Configurations available are ["CHIS-KE","CHIS-UG"]
    at getConfigByKey (/app/dist/config/config-factory.js:18:15)
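As a sanity check on what actually shipped in the image, exec-ing into the healthy pod and listing the config directory should show the available configurations (the path is inferred from the stack trace above, so it may differ):

    kubectl --namespace users-chis-prod exec \
      users-chis-tg-cht-user-management-6c99c59686-q872g -- ls /app/dist/config
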
mrjones-plip commented 6 months ago

Thanks so much @nydr! I'm able to see a lot more with your commands. So... I think what's happening is that, because I deployed the ug config once, it's finding that as a good pod, and when I deploy the tg config it's finding that as a bad pod and failing back to the good one? I tried deleting a replica to no avail (but I don't know what I'm doing :shrug: )
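For what it's worth, the delete was something along these lines (pod name from the get po output above), and it can't help: the ReplicaSet just recreates the pod, and the replacement crashes for the same config reason.

    kubectl --namespace users-chis-prod delete pod \
      users-chis-tg-cht-user-management-6ccc4b5c4c-zt4t9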

If that makes sense to you, let's wait for the config to be done, then push it to see if the tg config works. If you think something else is going on - lemme know!

nydr commented 6 months ago

it's finding that as a bad pod and failing back to the good one

That's pretty much it: the Kubernetes deployment is configured to keep a minimum number of available ("ready") pods, and it holds on to the "old" ones until enough of the "new" ones are ready.
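The knobs for that behavior live in the deployment's rollout strategy, which you can inspect like this (the percentages shown are the Kubernetes defaults, not necessarily what this chart sets):

    kubectl --namespace users-chis-prod get deploy \
      users-chis-tg-cht-user-management -o jsonpath='{.spec.strategy}'
    {"rollingUpdate":{"maxSurge":"25%","maxUnavailable":"25%"},"type":"RollingUpdate"}

With a single replica, maxUnavailable rounds down to 0 pods, so Kubernetes is never allowed to remove the old ug pod until a new tg pod reports ready - exactly the stuck state above.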

mrjones-plip commented 6 months ago

@kennsippell - since you can push tg to prod, you should be good to push again when the config is merged to main and a new image is published.

assigning over to you for final push and test!

mrjones-plip commented 6 months ago

Now that the instance is live at https://users-chis-tg.app.medicmobile.org, I'm closing this ticket.

Nice work @kennsippell!