medic / cht-user-management

GNU Affero General Public License v3.0
Recover system logs for error/debug analysis #51

Closed mrjones-plip closed 5 months ago

mrjones-plip commented 5 months ago

On Feb 5, 2024, a user on the KE production instance encountered some errors while using the tool. We want to retrieve the server-side logs to debug what happened. Opening this ticket to track the effort and to document the steps for getting log files from production.

mrjones-plip commented 5 months ago

Steps to recover logs for the past 2 days (48 hours):

  1. Log in to the system: ./eks-aws-mfa-login USERNAME TOTP-6DIGIT-TOKEN
  2. Switch to the production context: kubectl config use-context arn:aws:eks:eu-west-2:720541322708:cluster/prod-cht-eks
  3. Find the pod name for the Kenyan instance: kubectl -n users-chis-prod get pods
  4. Dump the past 48 hours of logs for the Kenyan pod found in the previous step to Feb.3-5.2024.prod.ke.log. Add --timestamps=true to get full timestamps, not just the time: kubectl -n users-chis-prod logs users-chis-ke-cht-user-management-7f5584c4b9-m88k4 --since=48h --timestamps=true > Feb.3-5.2024.prod.ke.log
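Since the pod name changes with every redeploy, steps 3 and 4 can be wrapped in two small helpers. This is only a sketch, assuming steps 1 and 2 have already been run; the function names find_pod and dump_logs and the NAMESPACE variable are made up for illustration, and the "users-chis-ke" fragment is taken from the pod name shown above.

```shell
#!/bin/sh
# Namespace taken from the steps above.
NAMESPACE="users-chis-prod"

# find_pod FRAGMENT
# Reads `kubectl get pods -o name` output on stdin (one "pod/<name>" per line)
# and prints the first pod name containing FRAGMENT, without the "pod/" prefix.
find_pod() {
  grep -- "$1" | head -n 1 | sed 's|^pod/||'
}

# dump_logs POD OUTFILE
# Writes the last 48 hours of logs for POD, with timestamps, to OUTFILE.
dump_logs() {
  kubectl -n "$NAMESPACE" logs "$1" --since=48h --timestamps=true > "$2"
}

# Example usage (hypothetical, run after logging in and switching context):
#   pod=$(kubectl -n "$NAMESPACE" get pods -o name | find_pod users-chis-ke)
#   dump_logs "$pod" Feb.3-5.2024.prod.ke.log
```

If more than one pod matches the fragment, find_pod silently takes the first one, so it is worth checking the plain `get pods` output once before relying on it.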

Note that events will be in reverse chronological order (newest at the top of file).
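Because --timestamps=true puts an RFC 3339 timestamp at the start of every line, the dump can be narrowed to the day of the incident with a plain grep. A minimal sketch, assuming the file was produced by the command in step 4; the helper name lines_for_day is hypothetical.

```shell
#!/bin/sh
# lines_for_day DATE LOGFILE
# Prints only the lines of LOGFILE whose leading timestamp falls on DATE
# (format YYYY-MM-DD), relying on the timestamp prefix added by --timestamps=true.
lines_for_day() {
  grep -- "^$1T" "$2"
}

# Example usage (hypothetical):
#   lines_for_day 2024-02-05 Feb.3-5.2024.prod.ke.log > Feb.5.2024.prod.ke.log
```

The timestamps are in UTC, so an incident near midnight local time may span two dates.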

Additionally, if you would like to get a shell on the production instance, you can use the pod name from step 3 above as follows:

kubectl -n users-chis-prod exec --stdin --tty users-chis-ke-cht-user-management-7f5584c4b9-m88k4 -- /bin/sh

mrjones-plip commented 5 months ago

Closing, assuming this is sufficiently documented. The logs ended up not being helpful, per #52.