onaio / Reveal-Thailand

Repository to raise & track issues, share code and documentation, and manage the Reveal Thailand Handover and Support Project (2022-2023)

[Support Request]: Plans are not being generated on Reveal Web #15

Closed AngelaKabari closed 1 year ago

AngelaKabari commented 1 year ago

Describe the issue

  1. Go to the Reveal Web portal, and click on create a new action plan
  2. Fill in the plan details and click on the blue "operational plan record" button
  3. The plan cannot be generated, and shows the error message "The system is down. Sorry for the inconvenience."

How is this expected to work?

The user should be able to generate a plan.

Screenshots

System down error

Please share other relevant information about the issue

N/A

AngelaKabari commented 1 year ago

Investigations carried out thus far are as follows:

1. Check that the error is not from the server: Checked. The request fails with a 403 error response. It may be an issue on the server, possibly related to a CORS misconfiguration. The requests do return data when made directly from the browser or from a client such as Insomnia.

2. Check that the web URL is added to the whitelisted URLs on the Keycloak realm: Added the web and server domains to the Web Origins section on Keycloak, but we are still getting the same error.
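The two checks above can be sketched as a small script that repeats the failing request with an explicit Origin header and reports what comes back. This is only a rough illustration, not the procedure that was actually used: `diagnose_response` and `probe` are hypothetical helper names, and the URL and origin passed to `probe` would have to be filled in for the real deployment. Some servers answer a request from a disallowed origin with a 403, so the status code alone does not separate a CORS rejection from another server-side refusal.

```python
import urllib.error
import urllib.request
from typing import Mapping


def diagnose_response(status: int, headers: Mapping[str, str], origin: str) -> str:
    """Classify a response to a request that carried an Origin header."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    allowed = lowered.get("access-control-allow-origin")

    if status == 403:
        return "403 Forbidden: rejected by the server or a proxy in front of it"
    if allowed is None:
        return "no Access-Control-Allow-Origin header: a browser would block this"
    if allowed not in ("*", origin):
        return f"origin mismatch: server allows {allowed!r}, page is {origin!r}"
    return "CORS headers look consistent for this origin"


def probe(url: str, origin: str) -> str:
    """Send a GET with an Origin header and describe the outcome (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"Origin": origin})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return diagnose_response(response.status, dict(response.headers.items()), origin)
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as HTTPError but still carry headers.
        return diagnose_response(error.code, dict(error.headers.items()), origin)
    except urllib.error.URLError as error:
        # Covers DNS, connection and TLS failures such as certificate problems.
        return f"connection failed: {error.reason}"
```

Calling `probe` once with the web portal's origin and once with a different origin would show whether the 403 depends on the Origin header at all.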

AngelaKabari commented 1 year ago

The Ona-hosted preview environment is also now throwing the "The system is down. Sorry for the inconvenience." error message:

Preview system down error

Though there are the following differences:

  1. The error message does not persist, unlike the error on the production environment
  2. A user can still generate plans, unlike on the production environment

Rkareko commented 1 year ago

We replicated the issue using the bvbd_mhealth account. The following logs were inspected during the investigation:

  1. The OpenSRP Tomcat logs, using the command tail -fn 100 /home/opensrp/tomcat-opensrp/logs/catalina.out
  2. The nginx logs, using the command tail -fn 100 /var/log/nginx/servermhealth.ddc.moph.go.th-https-access.log

We did not see any entries for the POST request containing the plan payload in either log. This was tested both on the web UI and using Postman.
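For a saved copy of the access log, the same search can be done with a short filter instead of watching tail output. The sketch below assumes the log uses nginx's default "combined" format and that the plan request path contains the fragment "plans"; both are assumptions, and `find_requests` is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Matches the quoted request line and the status code that follows it in
# nginx's default "combined" access log format (assumed, not confirmed).
REQUEST_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def find_requests(
    lines: Iterable[str], method: str = "POST", path_fragment: str = "plans"
) -> List[Tuple[str, int]]:
    """Return (path, status) for every logged request matching method and path."""
    matches = []
    for line in lines:
        found = REQUEST_PATTERN.search(line)
        if found and found["method"] == method and path_fragment in found["path"]:
            matches.append((found["path"], int(found["status"])))
    return matches
```

Passing an open file object for the access log to `find_requests` returns every matching entry. An empty list would match what we observed, which suggests that the request was either not reaching this nginx virtual host or not being written to this particular log file.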

AngelaKabari commented 1 year ago

The 2023 SSL certificates were manually re-installed on 19th January 2023, and DVBD, CHAI and Biophics were informed via LINE. Installing the SSL certificates resolved issue #15: once https://mhealth.ddc.moph.go.th/ had an up-to-date SSL certificate, the server stopped returning 403 errors and plans could be generated once again.

Ona subsequently carried out a retrospective to try to establish what happened and found the following:

  1. System access: The production server is accessed via VPN, and at the moment we share common credentials to the Reveal production environment. This means that it is not possible to know who made changes as we cannot tell which IP address any changes emanate from.
  2. The Reveal server environment is deployed by running an Ansible playbook, that is, a specific set of instructions that we use to set up an OpenSRP deployment. The current Ansible configurations for reveal-web preview (https://preview-mhealth.ddc-malaria.org) and production (https://mhealth.ddc.moph.go.th) use an outdated SSL certificate. Whenever the Ansible playbooks are run, an automated process driven by an SSL certificate bot installs a particular set of certificates. At the moment, the SSL certificates on the bot are old ones (issued by Let's Encrypt) instead of the 2022/2023 ones currently in use (issued by DigiCert).
  3. At some point between January 10th and 18th, the Reveal production server environment was accessed and the Ansible playbook for Reveal Web production was run. The Let's Encrypt SSL certificate configs therefore overwrote the existing manual DigiCert configuration for the production setup. We have not found any logs showing that this was changed manually, therefore the change could only have come from Ansible.
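One way to notice this kind of overwrite early is to check which authority issued the certificate a host is currently serving and when it expires. The following is a minimal sketch using only the Python standard library; `summarize_cert` and `fetch_cert` are hypothetical helper names, and it was not part of the retrospective itself.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Any, Dict


def summarize_cert(cert: Dict[str, Any]) -> Dict[str, Any]:
    """Extract issuer and expiry from a dict as returned by SSLSocket.getpeercert()."""
    # "issuer" is a tuple of RDNs, each a tuple of (name, value) pairs.
    issuer = {name: value for rdn in cert.get("issuer", ()) for name, value in rdn}
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    return {
        "issuer_org": issuer.get("organizationName"),
        "issuer_cn": issuer.get("commonName"),
        "expires": expires,
    }


def fetch_cert(host: str, port: int = 443) -> Dict[str, Any]:
    """Fetch the certificate a host serves, with normal verification enabled."""
    # With the default context, an expired or untrusted certificate makes the
    # handshake raise ssl.SSLCertVerificationError, which is itself a finding.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

Running `summarize_cert(fetch_cert(host))` for the production host before and after a playbook run would show whether the issuer organisation changed, for example from DigiCert to Let's Encrypt.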

The steps we have taken to ensure that the above issue does not occur again are:

  1. We have requested DVBD to issue the project team with organisation-specific (CHAI, Biophics and Ona) credentials to allow for proper attribution going forward.
  2. We recommend that an auditing tool be installed on the server to log details of server access and file changes.
  3. We have created issue #28, "Disable Certbort in reveal-web deployment configs", to address the Certbot roles that are currently enabled for the Thailand Reveal Web Ansible playbooks. This means that if the server ever needs to be set up again or the Ansible playbooks are run, the automated setup procedure will ask for SSL certificates, which will allow the person running the deployment to manually apply certificates that are valid for the current year.

We have created issue #29, "Implement proper attribution on the Reveal System", to track the implementation of the first two recommendations above.

Originally posted by @AngelaKabari in https://github.com/onaio/infrastructure/issues/6286#issuecomment-1454801093