medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0

Don't hard code COUCHDB_SECRET in docker compose files #7800

Closed mrjones-plip closed 2 years ago

mrjones-plip commented 2 years ago

**Describe the issue**

We hard code the secret for CouchDB in the Arch v3 docker setup, as shown here:

"COUCHDB_SECRET=${COUCHDB_SECRET:-6c1953b6-e64d-4b0c-9268-2528396f2f58}"

This is insecure: the value is public and will be used by default unless users override it, and CouchDB uses this secret to sign session cookies and proxy-auth tokens, so anyone who knows it can forge credentials.

**Describe the improvement you'd like**

We should dynamically generate this secret at install time, or mandate that users specify a unique one per install.

**Describe alternatives you've considered**

N/A
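
For illustration, a minimal sketch of both options, assuming the compose file keeps reading `COUCHDB_SECRET` from the environment:

```
# Option 1 (sketch): generate a unique secret at install time and pass it in.
export COUCHDB_SECRET="$(uuidgen)"   # or: openssl rand -hex 16
docker-compose up -d

# Option 2 (sketch): make the variable mandatory by replacing the ":-default"
# fallback in the compose file with ":?error", so compose refuses to start
# when the variable is unset:
#   "COUCHDB_SECRET=${COUCHDB_SECRET:?COUCHDB_SECRET must be set}"
```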

mrjones-plip commented 2 years ago

cc @garethbowen per our call today

garethbowen commented 2 years ago

I think we should not default the UUID either.

garethbowen commented 2 years ago

I had a look at this in the `7812-require-password` branch but ran out of time to get the build to pass. I think there's some issue with making the entire cluster use the same secret and UUID...
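
For context, all nodes of a CouchDB cluster must share the same `secret` and `uuid` for session and proxy auth to work across nodes, so any generated values have to be produced once and handed to every node. A rough sketch, assuming the clustered compose file reads both values from the environment for each couchdb.N service (`COUCHDB_UUID` as a variable name is an assumption):

```
# Generate each value once so couchdb.1/.2/.3 all receive identical values.
export COUCHDB_SECRET="$(uuidgen)"
export COUCHDB_UUID="$(uuidgen)"
docker-compose -f docker-compose_cht-couchdb-clustered.yml up -d
```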

dianabarsan commented 2 years ago

This is ready for AT on `7800-no-couch-secret`.

Please make sure that:

Compose files:

tatilepizs commented 2 years ago

Thanks @dianabarsan for the steps and the files to test.

Here are the testing results using the files provided in the previous comment and the `7800-no-couch-secret` branch:

**Using single node CouchDB**

- Video attached: [video](https://user-images.githubusercontent.com/94494491/193871828-33164a8c-7e5c-401f-a005-fcac16181081.mov)
- Online user: ![image](https://user-images.githubusercontent.com/94494491/193869886-204f02f3-7e17-48a5-9045-9263bdd467c6.png)
- Offline user: ![image](https://user-images.githubusercontent.com/94494491/193869943-15ff8e04-7f55-4438-8a54-351a8581eaca.png)
- Video attached: [video](https://user-images.githubusercontent.com/94494491/193872369-e53b36a0-5d04-4c9f-b9a8-5e89648bff2e.mov)

**Using clustered CouchDB**

Error attached:

```
cht-api | RequestError: Error: getaddrinfo ENOTFOUND haproxy
cht-api |     at new RequestError (/api/node_modules/request-promise-core/lib/errors.js:14:15)
cht-api |     at Request.plumbing.callback (/api/node_modules/request-promise-core/lib/plumbing.js:87:29)
cht-api |     at Request.RP$callback [as _callback] (/api/node_modules/request-promise-core/lib/plumbing.js:46:31)
cht-api |     at self.callback (/api/node_modules/request/request.js:185:22)
cht-api |     at Request.emit (node:events:527:28)
cht-api |     at Request.onRequestError (/api/node_modules/request/request.js:877:8)
cht-api |     at ClientRequest.emit (node:events:527:28)
cht-api |     at Socket.socketErrorListener (node:_http_client:454:9)
cht-api |     at Socket.emit (node:events:527:28)
cht-api |     at emitErrorNT (node:internal/streams/destroy:157:8) {
cht-api |   cause: Error: getaddrinfo ENOTFOUND haproxy
cht-api |       at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:71:26) {
cht-api |     errno: -3008,
cht-api |     code: 'ENOTFOUND',
cht-api |     syscall: 'getaddrinfo',
cht-api |     hostname: 'haproxy'
cht-api |   },
cht-api |   error: Error: getaddrinfo ENOTFOUND haproxy
cht-api |       at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:71:26) {
cht-api |     errno: -3008,
cht-api |     code: 'ENOTFOUND',
cht-api |     syscall: 'getaddrinfo',
cht-api |     hostname: 'haproxy'
cht-api |   }
cht-api | }
cht-sentinel | RequestError: Error: getaddrinfo ENOTFOUND haproxy
cht-sentinel |     at new RequestError (/sentinel/node_modules/request-promise-core/lib/errors.js:14:15)
cht-sentinel |     at Request.plumbing.callback (/sentinel/node_modules/request-promise-core/lib/plumbing.js:87:29)
cht-sentinel |     at Request.RP$callback [as _callback] (/sentinel/node_modules/request-promise-core/lib/plumbing.js:46:31)
cht-sentinel |     at self.callback (/sentinel/node_modules/request/request.js:185:22)
cht-sentinel |     at Request.emit (node:events:527:28)
cht-sentinel |     at Request.onRequestError (/sentinel/node_modules/request/request.js:877:8)
cht-sentinel |     at ClientRequest.emit (node:events:527:28)
cht-sentinel |     at Socket.socketErrorListener (node:_http_client:454:9)
cht-sentinel |     at Socket.emit (node:events:527:28)
cht-sentinel |     at emitErrorNT (node:internal/streams/destroy:157:8) {
cht-sentinel |   cause: Error: getaddrinfo ENOTFOUND haproxy
cht-sentinel |       at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:71:26) {
cht-sentinel |     errno: -3008,
cht-sentinel |     code: 'ENOTFOUND',
cht-sentinel |     syscall: 'getaddrinfo',
cht-sentinel |     hostname: 'haproxy'
cht-sentinel |   },
cht-sentinel |   error: Error: getaddrinfo ENOTFOUND haproxy
cht-sentinel |       at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:71:26) {
cht-sentinel |     errno: -3008,
cht-sentinel |     code: 'ENOTFOUND',
cht-sentinel |     syscall: 'getaddrinfo',
cht-sentinel |     hostname: 'haproxy'
cht-sentinel |   }
cht-sentinel | }
```

dianabarsan commented 2 years ago

On the error: it looks like the haproxy container failed to come up. Can you check the logs? It could be something as simple as a port clash, or something else entirely.
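
For example, something along these lines (a sketch; the container name is taken from the logs later in this thread):

```
# Is the haproxy container actually running?
docker ps --filter name=haproxy
# Inspect its logs for startup errors.
docker logs cht-haproxy
# On Linux, look for a host port clash, e.g. something already bound to 443 or 5984.
sudo ss -tlnp | grep -E ':443|:5984'
```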

dianabarsan commented 2 years ago

> I had problems when I tried to log in using an offline user. Not sure if I am missing something

From the video, it looks like your browser doesn't accept the self-signed certificate and doesn't download the service worker, which is required for offline users. How do you usually handle self-signed certificates?
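
One quick way to check this from the command line (a sketch; the exact service worker path is an assumption):

```
# -k skips TLS verification; a 200 here while the browser still fails
# points at certificate trust rather than the server itself.
curl -sk -o /dev/null -w '%{http_code}\n' https://localhost/service-worker.js
```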

tatilepizs commented 2 years ago

About the certificate problems: I have never had this issue before. I read about it and tried using Firefox, exported the certificates, and added them to Keychain Access to be trusted, but that did not work for Chrome. I don't understand why, because it is working fine in Firefox; I will need to investigate a little more. Meanwhile, I tested that offline users didn't download all docs again when I restarted the CouchDB container.

Video attached: [video](https://user-images.githubusercontent.com/94494491/193918438-c2a82d78-1acd-4a5f-b021-cd98ddc4e9d1.mov)

dianabarsan commented 2 years ago

> offline users didn't download

Since your user only has 37 docs, you would not notice them downloading in a sync unless you inspected the network requests and checked how many docs the server sends back.

tatilepizs commented 2 years ago

About the error when I try to use the clustered CouchDB:

This is the error that the `cht-haproxy` container is showing:

```
backend couchdb-servers
  balance leastconn
  retry-on all-retryable-errors
  log global
  retries 5
  # servers are added at runtime, in entrypoint.sh, based on couchdb
  server couchdb couchdb:5984 check agent-check agent-inter 5s agent-addr healthcheck agent-port 5555
[alert] 276/204913 (1) : parseBasic loaded
[alert] 276/204913 (1) : parseCookie loaded
[alert] 276/204913 (1) : replacePassword loaded
[NOTICE] 276/204913 (1) : haproxy version is 2.3.19-0647791
[NOTICE] 276/204913 (1) : path to executable is /usr/local/sbin/haproxy
[ALERT] 276/204913 (1) : parsing [/usr/local/etc/haproxy/backend.cfg:7] : 'server couchdb' : could not resolve address 'couchdb'.
[ALERT] 276/204913 (1) : Failed to initialize server(s) addr.
```

I don't have a lot of Docker knowledge, so I tried changing the name of the COUCHDB_SERVERS in docker-compose_cht-core.yml from `couchdb` to `couchdb.1`/`couchdb.2`/`couchdb.3` just to see what happened.

Using `couchdb.1`, the container that failed this time was `cht-api`, with the following error:

Error:

```
2022-10-04 20:55:13 INFO: Translations loaded successfully
2022-10-04 20:55:14 INFO: Running installation checks…
2022-10-04 20:55:14 INFO: Medic API listening on port 5988
2022-10-04 20:55:14 ERROR: Fatal error initialising medic-api
2022-10-04 20:55:14 ERROR: FetchError: invalid json response body at http://haproxy:5984/medic/_all_docs?include_docs=true&startkey=%22_design%2F%22&endkey=%22_design%2F%EF%BF%B0%22 reason: Unexpected token < in JSON at position 0
    at /api/node_modules/node-fetch/lib/index.js:272:32
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  message: 'invalid json response body at http://haproxy:5984/medic/_all_docs?include_docs=true&startkey=%22_design%2F%22&endkey=%22_design%2F%EF%BF%B0%22 reason: Unexpected token < in JSON at position 0',
  type: 'invalid-json',
  [stack]: 'FetchError: invalid json response body at http://haproxy:5984/medic/_all_docs?include_docs=true&startkey=%22_design%2F%22&endkey=%22_design%2F%EF%BF%B0%22 reason: Unexpected token < in JSON at position 0\n' +
    '    at /api/node_modules/node-fetch/lib/index.js:272:32\n' +
    '    at processTicksAndRejections (node:internal/process/task_queues:96:5)',
  name: 'FetchError'
}
```

Using `couchdb.2` or `couchdb.3`, all the containers came up successfully, but I am seeing this error:

Error:

```
cht-haproxy | <150>Oct 4 21:36:59 haproxy[27]: 172.21.0.7,,503,0,0,0,GET,/,-,admin,'-',222,-1,-,'-'
cht-sentinel | StatusCodeError: 503 - "<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n"
cht-sentinel |     at new StatusCodeError (/sentinel/node_modules/request-promise-core/lib/errors.js:32:15)
cht-sentinel |     at Request.plumbing.callback (/sentinel/node_modules/request-promise-core/lib/plumbing.js:104:33)
cht-sentinel |     at Request.RP$callback [as _callback] (/sentinel/node_modules/request-promise-core/lib/plumbing.js:46:31)
cht-sentinel |     at Request.self.callback (/sentinel/node_modules/request/request.js:185:22)
cht-sentinel |     at Request.emit (node:events:527:28)
cht-sentinel |     at Request.<anonymous> (/sentinel/node_modules/request/request.js:1154:10)
cht-sentinel |     at Request.emit (node:events:527:28)
cht-sentinel |     at IncomingMessage.<anonymous> (/sentinel/node_modules/request/request.js:1076:12)
cht-sentinel |     at Object.onceWrapper (node:events:641:28)
cht-sentinel |     at IncomingMessage.emit (node:events:539:35) {
cht-sentinel |   statusCode: 503,
cht-sentinel |   error: '<html><body><h1>503 Service Unavailable</h1>\n' +
cht-sentinel |     'No server is available to handle this request.\n' +
cht-sentinel |     '</body></html>\n'
cht-sentinel | }
cht-haproxy | <150>Oct 4 21:37:00 haproxy[27]: 172.21.0.8,,503,0,1,0,GET,/,-,admin,'-',222,-1,-,'-'
cht-api | StatusCodeError: 503 - "<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n"
cht-api |     at new StatusCodeError (/api/node_modules/request-promise-core/lib/errors.js:32:15)
cht-api |     at Request.plumbing.callback (/api/node_modules/request-promise-core/lib/plumbing.js:104:33)
cht-api |     at Request.RP$callback [as _callback] (/api/node_modules/request-promise-core/lib/plumbing.js:46:31)
cht-api |     at Request.self.callback (/api/node_modules/request/request.js:185:22)
cht-api |     at Request.emit (node:events:527:28)
cht-api |     at Request.<anonymous> (/api/node_modules/request/request.js:1154:10)
cht-api |     at Request.emit (node:events:527:28)
cht-api |     at IncomingMessage.<anonymous> (/api/node_modules/request/request.js:1076:12)
cht-api |     at Object.onceWrapper (node:events:641:28)
cht-api |     at IncomingMessage.emit (node:events:539:35) {
cht-api |   statusCode: 503,
cht-api |   error: '<html><body><h1>503 Service Unavailable</h1>\n' +
cht-api |     'No server is available to handle this request.\n' +
cht-api |     '</body></html>\n'
cht-api | }
```

Not sure if this helps you or not; I just wanted to try different things 🙂

tatilepizs commented 2 years ago

> ...you would not notice them downloading in a sync unless you inspected the network requests and checked how many docs the server sends back.

Thanks for pointing that out, @dianabarsan. I think this video is better, isn't it?

Video attached: [video](https://user-images.githubusercontent.com/94494491/193939209-d0282268-367f-4ae0-9067-8bca8cfcb609.mov)

dianabarsan commented 2 years ago

Unfortunately no :( Pouch <-> Couch replication is optimized to not download a document if it already exists locally (this check is made via the `_revs_diff` call). In your case, you should inspect the response of the `_changes` requests after you restart the container (the one that doesn't fail); there should be no changes there at all (or 1-2 docs that were updated in the meantime). Another option is to check that the `since` parameter is never rolled back: look at every `/medic/_changes` request and check the `since` parameter, which should never go back to 0.
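
Concretely, something like this (a sketch; host, port, and credentials are assumptions based on the local test setup):

```
# In the browser dev tools, filter the network tab for _changes and confirm
# the since parameter never drops back to 0 after the CouchDB container
# restarts. The same endpoint can also be queried directly:
curl -sk 'https://localhost/medic/_changes?limit=1&since=0' -u medic:password
```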

When checking, please be aware that there will be a `_changes` request for the users meta database. Checking that can also be used to verify, but then please be sure you manually sync once before restarting the container - the meta database doesn't automatically sync on startup.

mrjones-plip commented 2 years ago

> I don't have a lot of Docker knowledge, so I tried changing the name of the COUCHDB_SERVERS in docker-compose_cht-core.yml from couchdb to couchdb.1/couchdb.2/couchdb.3 just to see what happened.

Looking into how core-eng/sre architected this, the readme specifies you were right! You do need to set them, but separate them with `,`, not `/` ;) (Note: the readme is wrong! We want a single COUCHDB_SERVERS variable, not discrete COUCHDB1_SERVER etc. I'll open another PR to fix this tomorrow.)

I was able to use these steps to test with clustered couch on this branch:

  1. Download cht-core and clustered couch from this branch
  2. Call compose up with the following (see the verification sketch below):

```
COUCHDB_SERVERS="couchdb.1,couchdb.2,couchdb.3" COUCHDB_PASSWORD=password COUCHDB_USER=medic docker-compose -f docker-compose_cht-couchdb-clustered.yml -f docker-compose_cht-core.yml up
```
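
Once it's up, CouchDB's `_membership` endpoint should list all three nodes (a sketch; assumes the CouchDB port is reachable from the host and reuses the credentials above):

```
curl -s 'http://localhost:5984/_membership' -u medic:password
# Expect three entries (e.g. couchdb@couchdb.1 etc.) under "cluster_nodes".
```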
tatilepizs commented 2 years ago

Thank you @dianabarsan and @mrjones-plip for your help.

I think I have tested everything correctly this time; here are the results:

**Using single node CouchDB**

Video attached [video](https://user-images.githubusercontent.com/94494491/194368905-62796ee3-a28b-4898-812e-1ef94f7b67be.mov)

**Using clustered CouchDB**

Video attached [video](https://user-images.githubusercontent.com/94494491/194369600-280e8bec-914c-466b-98b4-9839bdbb883c.mov)

@dianabarsan please let me know if there is anything else that I am missing and should test. Thanks again, I learned a lot from this ticket.

dianabarsan commented 2 years ago

Excellent testing, thank you so much @tatilepizs!

dianabarsan commented 2 years ago

Merged to master