hotosm / tasking-manager

Tasking Manager - The tool to team up for mapping in OpenStreetMap
https://wiki.openstreetmap.org/wiki/Tasking_Manager
BSD 2-Clause "Simplified" License
509 stars 275 forks source link

fix: db container healthchecks that was earlier broken #6510

Closed mahesh-naxa closed 2 months ago

mahesh-naxa commented 4 months ago

What type of PR is this? (check all applicable)

Issue

Many users were experiencing failed tm-migration container that resulted in tm-backend service left in Created state and never to Started.

This is mainly caused due to healthcheck in tm-db being too light, and only checks if the postgresql server is started or not.

This was one of the reason deployment attempts by Bash from Technology without borders was not successful. Errors he received:

tasking-manager-main-tm-migration-1  |     return self.loaded_dbapi.connect(*cargs, **cparams)
tasking-manager-main-tm-migration-1  |   File "/home/appuser/.local/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
tasking-manager-main-tm-migration-1  |     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
tasking-manager-main-tm-migration-1  | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "tm-db" (172.31.0.5), port 5432 failed: Connection refused
tasking-manager-main-tm-migration-1  |  Is the server running on that host and accepting TCP/IP connections?
This second error also occurs after running the 'docker-compose up --detach' command:

    service "tm-migration" didn't complete successfully: exit 1

Describe this PR

I have updated healthcheck defination from

pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}

to

psql -h 0.0.0.0 -U ${POSTGRES_USER:-tm} -d ${POSTGRES_DB:-tasking-manager} -c 'SELECT 1;'

This verifies that the defined database & database role is created, and also performs a SELECT 1 that validates the initialization of the tm-db service.

Also to mention pg_isready doesn't do that, you could have pg_isready -U this_user_doesnt_exist -d this_db_doesnt_exist and it would still exit 0, which is unwanted.

Also, -h 0.0.0.0 verifies reachibility from other services, Ensuring the db now can be finally consumed.

Review Guide

Notes for the reviewer. How to test this change?

Checklist before requesting a review

[optional] What gif best describes this PR or how it makes you feel?

sonarcloud[bot] commented 4 months ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

mahesh-naxa commented 4 months ago

More discussion here: https://github.com/hotosm/tasking-manager/pull/6488#issuecomment-2227696339

dakotabenjamin commented 2 months ago

@mahesh-naxa can you fix conflicts on this PR? Its ready to merge otherwise

sonarcloud[bot] commented 2 months ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

mahesh-naxa commented 2 months ago

@dakotabenjamin i have updated the branch.