Closed mrjones-plip closed 1 year ago
Removing cht-net from the networks property should fix this issue.
Only the overlay network is required when running the CHT over a distributed cluster. It should be:
networks:
- cht-overlay
This should be the case for all other services.
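For illustration, a service entry edited this way might look like the sketch below. The service name api is only an example, and the top-level block assumes the overlay network was already created outside of compose; neither detail comes from the actual CHT compose files.

```yaml
# Hypothetical excerpt of a multi-node compose file (names are illustrative).
services:
  api:
    networks:
      - cht-overlay   # only the overlay network; the cht-net entry is removed

networks:
  cht-overlay:
    external: true    # assumes the network was created before launching services
```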
Yes! Thanks for the confirmation. I discovered this in my own testing as well.
Thinking on this some more, @henokgetachew - what about declaring cht-net as the overlay before we launch any services? We'll have to change the docker network create instructions to use the new name, but that's easy.
Then the instructions for editing the compose files can be much simpler, something like:
Find the networks: section at the very bottom of the compose file and add two lines so it looks like this:
networks:
  cht-net:
    name: ${CHT_NETWORK:-cht-net}
    driver: overlay
    external: true
While we should still release standalone multi-node compose files that are ready for production, fewer manual edits are better, I think!
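As I read this proposal, the per-service blocks would then need no edits at all, since they already reference cht-net. A rough sketch, assuming a service named api (illustrative only) and the top-level networks: section edited as described above:

```yaml
# Hypothetical excerpt: the service block stays exactly as shipped.
services:
  api:
    networks:
      - cht-net   # unchanged; cht-net now resolves to the pre-created overlay network
```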
The latest docs PR fixes this, and it is now live here. As there was no inherent code bug, I removed the "affects" labels for all 4.x versions.
Describe the bug
Often, when I start my multi-node clustered CHT 4.x instance, I get errors in my API and nginx containers and the CHT fails to start.
To Reproduce
docker compose up -d on the CHT Core node
Expected behavior
The CHT shows a login page and loads correctly
Logs
The API container has this error repeatedly:
This, in turn, causes the webserver (nginx) to fail to talk to API, so browsing to my instance gives a 502 error:
The nginx container of course has errors too, because API can't talk to HA Proxy:
Screenshots
Environment
Additional context
When testing this, everything may work on the first try. If that's the case, try running this 5 or 6 times on the CHT Core node and check in a browser whether it succeeds each time:
The semi-functional workaround is to aggressively restart containers without changing anything, and then somehow it starts working again.
There was some speculation (private Slack thread) that the fix to this was:
couchdb.1 -> couchdb-1.local
container_name: couchdb-1.local
This is not the case. I still see this error frequently when I set my CouchDB servers in my CHT Core .env file to have the correct names:
COUCHDB_SERVERS=couchdb-1.local,couchdb-2.local,couchdb-3.local
and then make each CouchDB compose file look like this (node 1 in this example, below).