medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0

Docker helper failing to run cht on later versions #9442

Closed by 1yuv 1 week ago

1yuv commented 1 week ago

Describe the bug

We are not able to run docker-helper-4x on the latest version of CHT because it fails to pull the required images.

To Reproduce

  1. Run Docker Helper.
  2. When prompted whether you want to start a new project, say yes.
  3. When prompted whether you want to run the latest version of CHT Core (4.10.0), say yes.
  4. Give your project a name.
  5. Wait for your project to fail.

Expected behavior

Docker Helper should successfully start the instance.

Logs

If you inspect the upgrade service logs, you will see that Docker Helper is unable to pull the required manifests.

This is the log I got when trying to start a 4.10.0 instance:

sentinel Pulled 
couchdb Error 
nginx Error 
api Error 
haproxy Error

Error response from daemon: Head "https://public.ecr.aws/v2/medic/cht-couchdb/manifests/4.10.0": proxyconnect tcp: dial tcp 192.168.65.1:3128: i/o timeout

    at ChildProcess.<anonymous> (/app/src/docker-compose-cli.js:35:25)
    at ChildProcess.emit (node:events:513:28)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:293:12)

This is the log I got when trying to start the latest master:

Error response from daemon: Head "https://public.ecr.aws/v2/medic/cht-sentinel/manifests/4.11.0-alpha": Get "https://public.ecr.aws/token/?scope=repository%3Amedic%2Fcht-sentinel%3Apull&service=public.ecr.aws": context deadline exceeded

    at ChildProcess.<anonymous> (/app/src/docker-compose-cli.js:35:25)
    at ChildProcess.emit (node:events:513:28)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:293:12)

Additional context

This works on an older version of CHT (4.9.0). I tested that version and suggested that a forum user work with 4.9.0.

I suspect that either a) we are not publishing the images or manifests correctly, or b) the permissions have changed.

m5r commented 1 week ago

I haven't been able to reproduce this on Linux (EndeavourOS), so it seems to be specific to macOS. What Docker and Docker Compose versions are you running?

mrjones-plip commented 1 week ago

I can confirm that CHT 4.10.0 works as expected on Ubuntu 22.04 with these versions of Docker:

docker --version && docker compose version
Docker version 27.2.1, build 9e34c9b
Docker Compose version v2.23.0-desktop.1

@1yuv - did you try to upgrade an instance? There is a known issue with Docker Helper and upgrading. You have great steps to reproduce that I didn't read closely enough. You are clearly on a new install!

mrjones-plip commented 1 week ago

@1yuv - looking at this a bit more closely, I see two errors:

One is in CouchDB (i/o timeout) on 4.10.0 and one is in Sentinel (context deadline exceeded) on ~master. This seems odd. I would have expected them to fail in the same way.

Given it works for two other engineers, I wonder if this is an intermittent networking issue? Do you know if you had the 4.9.0 images cached already?
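Both questions (were the images already cached, and which network path is the daemon using) can be checked from the command line. The sketch below is only an illustration: the function names are hypothetical and not part of Docker Helper, and it assumes `docker` is on the PATH and that the image names follow the pattern seen in the logs above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for this thread; not part of docker-helper-4.x.

# Print the image reference for a CHT service and version,
# e.g. "couchdb 4.10.0" -> public.ecr.aws/medic/cht-couchdb:4.10.0
cht_image_ref() {
  local service="$1" version="$2"
  printf 'public.ecr.aws/medic/cht-%s:%s\n' "$service" "$version"
}

# List locally cached CHT images for a version.
# Only a header row in the output means nothing is cached.
cht_cached_images() {
  local version="$1"
  docker image ls --filter "reference=public.ecr.aws/medic/*:${version}"
}

# Show the proxy settings the Docker daemon reports. The 4.10.0 log shows
# the daemon dialing a proxy at 192.168.65.1:3128, so this is worth a look.
cht_daemon_proxy() {
  docker info --format 'HTTP proxy: {{.HTTPProxy}} | HTTPS proxy: {{.HTTPSProxy}}'
}
```

For example, `cht_cached_images 4.9.0` shows whether 4.9.0 only worked because its images were already on disk, and `cht_daemon_proxy` shows whether pulls are routed through a proxy.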

Can you try deleting any cached 4.9.0 images and see if the issue resurfaces there? This should work (but will kill all running containers!):

docker kill $(docker ps -q) && docker image rm -f $(docker image ls -qf "reference=public.ecr.aws/medic/*:4.9.0")

Alternatively, you can try pulling all the 4.10.0 images first and then running Docker Helper, to see whether that fixes it or surfaces network errors:

docker pull public.ecr.aws/medic/cht-haproxy:4.10.0
docker pull public.ecr.aws/medic/cht-haproxy-healthcheck:4.10.0
docker pull public.ecr.aws/medic/cht-api:4.10.0
docker pull public.ecr.aws/medic/cht-sentinel:4.10.0
docker pull public.ecr.aws/medic/cht-nginx:4.10.0
docker pull public.ecr.aws/medic/cht-couchdb:4.10.0

1yuv commented 1 week ago
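
If the failures are intermittent, the same six pulls can be wrapped in a small retry loop. This is a minimal sketch, not something Docker Helper provides: the function names are hypothetical, and the default of 3 attempts with a 5 second pause is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the retry count and delay are assumptions.

# Print the six image references listed above for a given CHT version.
cht_images() {
  local version="$1" service
  for service in haproxy haproxy-healthcheck api sentinel nginx couchdb; do
    printf 'public.ecr.aws/medic/cht-%s:%s\n' "$service" "$version"
  done
}

# Pull every image, retrying each one a few times before giving up.
# Usage: pull_cht_images VERSION [ATTEMPTS] [DELAY_SECONDS]
pull_cht_images() {
  local version="$1" attempts="${2:-3}" delay="${3:-5}" image try
  for image in $(cht_images "$version"); do
    try=1
    until docker pull "$image"; do
      if [ "$try" -ge "$attempts" ]; then
        echo "giving up on $image after $attempts attempts" >&2
        return 1
      fi
      try=$((try + 1))
      sleep "$delay"
    done
  done
}
```

Running `pull_cht_images 4.10.0` stops at the first image that still fails after all attempts, which makes a persistent network error easy to tell apart from a transient one.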

Hi @mrjones-plip, I checked again and tried to create an instance from 4.10.0, and it succeeded. This looks like it was an intermittent networking issue at the time.

mrjones-plip commented 1 week ago

Great - thanks for confirming!