stakwork / sphinx-swarm

lightning container orchestration for massive deployments

NavFiber - Restart on failure #196

Open tomsmith8 opened 5 months ago

tomsmith8 commented 5 months ago

Task

On swarm, Nav-Fiber (and all others) should auto-restart on failure.

Evanfeenstra commented 5 months ago

All images already have the restart: on-failure policy.

I really think it's related to the Docker volume filling up with old stale images: https://github.com/stakwork/sphinx-swarm/issues/197
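For readers unfamiliar with the policy mentioned above, here is a minimal sketch of how an on-failure restart policy behaves. It is a simplified model of Docker's documented behaviour, not sphinx-swarm's code; the function names are hypothetical. The dictionary follows the RestartPolicy shape used in the Docker Engine API's HostConfig.

```python
def on_failure_policy(max_retries=0):
    """Build a RestartPolicy dict in the shape used by the Docker Engine API.

    A MaximumRetryCount of 0 means there is no retry limit.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    return {"Name": "on-failure", "MaximumRetryCount": max_retries}


def should_restart(policy, exit_code, restarts_so_far):
    """Simplified model: would this policy restart a container that just exited?

    Hypothetical helper for illustration only. on-failure reacts to a
    non-zero exit code and ignores a clean exit (code 0).
    """
    if policy["Name"] != "on-failure":
        raise ValueError("this sketch only models on-failure")
    if exit_code == 0:
        return False
    limit = policy.get("MaximumRetryCount", 0)
    return limit == 0 or restarts_so_far < limit


if __name__ == "__main__":
    policy = on_failure_policy(max_retries=3)
    print(should_restart(policy, exit_code=1, restarts_so_far=0))
    print(should_restart(policy, exit_code=0, restarts_so_far=0))
```

Note that the policy only reacts to a container exiting. A failure that happens before a container starts, such as an image pull error, gives the policy nothing to act on.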

tomsmith8 commented 5 months ago

Nav-Fiber went down on Swarm20 but hasn't spun back up:

Image

Evanfeenstra commented 5 months ago

The error is

ERROR [sphinx_swarm::builder] FAILED TO UPDATE NODE Docker responded with status code 500: Head "https://registry-1.docker.io/v2/sphinxlightning/sphinx-nav-fiber/manifests/latest": error parsing HTTP 429 response body: invalid character 'S' looking for beginning of value: "Server capacity exceeded
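The log above shows the registry answering with HTTP 429 (rate limiting), which the Docker daemon reports as a 500. One common way to cope with a 429 is to retry with exponential backoff. The sketch below illustrates that general technique in Python. It is not how sphinx-swarm handles updates, and `is_rate_limited`, `retry_on_rate_limit`, and the `action` callback are hypothetical names. Matching on the message text is an assumption and is brittle.

```python
import random
import re
import time

# Assumption: the rate-limit condition is only visible in the error text,
# as in the log line above, so we match on the message.
RATE_LIMIT_PATTERN = re.compile(r"\bHTTP 429\b")


def is_rate_limited(message):
    """Return True if an error message looks like a registry rate limit."""
    return bool(RATE_LIMIT_PATTERN.search(message))


def retry_on_rate_limit(action, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action(), retrying with exponential backoff on rate-limit errors.

    `action` is a hypothetical callable (for example, "pull the image and
    update the node") that raises RuntimeError on failure. Other errors,
    and the final failed attempt, are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return action()
        except RuntimeError as err:
            is_last = attempt == max_attempts - 1
            if is_last or not is_rate_limited(str(err)):
                raise
            delay = base_delay * 2 ** attempt
            # Add up to 50% jitter so many nodes do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_update():
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("error parsing HTTP 429 response body")
        return "updated"

    print(retry_on_rate_limit(flaky_update, base_delay=0.01))
```

Running the update less often, as tried later in this thread, reduces how often the limit is hit. Backoff only helps when a retry a few seconds or minutes later is likely to succeed.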

tomsmith8 commented 5 months ago

Updated the auto-updater to run daily; currently monitoring.

Seems like the issue may still exist, but let's wait and see.