stakwork / sphinx-swarm

lightning container orchestration for massive deployments

ops issues #290

Open Evanfeenstra opened 2 months ago

Evanfeenstra commented 2 months ago

Crashing EC2

Limit memory per container, and set a total Docker limit?
Rate limiting in Traefik
Ship logs off the machine (e.g. to CloudWatch); also make sure the logs themselves are good
Log rotation if logs are kept locally
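For the rate-limiting item: Traefik's RateLimit middleware is configured with an average rate and a burst size, which behave like a token bucket. The Rust sketch below only illustrates that idea so the two knobs are easier to reason about. It is not Traefik code, and the TokenBucket type and the numbers (10 requests/second, burst of 5) are arbitrary, hypothetical examples.

```rust
use std::time::{Duration, Instant};

/// Token bucket: refills `average` tokens per `period`, holds at most `burst`.
/// Each request spends one token; no token means the request is rejected.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(average: u32, period: Duration, burst: u32, now: Instant) -> Self {
        TokenBucket {
            capacity: burst as f64,
            tokens: burst as f64, // start full, so an initial burst is allowed
            refill_per_sec: average as f64 / period.as_secs_f64(),
            last: now,
        }
    }

    /// Returns true if a request arriving at `now` is allowed.
    fn allow_at(&mut self, now: Instant) -> bool {
        // Refill for the time elapsed since the last request, capped at `burst`.
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.last = self.last.max(now);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let t0 = Instant::now();
    let mut bucket = TokenBucket::new(10, Duration::from_secs(1), 5, t0);

    // 8 requests at the same instant: only the burst of 5 gets through.
    let first = (0..8).filter(|_| bucket.allow_at(t0)).count();
    // 500 ms later, 10 req/s * 0.5 s = 5 tokens have been refilled.
    let t1 = t0 + Duration::from_millis(500);
    let second = (0..8).filter(|_| bucket.allow_at(t1)).count();

    // prints "allowed at t0: 5, allowed at t0+500ms: 5"
    println!("allowed at t0: {first}, allowed at t0+500ms: {second}");
}
```

The burst absorbs short spikes, while the average caps sustained load. That sustained cap is the part that protects a small EC2 instance.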

IP addresses changing

Static IPs on Lightning nodes ($3/month)
Could still have a load balancer (for domains) that forwards to Traefik

Better logs in the swarm UI

Evanfeenstra commented 2 months ago

Superadmin

Creating swarms
Restarting EC2
Updating Route 53 records

tomsmith8 commented 2 months ago

@Evanfeenstra could you prioritise setting Docker and per-container limits?

@kevkevinpal could you prioritise migrating the btc graph, updating the GitHub Actions pipeline and deprecating the non-swarm EC2 instances?

Next up would then be setting up CloudWatch?

Evanfeenstra commented 2 months ago

Just merged a per-container memory limit: set it once and it applies to every container.

https://github.com/stakwork/sphinx-swarm/commit/84ab2259b96e11dfc8e899639b518e53d012489c

It's global_mem_limit in the YAML config file; the value is a number in bytes.
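A minimal sketch of what "set it once and it applies to every container" means, which also touches the "total Docker limit?" question from the first comment. Only the global_mem_limit name and the fact that it is in bytes come from this thread. The Config and ContainerSpec types, the container names and the host-size check are hypothetical. In the Docker Engine API a container's memory limit is HostConfig.Memory, also in bytes, but this sketch does not show how sphinx-swarm actually passes the value to Docker.

```rust
/// Hypothetical, trimmed-down swarm config. Only `global_mem_limit`
/// (bytes) is taken from the thread; everything else is illustrative.
struct Config {
    global_mem_limit: Option<u64>,
    containers: Vec<String>,
}

/// Hypothetical per-container settings we would hand to Docker.
struct ContainerSpec {
    name: String,
    memory_bytes: Option<u64>, // None = no limit
}

/// The single global value is copied onto every container.
fn container_specs(cfg: &Config) -> Vec<ContainerSpec> {
    cfg.containers
        .iter()
        .map(|name| ContainerSpec {
            name: name.clone(),
            memory_bytes: cfg.global_mem_limit,
        })
        .collect()
}

/// Worst case if every container reaches its limit at once.
/// Returns None if any container is unlimited (no upper bound) or on overflow.
fn worst_case_total(specs: &[ContainerSpec]) -> Option<u64> {
    specs
        .iter()
        .try_fold(0u64, |acc, s| s.memory_bytes.and_then(|m| acc.checked_add(m)))
}

fn main() {
    const GIB: u64 = 1024 * 1024 * 1024;
    let cfg = Config {
        global_mem_limit: Some(GIB), // 1 GiB per container
        // Example container names only.
        containers: ["bitcoind", "lnd", "proxy", "relay"]
            .iter()
            .map(|s| s.to_string())
            .collect(),
    };
    let specs = container_specs(&cfg);
    for s in &specs {
        println!("{}: {:?} bytes", s.name, s.memory_bytes);
    }

    // e.g. an m5.large has 8 GiB; in practice leave headroom for the OS and dockerd.
    let host_mem = 8 * GIB;
    match worst_case_total(&specs) {
        Some(total) if total <= host_mem => println!("worst case {total} bytes fits in {host_mem}"),
        Some(total) => println!("worst case {total} bytes exceeds {host_mem}: lower the limit"),
        None => println!("at least one container is unlimited"),
    }
}
```

A per-container limit alone does not bound the host: N containers at the same limit can still need N times that much memory. That is why a total check (or a total Docker limit) is still worth having.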

Evanfeenstra commented 2 months ago

@tobi-bams here's a new SetGlobalMemLimit cmd; maybe you can add a frontend for it? https://github.com/stakwork/sphinx-swarm/blob/master/src/cmd.rs#L152
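For the SetGlobalMemLimit frontend, here is a rough Rust sketch of the two pieces a form would need. The first turns a human-friendly size such as "512MiB" into the bytes that global_mem_limit expects. The second is a handler that validates and stores the value. The SwarmCmd enum, State struct, parse_size and handle helpers and the 64 MiB floor are all hypothetical. The real command is defined in src/cmd.rs at the link above and may look different.

```rust
/// Hypothetical command shape; see src/cmd.rs for the real definition.
#[derive(Debug)]
enum SwarmCmd {
    SetGlobalMemLimit(u64), // bytes
    GetGlobalMemLimit,
}

/// Hypothetical in-memory state holding the current limit.
struct State {
    global_mem_limit: Option<u64>,
}

/// Hypothetical sanity floor so a typo can't starve every container.
const MIN_LIMIT: u64 = 64 * 1024 * 1024; // 64 MiB

/// Parse "512MiB", "2GiB" or a plain byte count into bytes.
fn parse_size(input: &str) -> Result<u64, String> {
    let s = input.trim();
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (num, unit) = s.split_at(split);
    let n: u64 = num.parse().map_err(|_| format!("not a number: {input:?}"))?;
    let multiplier: u64 = match unit.trim() {
        "" | "B" => 1,
        "KiB" => 1 << 10,
        "MiB" => 1 << 20,
        "GiB" => 1 << 30,
        other => return Err(format!("unknown unit: {other:?}")),
    };
    n.checked_mul(multiplier).ok_or_else(|| format!("too large: {input:?}"))
}

/// Validate and apply a command against the state.
fn handle(state: &mut State, cmd: SwarmCmd) -> Result<String, String> {
    match cmd {
        SwarmCmd::SetGlobalMemLimit(bytes) if bytes < MIN_LIMIT => Err(format!(
            "{} bytes is below the {}-byte minimum",
            bytes, MIN_LIMIT
        )),
        SwarmCmd::SetGlobalMemLimit(bytes) => {
            state.global_mem_limit = Some(bytes);
            Ok(format!("global_mem_limit set to {} bytes", bytes))
        }
        SwarmCmd::GetGlobalMemLimit => Ok(format!("{:?}", state.global_mem_limit)),
    }
}

fn main() {
    let mut state = State { global_mem_limit: None };
    for input in ["512MiB", "2GiB", "1000", "lots"].iter() {
        let result = parse_size(input)
            .and_then(|bytes| handle(&mut state, SwarmCmd::SetGlobalMemLimit(bytes)));
        println!("{input}: {result:?}");
    }
    println!("{:?}", handle(&mut state, SwarmCmd::GetGlobalMemLimit));
}
```

Keeping the unit conversion on the frontend means the backend only ever sees the plain byte count the YAML config already uses.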

tobi-bams commented 2 months ago

Yeah, sure I can.

Evanfeenstra commented 2 months ago

Log rotation: https://github.com/stakwork/sphinx-swarm/releases/tag/v0.4.98
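The release above isn't described in this thread, so here is only a generic sketch of size-based rotation, assuming logs are plain files on the instance. Once the file reaches max_bytes it becomes swarm.log.1, older copies shift up, and anything beyond keep copies is deleted. The function names, file name and limits are hypothetical. For container stdout/stderr, Docker's json-file logging driver can do similar rotation itself through its max-size and max-file options.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// "swarm.log" + n -> "swarm.log.n"
fn rotated(path: &Path, n: usize) -> PathBuf {
    let mut name = path.as_os_str().to_owned();
    name.push(format!(".{n}"));
    PathBuf::from(name)
}

/// If `path` has reached `max_bytes`, shift swarm.log -> .1 -> .2 ...,
/// keeping at most `keep` rotated files. Returns true if it rotated.
fn rotate_if_needed(path: &Path, max_bytes: u64, keep: usize) -> io::Result<bool> {
    let size = match fs::metadata(path) {
        Ok(meta) => meta.len(),
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    if size < max_bytes {
        return Ok(false);
    }
    if keep == 0 {
        fs::remove_file(path)?; // no history wanted: just start over
        return Ok(true);
    }
    // Drop the oldest copy, then shift .(n) -> .(n+1) from the top down.
    let oldest = rotated(path, keep);
    if oldest.exists() {
        fs::remove_file(&oldest)?;
    }
    for n in (1..keep).rev() {
        let from = rotated(path, n);
        if from.exists() {
            fs::rename(&from, rotated(path, n + 1))?;
        }
    }
    fs::rename(path, rotated(path, 1))?;
    Ok(true)
}

/// Append one line, rotating first if the current file is full.
fn append_line(path: &Path, line: &str, max_bytes: u64, keep: usize) -> io::Result<()> {
    rotate_if_needed(path, max_bytes, keep)?;
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{line}")
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("log-rotation-demo-{}", std::process::id()));
    let _ = fs::remove_dir_all(&dir); // ignore "not found"
    fs::create_dir_all(&dir)?;
    let log = dir.join("swarm.log");

    // Each line is 10 bytes ("line 0000\n"); rotate at 30 bytes, keep 2 copies.
    for i in 0..10 {
        append_line(&log, &format!("line {i:04}"), 30, 2)?;
    }
    // Expect: swarm.log 10 bytes, .1 and .2 30 bytes each, .3 absent.
    for p in &[log.clone(), rotated(&log, 1), rotated(&log, 2), rotated(&log, 3)] {
        let size = fs::metadata(p).map(|m| m.len()).ok();
        println!("{}: {:?}", p.display(), size);
    }
    fs::remove_dir_all(&dir)
}
```

Disk use is bounded by roughly (keep + 1) × max_bytes per log file. That bound is what stops local logs from filling the EC2 volume.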

tomsmith8 commented 2 months ago

Update all swarms to m5.large or higher.

Do not use t-family (burstable) instances: they depend on CPU credits, and load spikes cause machines to become unavailable.

tomsmith8 commented 2 months ago

@Evanfeenstra, any updates on keeping logs?

Not deleting them, and keeping them locally
Future: stream logs to something like CloudWatch
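For the future CloudWatch step, the main thing a log shipper has to get right is batching. This sketch assumes the PutLogEvents limits documented by AWS at the time of writing: at most 10,000 events and 1,048,576 bytes per batch, where each event counts as its UTF-8 message length plus 26 bytes, with events in chronological order. Check the current CloudWatch Logs API reference before relying on these numbers. The sketch only groups events and does not call AWS. The LogEvent type and batch function are hypothetical, and sending each batch would go through an AWS SDK. Oversized single events and the rule that one batch may not span more than 24 hours are not handled here.

```rust
/// Assumed PutLogEvents limits; verify against current AWS docs.
const MAX_EVENTS: usize = 10_000;
const MAX_BATCH_BYTES: usize = 1_048_576;
const PER_EVENT_OVERHEAD: usize = 26;

/// Hypothetical log event: timestamp in milliseconds plus the message.
#[derive(Debug)]
struct LogEvent {
    timestamp_ms: i64,
    message: String,
}

/// Sort events by time, then split them into batches within both limits.
fn batch(mut events: Vec<LogEvent>) -> Vec<Vec<LogEvent>> {
    events.sort_by_key(|e| e.timestamp_ms);
    let mut batches = Vec::new();
    let mut current: Vec<LogEvent> = Vec::new();
    let mut current_bytes: usize = 0;
    for event in events {
        let size = event.message.len() + PER_EVENT_OVERHEAD; // len() is UTF-8 bytes
        let full = current.len() == MAX_EVENTS || current_bytes + size > MAX_BATCH_BYTES;
        if full && !current.is_empty() {
            batches.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += size;
        current.push(event);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

fn main() {
    // 25,000 short lines: the event-count limit splits them 10,000 / 10,000 / 5,000.
    let small: Vec<LogEvent> = (0..25_000)
        .map(|i| LogEvent { timestamp_ms: i, message: format!("line {i}") })
        .collect();
    let sizes: Vec<usize> = batch(small).iter().map(|b| b.len()).collect();
    println!("{sizes:?}");

    // 25 lines of 100,000 bytes: the byte limit allows 10 per batch.
    let big: Vec<LogEvent> = (0..25)
        .map(|i| LogEvent { timestamp_ms: i, message: "x".repeat(100_000) })
        .collect();
    let sizes: Vec<usize> = batch(big).iter().map(|b| b.len()).collect();
    println!("{sizes:?}");
}
```

Whichever limit is hit first closes the batch. That means chatty containers with short lines are bounded by event count, and verbose ones by bytes.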