Week 04 routine

kiwixbot commented 5 months ago

Infra

[x] Check nodes free space
```
df -h / && df -h /data
```
[x] Nodes and worker system upgrades
```
apt update && apt upgrade
```
[x] Ensure all borg repositories are being updated
[x] Check Pod errors
```
k get pods -A -o wide|grep Error
```
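
The free-space check at the top of this list can be made less dependent on eyeballing the output. The following is a minimal sketch, not part of the routine itself: `over_threshold` is a hypothetical helper name and the 80% default is an arbitrary assumption. It reads `df -P` output on stdin and prints only the mount points at or above the limit (mount points containing spaces are not handled).

```shell
# Print "<mount point> <use%>" for every filesystem at or above a usage limit.
# Input: `df -P` output on stdin. Argument: limit in percent (default 80, an assumption).
over_threshold() {
    awk -v limit="${1:-80}" '
        NR > 1 {                       # skip the df header line
            use = $5                   # Capacity column, e.g. "61%"
            sub(/%/, "", use)          # strip the percent sign
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Usage on a node (not executed here):
#   df -P / /data | over_threshold 80
```

An empty output then means no filesystem has crossed the limit.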

[x] Check Pod restarts

k get pods -A -o wide | pyp -i 'print("\n".join([line for line in l if re.split(r"\s+", line)[4] != "0"]))'
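
Where `pyp` is not installed, the same filter can be sketched with `awk`. This is an alternative under stated assumptions, not the command the routine prescribes: `restarted_pods` is a hypothetical helper name, and it relies on RESTARTS being the fifth column, which holds for `kubectl get pods -A` output (`k` above is assumed to be an alias for `kubectl`).

```shell
# Keep the header plus every pod whose RESTARTS column is not "0".
# Input: `kubectl get pods -A` (optionally `-o wide`) output on stdin.
# With -A the columns are NAMESPACE NAME READY STATUS RESTARTS ..., so RESTARTS is $5.
# A value such as "2 (5h ago)" still has its count in $5.
restarted_pods() {
    awk 'NR == 1 || $5 != "0"'
}

# Usage (not executed here):
#   kubectl get pods -A -o wide | restarted_pods
```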

[x] Check if k8s should/could be upgraded

curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/clusters/$KIWIX_PROD_CLUSTER|jq ".version,.upgrade_available"
curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/versions|jq ".versions[].name"

[x] Upgrade k8s if applicable

Stats

[x] Ensure download.kiwix.org stats are being recorded
[x] Check whether matomo should be upgraded

Grafana

[x] Alert list is normal
[x] Zimfarm dashboard is normal
[x] Mirrorbrain dashboard is normal
[x] There is no abnormal behavior in cluster resource consumption

Projects

[x] UptimeRobot has no alert
[x] youzim.it backlog is reasonable
[x] No systematic failure in tasks.
[x] PRs awaiting your review

Security

[x] Analyze/merge dependabot PRs

Note: this is an automatic reminder intended for the assignee(s).

benoit74 commented 5 months ago

Filesystem	Size	Used	Avail	Use%	Use change
bastion	37G	8.3G	27G	24%	+0.1G ; -
stats	233G	96G	126G	44%	- ; -
services	456G	219G	214G	61%	+5G ; -
storage	33T	20T	12T	62%	+1T ; -

@rgaudin do you have any idea why we consume more space on services node? 5G this week, 3G last week, these are not small things.

Updated scaleway-ecosystem (0.0.6-10~debian11) over (0.0.6-9~debian11) on bastion

zimit

Note: Multiple occurrences on the same website domain are counted only once

1x Ziming something on localhost
4x Python 403 (minecraft.wiki + reddit.com + miakodacraft.pixieset.com + www.repairclinic.com)
2x Connection refused (www.htlmistelbach.ac.at + facebook.com)
1x 90000ms Timeout (en.wikipedia.org)
2x MetaData Title: must NOT have more than 30 characters (See https://github.com/openzim/zimfarm/issues/906)
1x CERTIFICATE_VERIFY_FAILED (www.healingcancernaturally.com)
1x Seed page load error (weird t.co tiny URL redirecting to a Chinese website)

rgaudin commented 5 months ago

> @rgaudin do you have any idea why we consume more space on services node? 5G this week, 3G last week, these are not small things.

No. I looked at the disk briefly and noticed there's a bunch of maintenance leftovers (mostly in-maintenance copies of DBs). I haven't removed them yet but we should probably do it (and record the freed space here so we can continue to track the increase).

I see that the pg databases are quite large (my feeling, given what they are used for) but most importantly, OCI image layers are huge: 111GB currently for a limited number of services.
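
A breakdown like this can be reproduced by ranking the immediate subdirectories of a suspect directory by disk usage. The helper below is only a sketch: `largest_dirs` is a hypothetical name, and which directory to inspect on the services node (database volumes, the container runtime's image store) is left to the operator since the thread does not name the paths.

```shell
# List the N largest immediate subdirectories of DIR, sizes in KiB, largest first.
# Arguments: DIR (default: current directory), N (default: 10).
# -x keeps du on a single filesystem, -s prints one total per subdirectory.
largest_dirs() {
    dir="${1:-.}"
    n="${2:-10}"
    du -xsk "$dir"/*/ 2>/dev/null | sort -rn | head -n "$n"
}

# Usage (not executed here; the path is a placeholder):
#   largest_dirs /some/suspect/directory 5
```

Running it weekly on the same directories and keeping the output would give per-directory growth figures rather than a single node-wide number.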

Prior to k8s, we'd prune using docker but AFAIK we haven't set up any pruning mechanism on the cluster or on the node. I think k8s has a bundled garbage collector but it triggers only at 90% disk usage. I'd be happy to discuss what's possible here with you.
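
As one input to that discussion, unused images can be removed by hand through the container runtime interface. The sketch below assumes `crictl` is installed and configured on the node, which this thread does not confirm; `prune_unused_images` and the `DRY_RUN` variable are hypothetical names. On the kubelet side, the upstream configuration exposes `imageGCHighThresholdPercent` and `imageGCLowThresholdPercent` (upstream defaults 85 and 80); whether the managed cluster uses those values or lets them be changed would need checking.

```shell
# Remove all images not used by any container on this node.
# Run on the node itself, typically as root.
# DRY_RUN defaults to 1 so that nothing is deleted unless explicitly requested.
prune_unused_images() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: crictl rmi --prune"
    else
        crictl rmi --prune
    fi
}

# Usage (not executed here):
#   prune_unused_images              # dry run, prints the command only
#   DRY_RUN=0 prune_unused_images    # actually prunes
```

Comparing `df -h` before and after a real run gives the freed space to record in the weekly table.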
