khuedoan / homelab

Fully automated homelab from empty disk to running services with a single command.
https://homelab.khuedoan.com
GNU General Public License v3.0

gitea and 4 other services in a degraded state #132

Closed sushyad closed 8 months ago

sushyad commented 8 months ago

Fresh prod install on 4 OptiPlex 7040 Micros with 32 GB RAM each, but gitea, dex, blog, woodpecker, and renovate are in a degraded state. Here is the error from gitea:

container "gitea" in pod "gitea-67cfcd4b5d-vrdlj" is waiting to start: PodInitializing 
==== BEGIN GITEA CONFIGURATION ====
2024/01/23 04:22:30 .../setting/security.go:168:loadSecurityFrom() [W] Enabling Query API Auth tokens is not recommended. DISABLE_QUERY_AUTH_TOKEN will default to true in gitea 1.23 and will be removed in gitea 1.24.
2024/01/23 04:22:30 cmd/migrate.go:33:runMigrate() [I] AppPath: /usr/local/bin/gitea
2024/01/23 04:22:30 cmd/migrate.go:34:runMigrate() [I] AppWorkPath: /data
2024/01/23 04:22:30 cmd/migrate.go:35:runMigrate() [I] Custom path: /data/gitea
2024/01/23 04:22:30 cmd/migrate.go:36:runMigrate() [I] Log path: /data/log
2024/01/23 04:22:30 cmd/migrate.go:37:runMigrate() [I] Configuration file: /data/gitea/conf/app.ini
2024/01/23 04:22:30 ...2@v2.25.7/command.go:267:Run() [I] PING DATABASE postgres
2024/01/23 04:22:30 cmd/migrate.go:40:runMigrate() [F] Failed to initialize ORM engine: dial tcp: lookup gitea-postgresql-ha-pgpool.gitea.svc.cluster.local: no such host
Gitea migrate might fail due to database connection...This init-container will try again in a few seconds

khuedoan commented 8 months ago

Woodpecker, Dex and Renovate depend on Gitea, so I think we can fix Gitea first. Is Gitea's PostgreSQL pod running?

The blog is just my personal blog, so you can remove it (it depends on Gitea, Woodpecker and the Docker registry to build).

sushyad commented 8 months ago

i did remove the blog, with the same results. i don't see any problem with gitea postgresql either; it seems to be running fine. could any of this be linked to dns issues? before i spun up this deployment, i was using the same domain for my previous setup and some dns entries were clashing, but i have cleaned that up since then.

registry was fine 10 hrs ago but is degraded now with the garbage collector pod red.

khuedoan commented 8 months ago

Can you try running nslookup gitea-postgresql-ha-pgpool.gitea.svc.cluster.local in the Gitea pod? Here's mine:

Server:     10.43.0.10
Address:    10.43.0.10:53

Name:   gitea-postgresql-ha-pgpool.gitea.svc.cluster.local
Address: 10.43.204.48

And the output of cat /etc/resolv.conf:

search gitea.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5

sushyad commented 8 months ago

so i could not do nslookup on the pod since it was not even up, but i got around the problem by updating the pod dnsPolicy in argocd:

  dnsConfig:
    nameservers:
      - 10.43.0.10
    options:
      - name: ndots
        value: '5'
    searches:
      - svc.cluster.local
      - cluster.local
  dnsPolicy: None

for some reason the default setting of dnsPolicy: ClusterFirst was not working. while this update fixed the deployment and it became healthy, the pod still cannot resolve the github repos.
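To make the effect of the search list and ndots setting concrete, here is a minimal Python sketch of the order in which a glibc-style stub resolver would try names. The function name candidate_names is hypothetical, the ordering rule is my understanding of glibc behaviour, and other resolvers (musl, for example) may differ, so treat it as an illustration only:

```python
def candidate_names(name, search, ndots):
    """Return the lookup order a glibc-style stub resolver would try (assumed rule)."""
    # A trailing dot marks the name as fully qualified: no search list is applied.
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    absolute = name + "."
    # With at least `ndots` dots the name is tried as-is first, otherwise last.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Search list and ndots taken from the dnsConfig override above.
search = ["svc.cluster.local", "cluster.local"]
name = "gitea-postgresql-ha-pgpool.gitea.svc.cluster.local"
for candidate in candidate_names(name, search, ndots=5):
    print(candidate)
```

The full service name has only four dots, so with ndots set to 5 the search domains are appended first and the name is tried as-is last. A lookup for this name can therefore produce extra queries that fail before the real one is sent.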

i did some troubleshooting of the coredns pod and it was not able to resolve the gitea-postgresql-ha-pgpool.gitea.svc.cluster.local address but it did resolve gitea-postgresql-ha-pgpool.gitea.svc and gitea-postgresql-ha-pgpool.gitea just fine.
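For anyone who wants to repeat that comparison from a pod that has Python available, here is a rough sketch that checks the three name forms with the standard socket module. The helper names resolves and check_forms are made up for illustration, and the cluster domain cluster.local is assumed from the logs above:

```python
import socket


def resolves(name, lookup=socket.getaddrinfo):
    """Return True if the name resolves, False on a resolver error."""
    try:
        lookup(name, None)
        return True
    except socket.gaierror:
        return False


def check_forms(service, namespace, cluster_domain="cluster.local",
                lookup=socket.getaddrinfo):
    """Try the short, .svc and fully qualified forms of a service name."""
    forms = [
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]
    return {form: resolves(form, lookup) for form in forms}
```

Calling check_forms("gitea-postgresql-ha-pgpool", "gitea") returns a dict mapping each form to True or False. The result only means something when run inside the cluster, since it goes through the pod's own /etc/resolv.conf.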

i am not entirely sure why dns is misbehaving on this particular deployment only, since all the others are just fine. maybe my pfsense is messing with some dns resolution.

sushyad commented 8 months ago

i left my old domain alone and ended up getting a brand new domain and starting from scratch, which fixed whatever issues i was having. always better to have a clean slate :)