kashalls / external-dns-unifi-webhook

External-DNS Webhook to manage UniFi DNS Records
Apache License 2.0

Intermittent CrashLoopBackOffs #20

Closed: jimmy-ungerman closed this issue 3 months ago

jimmy-ungerman commented 3 months ago

Every 2 hours or so, the webhook container becomes unhealthy and crashes with the following logs.

{"error":"json: cannot unmarshal object into Go value of type []unifi.DNSRecord","level":"error","msg":"error getting records","requestMethod":"GET","requestPath":"/records","time":"2024-05-25T04:35:18Z"}

After some time in the CrashLoopBackOff state, it fixes itself and redeploys in a healthy state, until the same crash happens 2 hours later.
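One possible reading of the log line above, sketched under assumptions: the decoder expects a JSON array of records, but the controller answers with a JSON object (for example an error envelope once the session is no longer valid). The `DNSRecord` type below is a hypothetical stand-in for `unifi.DNSRecord`, and the error body is invented purely for illustration. It is not taken from the UniFi API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// DNSRecord is a hypothetical stand-in for unifi.DNSRecord.
type DNSRecord struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// decodeRecords expects the body to be a JSON array of records.
func decodeRecords(body []byte) ([]DNSRecord, error) {
	var records []DNSRecord
	if err := json.Unmarshal(body, &records); err != nil {
		return nil, err
	}
	return records, nil
}

// isShapeMismatch reports whether err is a JSON type mismatch,
// such as an object arriving where an array was expected.
func isShapeMismatch(err error) bool {
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr)
}

func main() {
	// Assumed healthy response: an array of records.
	ok := []byte(`[{"key":"a.example.com","value":"10.0.0.1"}]`)
	// Assumed unhealthy response: an object (illustrative error envelope).
	bad := []byte(`{"error":"unauthorized"}`)

	records, err := decodeRecords(ok)
	fmt.Println(len(records), err)

	_, err = decodeRecords(bad)
	fmt.Println(isShapeMismatch(err), err)
}
```

If this reading is right, checking the HTTP status code or the response shape before decoding would give a clearer error than the unmarshal failure.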

jimmy-ungerman commented 3 months ago

@kashalls has brought up that the cookie expires every 2 hours, which definitely could be causing this, but it seems somewhat intermittent. My latest deployment was at 1:33PM and I didn't get my first alert until 6:38PM. After that first alert, it has happened every 2 hours.

kashalls commented 3 months ago

I believe the token refresh is not happening in the background. We would probably need to call the Login method after the cookie expires (it lasts for 2 hours from the time of iat).

doonga commented 3 months ago

I can confirm that I'm getting this after roughly two hours, then it clears up and happens again some time later. I cleared the alert notifications, so I'll try to note the times next time it happens.

onedr0p commented 3 months ago

I added more debugging lines in https://github.com/kashalls/external-dns-unifi-webhook/pull/21

We will get some more info if it happens again.

doonga commented 3 months ago

Can confirm that the webhook starts erroring out at 2 hours, then the external-dns container starts getting 500s from the webhook and crashloops.

onedr0p commented 3 months ago

I might have something in https://github.com/kashalls/external-dns-unifi-webhook/pull/23

kashalls commented 3 months ago

Should be fixed in #23. Wait for the container build and try again, please. Thanks.

onedr0p commented 3 months ago

This can be closed