nightscout / cgm-remote-monitor

nightscout web monitor
GNU Affero General Public License v3.0

"no healthy upstream" on NorthFlank #8232

Closed by robster7674 3 months ago

robster7674 commented 4 months ago

If you need support for Nightscout, PLEASE DO NOT FILE A TICKET HERE. For support, please post a question to the "CGM in The Cloud" group on Facebook (https://www.facebook.com/groups/cgminthecloud) or visit the WeAreNotWaiting Discord at https://discord.gg/zg7CvCQ

Describe the bug: A few minutes ago I got an error on my NightScout URL: "no healthy upstream".

To Reproduce: Visit my NightScout URL.

Expected behavior: See my blood sugar.


Your setup information

Just filing this here for reference, in case other people on NorthFlank hit this too.

robster7674 commented 3 months ago

Same issue again just now, except that this time Safari does not work either. Is anyone else on NorthFlank seeing this?

robster7674 commented 3 months ago

Looks like an issue with Mongo: just now I got "Unable to connect to Mongo". Looking into it, but I wonder why I did not get this message earlier.

robster7674 commented 3 months ago

For some reason, after restarting the rollout, the connection suddenly started working again (log lines are newest first):

2024-03-10T13:20:06.222187406Z stdout F Successfully established connection to MongoDB
2024-03-10T13:19:05.938507986Z stdout F Error connecting to MongoDB: {"reason":{"type":"ReplicaSetNoPrimary","servers":{},"stale":false,"compatible":true,"heartbeatFrequencyMS":10000,"localThresholdMS":15,"setName":"rs0","maxElectionId":null,"maxSetVersion":null,"commonWireVersion":0,"logicalSessionTimeoutMinutes":null}} - retrying in 60 sec
2024-03-10T13:17:35.91513115Z stdout F Error connecting to MongoDB: {"reason":{"type":"ReplicaSetNoPrimary","servers":{},"stale":false,"compatible":true,"heartbeatFrequencyMS":10000,"localThresholdMS":15,"setName":"rs0","maxElectionId":null,"maxSetVersion":null,"commonWireVersion":0,"logicalSessionTimeoutMinutes":null}} - retrying in 60 sec
...

No idea yet as to what the cause might be. Also see https://github.com/nightscout/cgm-remote-monitor/issues/6775.
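
For anyone trying to spot this pattern in their own container logs, here is a minimal sketch that pulls the timestamp and the error type out of lines shaped like the ones pasted above. It assumes the exact line layout and message wording shown in this thread; the function name `mongo_error_type` is made up for illustration and is not part of Nightscout or NorthFlank.

```python
import json
import re

# Assumed layout, copied from the log excerpt above:
# "<timestamp> <stream> F <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<stream>stdout|stderr) F (?P<msg>.*)$")
PREFIX = "Error connecting to MongoDB: "


def mongo_error_type(line):
    """Return (timestamp, error type) for a MongoDB connection error line.

    Returns None for any line that does not match the assumed layout,
    for example the "Successfully established connection" line.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    msg = match.group("msg")
    if not msg.startswith(PREFIX):
        return None
    # Drop the message prefix and the trailing " - retrying in 60 sec" notice,
    # leaving only the JSON payload.
    payload = msg[len(PREFIX):].rsplit(" - retrying", 1)[0]
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return None
    reason = data.get("reason")
    if not isinstance(reason, dict):
        return None
    return (match.group("ts"), reason.get("type"))
```

Run over the excerpt above, this would report `ReplicaSetNoPrimary` for both error lines and skip the success line, which makes it easier to see how long the app went without finding a primary.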

robster7674 commented 3 months ago

After discussing this with NorthFlank's (very kind and helpful) support, it seems to have been caused by an unknown (race) condition in which two instances were up at the same time, and one of the two was receiving traffic while in a broken state. The theory for now is that this was due to a (temporarily) overloaded server in the free tier, although support confirmed that this "should" not have happened. Closing for now.

robster7674 commented 1 month ago

The same issue happened again. After reaching out to support again, I learned that these two environment variables were missing:

HOSTNAME=0.0.0.0
INSECURE_USE_HTTP=true

The missing variables caused the health probe to fail.
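
In case it helps others check their own service, below is a rough sketch of a preflight check for the two variables named above. The variable names and values come from this thread; the helper `check_env` and its messages are hypothetical, and the comparison is exact (case-sensitive), which may be stricter than what Nightscout itself accepts.

```python
import os

# The two settings reported as missing in this thread.
REQUIRED = {"HOSTNAME": "0.0.0.0", "INSECURE_USE_HTTP": "true"}


def check_env(env):
    """Return a list of human-readable problems; an empty list means all match."""
    problems = []
    for name, expected in REQUIRED.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual != expected:
            problems.append(f"{name}={actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    # Check the environment of the current process and print any mismatches.
    for problem in check_env(os.environ):
        print(problem)
```

Run inside the container (or with the same variables exported locally), it prints nothing when both values are present and one line per missing or different variable otherwise.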