nightscout / cgm-remote-monitor

nightscout web monitor
GNU Affero General Public License v3.0

NS 15.0.1 (Master) causes 25-second response times on Heroku/MongoDB! #8125

Status: Open. oddst opened this issue 9 months ago.

oddst commented 9 months ago


Describe the bug
After upgrading to NS 15.0.1 (Master), the Nightscout site continuously shows a 25-second response time according to Heroku's measurements! The site seems sluggish and slow.

To Reproduce
Steps to reproduce the behavior:

  1. Log in to Heroku, open the app, go to the Metrics pane, and look at the response time graph.

Expected behavior
I have two Nightscout sites on Heroku/MongoDB, one in the US and one in Ireland. One has been running NS15 for some days now while I troubleshoot uploading. The other site (Europe) was updated to Nightscout Master 15.0.1 today. Heroku only keeps 24 hours of statistics, so I cannot see when the first site started showing this behavior, but the second site started showing it after upgrading to Nightscout 15.0.1 Master (from an NS15 Dev version from May 2023)!

Screenshots
The screenshots below show first the US site and then the Europe site.

Your setup information

Additional context

[Screenshots taken 2023-10-21 at 19:04:32, 19:04:41, 19:40:42, 19:40:51]


bewest commented 9 months ago

Thanks for the details. Can you also check your MongoDB to make sure it is working as expected? If you are using Atlas, the free tier can cause slowness due to hitting quota limits. There was some work in May to reduce load on Mongo and update the Mongo drivers; that was in 15.0.0 but not in 15.0.1.

oddst commented 9 months ago

Hi Ben,

I have added some screenshots that show the situation in MongoDB. Nothing to worry about there!

Regarding Heroku, both subscriptions are of the Basic type (not Eco). That restricts each app to one dyno, but that dyno can run 60x24x30 minutes per month, i.e. all the time. The USD 7 per month is a financial limit, not a technical one.

It is difficult for me to check the response time of individual transactions, but I would say that many of them take far less than 26 seconds to complete. Yet Heroku's measurement indicates that this is ongoing almost 99.9% of the time! Normally that would make me distrust Heroku's response time measurements, except that I could see this behavior start when I upgraded to NS 15.0.1 on the Norwegian (Ireland) subscription earlier today. That is difficult to ignore.
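One way to cross-check Heroku's graph is to time a few requests to the site from outside Heroku. The following is a minimal Python sketch using only the standard library. The site URL and the `/api/v1/status.json` endpoint are assumptions (substitute your own site), and `fetch_url`, `time_requests` and `summarize` are hypothetical helper names. The timings are end-to-end from your own machine, so they will not necessarily match what Heroku's Metrics pane reports.

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict, List


def fetch_url(url: str, timeout: float = 30.0) -> int:
    """GET a URL, read and discard the body, and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def time_requests(fetch: Callable[[], object], n: int = 5) -> List[float]:
    """Call `fetch` n times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Example (hypothetical URL, replace with your own Nightscout site):
# print(summarize(time_requests(
#     lambda: fetch_url("https://YOUR-SITE.herokuapp.com/api/v1/status.json"))))
```

Running this a few times while the Heroku graph shows around 25 seconds gives a second, independent data point for the same period.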

These are the MongoDB statistics since October 1 for the US account:

[Screenshots taken 2023-10-22 at 04:29:57, 04:23:11, 04:22:53, 04:48:47]

This is the info for the NS site in Norway:

[Screenshots taken 2023-10-22 at 04:49:50, 04:47:35, 04:46:06, 04:45:13, 04:45:05]

There is no performance throttling, and not too much stored data.

I cannot see anything that would explain a continuously increased response time!

BR Odd

P.S. I had to upload the pictures directly to GitHub, as replying by e-mail did not deliver them.

oddst commented 9 months ago

Today, I received the following reply for the US site from Heroku Support:

[Screenshot of the Heroku Support reply, taken 2023-10-23 at 17:21:43]

oddst commented 9 months ago

I also noticed that the European site (Ireland) does not have this response time problem anymore:

[Screenshot taken 2023-10-23 at 17:15:40]

The European site is still using BRIDGE to get data from Dexcom Share, while the US site now uses CONNECT for the same purpose.

oddst commented 9 months ago

The US site, still looks like this:

[Screenshot taken 2023-10-23 at 17:17:05]

It should be mentioned that only the US site is doing full iAPS monitoring. The European site only receives BG values from Dexcom Share, as there is a problem with the T1D user of this site, and this site is set up to monitor Loop. ;)

whooze commented 8 months ago

I have two sites as well: one is mine and the other is for a friend. Site 1, my friend's site, is running a Basic dyno and has issues with NS crashing on a daily basis. Site 1 looks like this: [image]

Site 2, my site, is running an Eco dyno, so I can't get those fancy stats. This site has no issues with crashes, but I can see in the log that there are a lot of 25-second response times, meaning I'm close to the point where requests may start queuing up and eventually exceed the 30-second mark, resulting in a timeout and probably a crash.

Both sites are located in Europe, for both Heroku and Mongo.

Both sites are running 15.0.2

When the crashes happen, it seems like something is happening at MongoDB: [image] (Notice the parts marked in red: the time should be the same on all three nodes, but due to patching/restarts one node goes offline for a few minutes.)

I asked Mongo Support what happened at that time and got this reply: "I can see that there was a rolling node update across the backing infrastructure that your M0 relies upon, I would recommend ensuring that your application has resiliency inbuilt to survive an election in case of primary node failover. Alternatively, dedicated clusters respect the Project wide Maintenance Window for all but critical security updates."

I don't know if this is what's causing the issues; maybe response times go up when Mongo cycles through its nodes, pushing Site 1 beyond the 30-second limit.

The issues for Site 1 are quite severe since Nightscout doesn't auto-restart, so she has to restart the site manually.
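The Mongo Support reply above recommends making the application resilient to a primary election. As a rough illustration of what that kind of resiliency means, here is a minimal Python sketch of retrying an operation with exponential backoff. `TransientDbError` and `with_retry` are hypothetical names, not part of Nightscout or of any MongoDB driver. Official MongoDB drivers also provide built-in retryable reads and writes (the `retryWrites=true` option commonly found in Atlas connection strings), which retry a failed operation once, so this sketch only illustrates the general idea.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientDbError(Exception):
    """Hypothetical stand-in for a driver error raised while a new primary is elected."""


def with_retry(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TransientDbError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`, retrying with exponential backoff when it raises a transient error."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Wait base_delay, 2*base_delay, 4*base_delay, ... so a short failover can finish.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets the backoff schedule be checked without actually waiting.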

oddst commented 8 months ago

A little update on the two Nightscout sites (v15.0.2) on Heroku/MongoDB that I administer. Both use a Heroku Basic subscription and a free MongoDB Atlas M0 subscription.

I have never experienced any problems from using the MongoDB Atlas M0 subscription, as long as I avoid storing too much data overall or too much data in the profile collection.

The response time graph for the NS site in Ireland currently does not show the problems that I have with the NS site in the US. It did have the same problem for a while after upgrading to 15.0.2, but not anymore. This site currently only handles BG and no other looping data.

The US Nightscout site still has the same "problem", i.e. a response time of 25 seconds, but only according to Heroku's monitoring. A simple test of viewing the Nightscout page shows that it loads much faster. So the question is whether this can affect the processing of data uploaded from iAPS or Dexcom Share (currently using CONNECT).

These are the current graphs for the US Nightscout site:

[Screenshots: IMG_1145, IMG_1146, IMG_1147, IMG_1148, IMG_1149]

This is the situation for the Norwegian Nightscout site running in Ireland:

[Screenshots: IMG_1140, IMG_1141, IMG_1142, IMG_1143, IMG_1144]