louislam / uptime-kuma

A fancy self-hosted monitoring tool
https://uptime.kuma.pet
MIT License
58.7k stars, 5.29k forks

Http(s) monitoring delay fluctuates seriously (from ~50ms to ~1000ms) #4583

Closed: Nebulosa-Cat closed this issue 7 months ago.

Nebulosa-Cat commented 7 months ago

⚠️ Please verify that this question has NOT been raised before.

🛡️ Security Policy

📝 Describe your problem

My Uptime Kuma instance is hosted in the fly.io Tokyo, Japan region (NRT). One of the websites it monitors is hosted on a VPS in Japan; the others are hosted on my home Raspberry Pi in Taiwan (the ping between Taiwan and Japan is about 100 ms).

When I SSH into the fly.io instance and test the JP VPS's website domain and IP with ping or curl, the results are all about 0.1 to 5 ms, and the latency shown on the monitor is about 70 ms, which is normal.

But I noticed a problem, which looks like this:

Website hosted on the JP VPS (HTTPS): [screenshot] [screenshot]

Monitor of the JP VPS (ping): [screenshot]

One of the websites hosted in Taiwan (HTTPS): [screenshot]

So it does not look like it is caused by the latency between the two machines (the ping between Kuma and the VPS is under 3 ms). Is there anything I can check to solve this HTTPS monitor latency issue?

📝 Error Message(s) or Log

No response

🐻 Uptime-Kuma Version

1.23.11

💻 Operating System and Arch

Unknown (fly.io instance, Docker version of Uptime Kuma)

🌐 Browser

Vivaldi 6.6.3271.50

🖥️ Deployment Environment

My fly.io config:

app = 'name-of-this-uptime-kuma'
primary_region = 'nrt'
swap_size_mb = 256

[build]
  image = 'louislam/uptime-kuma:1'

[[mounts]]
  source = 'kuma_data'
  destination = '/app/data'
  initial_size = '1'

[http_service]
  internal_port = 3001
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 1
  processes = ['app']

[[vm]]
  cpu_kind = 'shared'
  cpus = 1
  memory_mb = 1024

CommanderStorm commented 7 months ago

First, you have presented two measurements and four graphs. To prevent confusion, please edit this into a table that states your expectation, where you are measuring from and to, and the corresponding screenshot. As it stands, anything I read into these graphs would likely be a misinterpretation.

when I ssh into the fly.io instance, the ping or curl test jp vps's website domain and ip are all about 0.1ms ~ 5ms and the delay on monitor is about ~70ms, this is normally

Over what timeframe have you run these measurements? Latency is expected to vary quite a bit. (networking goblins eating packets are real)

Nebulosa-Cat commented 7 months ago

Sorry, could you describe the format you want in more detail? I'm not sure which data you need to reference.

In addition, the ping test was run only over a short period and only checked basic connectivity and latency. The long-term measurements are shown in the charts above: for services running on the same machine, the ping and HTTPS monitors report hugely different latencies. If you need other kinds of tests, please tell me what to do and I will try to run them.

My main goal is to fix the abnormal fluctuation of the latency data from the HTTPS monitor. Under normal circumstances the latency should stay within 100 ms, but even though nothing has changed on the monitored side, the latency reported by Kuma fluctuates wildly between 50 and 1000 ms.
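Since a 24-hour average mixes steady latency with occasional spikes, it can help to look at the median, a high percentile, and the share of checks above the expected 100 ms alongside the min/max/average that the charts show. A minimal sketch; `summarize_latency` is a hypothetical helper, the nearest-rank percentile is only one of several common definitions, and the sample values in the usage note are made up for illustration.

```python
import math
import statistics


def summarize_latency(samples_ms, threshold_ms=100.0):
    """Summarize monitor latency samples given in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)

    def percentile(p):
        # Nearest-rank percentile: smallest sample with at least p% of samples <= it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    over = sum(1 for s in ordered if s > threshold_ms)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": percentile(95),
        "share_over_threshold": over / len(ordered),
    }
```

For example, `summarize_latency([50, 60, 70, 80, 1000])` reports a mean of 252 ms while the median stays at 70 ms, and 20% of checks exceed 100 ms: a single spike is enough to pull the average far away from the typical value.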

CommanderStorm commented 7 months ago

I created a measurement to gauge one hour of baseline latency between Taiwan and https://ec2.ap-northeast-3.amazonaws.com/ping ⇒ in an hour we will know roughly what the results are: https://atlas.ripe.net/measurements/68534999
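Once such a measurement finishes, its raw results can be downloaded as JSON and reduced to the same min/avg/max numbers the Uptime Kuma charts show. A minimal sketch under several assumptions: that the measurement is a ping measurement, that the RIPE Atlas v2 results endpoint below is used, and that each entry in a result's `result` list carries an `rtt` field in ms for a reply while lost packets carry no `rtt` (for example `{"x": "*"}`). `fetch_results` and `summarize_atlas_ping` are hypothetical helpers, and duplicate replies are not treated specially.

```python
import json
import urllib.request

# Assumed RIPE Atlas v2 results endpoint; the measurement ID comes from the link above.
RESULTS_URL = "https://atlas.ripe.net/api/v2/measurements/{id}/results/?format=json"


def fetch_results(measurement_id):
    """Download the raw per-probe results of one measurement (network access needed)."""
    with urllib.request.urlopen(RESULTS_URL.format(id=measurement_id), timeout=30) as resp:
        return json.load(resp)


def summarize_atlas_ping(results):
    """Aggregate RTTs (ms) over all probes and rounds of a ping measurement."""
    rtts, sent = [], 0
    for res in results:
        for reply in res.get("result", []):
            sent += 1
            if "rtt" in reply:  # entries without an rtt are counted as lost
                rtts.append(float(reply["rtt"]))
    if not rtts:
        return {"sent": sent, "received": 0, "loss": 1.0 if sent else 0.0}
    return {
        "sent": sent,
        "received": len(rtts),
        "loss": 1 - len(rtts) / sent,
        "min": min(rtts),
        "avg": sum(rtts) / len(rtts),
        "max": max(rtts),
    }
```

Usage would be `summarize_atlas_ping(fetch_results(68534999))`, which puts the baseline ping numbers next to the monitor's figures.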

(I have a hypothesis that there might be some deep packet inspection going on, and some packets being routed via China...)

(Personal note: running a RIPE Atlas probe really helps the networking research community a ton, especially outside Europe, and also lets you run measurements like this one yourself.)

Nebulosa-Cat commented 7 months ago

Uptime-Kuma host: Fly.io Tokyo

First service machine
Region: Tokyo
Size: 2 cores, 12 GB RAM (2c12g)
Ping from the Uptime Kuma host instance: 0.1 to 2 ms
Running: Hexo blog served by Caddy

Monitor result, HTTP(S) on the domain:
[last] highest 1265 ms, lowest 43 ms, 24 h average 600 ms [screenshot]
[1 week] (the low point is 58 ms) [screenshot]

Monitor result, ping on the domain:
highest 1.22 ms, lowest 0.838 ms, 24 h average not available yet (less than 24 hours of data) [screenshot]

Monitor result, ping on the machine IP:
[last] highest 2.27 ms, lowest 1.72 ms, 24 h average 1 ms [screenshot]
[1 week] (highest 6.8 ms) [screenshot]

Second service machine
Region: Taiwan
Size: Raspberry Pi 4B (4 GB), 32 GB SD card
Running:

  1. The web panel of a Wake-on-LAN service (UpSnap), reverse-proxied by Caddy
  2. Sub-Store, a Node.js proxy source manager, whose panel is also reverse-proxied by Caddy

Both services run in Docker; Caddy is installed on the host system.

[WoL Service] highest 4033 ms, lowest 273 ms, 24 h average 762 ms [screenshot]

[Sub-Store] highest 2010 ms, lowest 282 ms, 24 h average 767 ms [screenshot]

Since my English is not good enough to fully express the following, this part was translated with Google Translate:

I hope this makes it more organized and easier to read.

The strange thing is the huge difference in latency between HTTPS monitoring and ping monitoring of the same machine, and that only the HTTPS monitor shows so many large swings.

Even when the machine running Uptime Kuma and the monitored machine are both in Japan, even in the same city (Tokyo), with ping latency basically below 2 ms, HTTPS still shows a huge spread (43 ms vs 1265 ms), while ping on the URL and ping on the machine IP both stay stable (basically within 3 ms).
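The contrast described here (stable ping, HTTPS jumping between 43 ms and 1265 ms) is about variability rather than the average, so it can be quantified separately. A minimal sketch using one simple variability measure, the mean absolute difference between consecutive samples; this is not the RFC 3550 interarrival jitter, `successive_jitter` and `compare_monitors` are hypothetical helpers, and the series in the usage note are illustrative, not exported monitor data.

```python
from statistics import fmean


def successive_jitter(samples_ms):
    """Mean absolute difference between consecutive samples, in ms.

    A stable series scores near zero even if its average is high, while a
    series that jumps between ~40 ms and ~1200 ms scores high.
    """
    if len(samples_ms) < 2:
        return 0.0
    return fmean(abs(b - a) for a, b in zip(samples_ms, samples_ms[1:]))


def compare_monitors(ping_ms, https_ms):
    """Side-by-side average and jitter for a ping series and an HTTPS series."""
    return {
        "ping_avg": fmean(ping_ms),
        "ping_jitter": successive_jitter(ping_ms),
        "https_avg": fmean(https_ms),
        "https_jitter": successive_jitter(https_ms),
    }
```

For example, a ping series `[1, 2, 1, 2]` has a jitter of 1.0 ms, while an HTTPS series `[43, 1265, 60]` has a jitter of 1213.5 ms; comparing the two numbers states the "only HTTPS swings" observation directly.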

For the machines in Taiwan, HTTPS latency also varies drastically and is much higher than the ping results (the 24 h average is 88 ms for ping on the domain and 763 ms for HTTPS on the domain).

So is it possible that some delay caused by the HTTPS monitoring itself is being included in the latency statistics?
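One way to check where the HTTPS time goes is to time each phase of a single request separately: DNS lookup, TCP connect, TLS handshake, and time to first byte. A minimal standard-library sketch; `time_http_phases` is a hypothetical helper, not part of Uptime Kuma and not how Uptime Kuma measures latency. `curl -w` with `%{time_namelookup}`, `%{time_connect}`, `%{time_appconnect}` and `%{time_starttransfer}` reports similar numbers, although curl's values are cumulative from the start of the request rather than per phase.

```python
import socket
import ssl
import time


def time_http_phases(host, port, path="/", use_tls=True, timeout=10.0):
    """Time one HTTP(S) GET, split into phases. Durations are in milliseconds."""
    def ms(start, end):
        return (end - start) * 1000.0

    t0 = time.perf_counter()
    # DNS: resolve once so the connect timing below excludes the name lookup.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t_dns = time.perf_counter()

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)  # TCP three-way handshake
        t_connect = time.perf_counter()
        if use_tls:
            ctx = ssl.create_default_context()
            # On an already connected socket this performs the TLS handshake immediately.
            sock = ctx.wrap_socket(sock, server_hostname=host)
        t_tls = time.perf_counter()

        request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                   "Connection: close\r\n\r\n").encode("ascii")
        sock.sendall(request)
        chunks = [sock.recv(1)]  # blocks until the first response byte arrives
        t_first = time.perf_counter()
        while True:  # read the rest until the server closes the connection
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
        t_done = time.perf_counter()
    finally:
        sock.close()

    status = b"".join(chunks).split(b"\r\n", 1)[0].decode("latin-1")
    return {
        "dns_ms": ms(t0, t_dns),
        "connect_ms": ms(t_dns, t_connect),
        "tls_ms": ms(t_connect, t_tls),
        "first_byte_ms": ms(t_tls, t_first),
        "total_ms": ms(t0, t_done),
        "status": status,
    }
```

Running it repeatedly from the fly.io instance against the JP VPS, and noting which phase grows when a check takes around 1000 ms, would show whether the spikes come from the connect, the handshake, or the server's first byte.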

In addition, I will look into RIPE Atlas probes, which look like a research-oriented global network quality monitoring project, and find out how to join.

CommanderStorm commented 7 months ago

https produces a huge delay difference

That is to be expected. HTTPS has to establish a full TCP connection and TLS session, with all the pros (reliability) and cons (congestion detection and avoidance) this brings with it. While HTTP/3 does switch to UDP, the variability this introduces is still there.
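To put rough numbers on the handshake cost, the textbook round-trip counts for a fresh connection can be multiplied by the measured ping RTT. A minimal sketch, assuming no TLS session resumption, TCP Fast Open, TLS 1.3 0-RTT or TLS False Start, and ignoring DNS, server processing and CPU time; `min_time_to_first_byte_ms` and the protocol labels are hypothetical.

```python
# Round trips before the first response byte can arrive on a fresh connection.
ROUND_TRIPS = {
    "http": 2,          # TCP handshake + request/response
    "https-tls1.2": 4,  # TCP + two-round-trip TLS 1.2 handshake + request/response
    "https-tls1.3": 3,  # TCP + one-round-trip TLS 1.3 handshake + request/response
    "http3": 2,         # combined QUIC/TLS 1.3 handshake + request/response
}


def min_time_to_first_byte_ms(rtt_ms, protocol):
    """Network-only lower bound; ignores DNS, server processing and CPU time."""
    return ROUND_TRIPS[protocol] * rtt_ms
```

With the ~88 ms 24 h ping average reported for the Taiwan domain, this gives 352 ms for TLS 1.2 and 264 ms for TLS 1.3, the same range as the lowest HTTPS values reported for the Taiwan services (273 ms and 282 ms). At the ~2 ms RTT measured inside Tokyo, the handshakes account for under 10 ms, so values far above that floor have to come from elsewhere, such as packet loss and retransmission, DNS, or server-side time.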

I don't see this as a fault in our system, but rather as general expected behaviour. Is there something that I am missing?

how to apply to join

No need to apply to join. Just register with your email address and you are in. If you send me your address, I can send you 100k credits to get you started ^^

One con is that HTTP checks (as opposed to ping and ping6) can only be run against anchors (servers set up by RIPE's members) for safety/ethical reasons.

Nebulosa-Cat commented 7 months ago

I think I have successfully set up the software probe in my home lab. However, my home network (I am a fourth-year college student) does not have a fixed IP: the ISP reassigns the IPv4 address every 2 to 4 days, at a random time between 0:00 and 5:00 in the morning. I am not sure whether the probe can correctly handle the IP change, and it seems there is no need to open any ports? Also, I have to do national military service after graduation. During that period I will move the machine back to my hometown, and once my service ends and I start working, I will deploy it in a new city. I hope the probe can detect these changes.

My probe ID is Probe #1007763 and my email is work@nebulosa-cat.com (it's the same as my GitHub email, so posting it publicly isn't a big deal).

Nebulosa-Cat commented 7 months ago

I mean, the latency difference between ping (2 ms) and HTTPS (60 ms) seems reasonable to me, but HTTPS sometimes taking 60 ms and sometimes 1000 ms does not seem right. And it's not only Japan to Japan; Japan to Taiwan behaves the same way. That is the main part I find weird.

CommanderStorm commented 7 months ago

I don't think this issue comes from Uptime Kuma, so I'm closing it as resolved. If you have a case that I can reproduce and that cannot be explained by deep packet inspection, routing through the Great Firewall, or TLS latency variation, we would obviously love to hear about it. Otherwise, I don't think there is anything we can do.