shizunge / endlessh-go

A golang implementation of endlessh exporting Prometheus metrics, visualized by a Grafana dashboard.
GNU General Public License v3.0
977 stars 31 forks source link

Trapped time is suspiciously large (inaccurate) #87

Closed andrewlow closed 9 months ago

andrewlow commented 9 months ago

Nice project. I converted to using this one from the linuxserver.io one (https://docs.linuxserver.io/images/docker-endlessh/) because you have prometheus metrics built in.. thanks for your work on this

I think that there is something a bit odd about the actual counting of connections / time spent.

From the prometheus output:

endlessh_client_trapped_time_seconds{ip="218.92.0.125",local_port="2222"} 5242.06399999999

but my container has only been running a short while

$ docker ps | grep end
3e32fbf10ef8   shizunge/endlessh-go                       "/endlessh -enable_p…"   30 minutes ago   Up 30 minutes

30mins = 1800 seconds I don't see how I can have 5242 seconds of 'trapped time'

Here is the full prometheus output

# HELP endlessh_client_closed_count_total Total number of clients that stopped connecting to this host.
# TYPE endlessh_client_closed_count_total counter
endlessh_client_closed_count_total{local_port="2222"} 29
# HELP endlessh_client_open_count Number of connections of clients.
# TYPE endlessh_client_open_count counter
endlessh_client_open_count{country="China",geohash="wtssb2djg",ip="218.92.0.125",local_port="2222",location="China, Jiangsu, Nanjing"} 33
endlessh_client_open_count{country="China",geohash="wx4g0kzf1",ip="54.222.238.164",local_port="2222",location="China, Beijing, Beijing"} 1
# HELP endlessh_client_open_count_total Total number of clients that tried to connect to this host.
# TYPE endlessh_client_open_count_total counter
endlessh_client_open_count_total{local_port="2222"} 34
# HELP endlessh_client_trapped_time_seconds Seconds a client spends on endlessh.
# TYPE endlessh_client_trapped_time_seconds counter
endlessh_client_trapped_time_seconds{ip="218.92.0.125",local_port="2222"} 5242.06399999999
endlessh_client_trapped_time_seconds{ip="54.222.238.164",local_port="2222"} 938.4460000000003
# HELP endlessh_sent_bytes_total Total bytes sent to clients that tried to connect to this host.
# TYPE endlessh_sent_bytes_total counter
endlessh_sent_bytes_total{local_port="2222"} 101505
# HELP endlessh_trapped_time_seconds_total Total seconds clients spent on endlessh.
# TYPE endlessh_trapped_time_seconds_total counter
endlessh_trapped_time_seconds_total{local_port="2222"} 6180.5099999999875

(so.. maybe there is a concurrent connection thing happening so that's what's up?).. but the Grafana dashboard (where I noticed this first) -- just seems way off?

image

Important to note - that the data was captured a few minutes apart - so there may be differences in the Prometheus data dump vs. the Grafana image capture -- but only a minute or two difference.

shizunge commented 9 months ago

That should be the total trapped time of the 33 simultaneous connections from the same IP.

So it is not awfully off.

Cheers

shizunge commented 9 months ago

You can confirm that by starting two connections from your home PC. If I am correct, the trapped time should double the wall time.

andrewlow commented 9 months ago

I understand that multiple connections will cause 'double' counting of time - it just (again gut feel mostly) felt unbelievably high. I will experiment to provide additional details.

andrewlow commented 9 months ago

Hmm.. ok - so I cooked up a script that does a test

#!/bin/bash

# hacky way to clear docker log file

echo "" > $(docker inspect --format='{{.LogPath}}' endlessh)

echo 'start endlesssh in docker'
#docker run -d --name=endlessh -p 2222:2222 -p 2112:2112 shizunge/endlessh-go --enable_prometheus -logtostderr -v=1
docker start endlessh

echo 'start 4 ssh sessions'

ssh -p 2222 localhost &
ssh -p 2222 localhost &
ssh -p 2222 localhost &
ssh -p 2222 localhost &

echo '----- start killing ----'

echo '--start--' > final.txt

for job in `jobs -p`
do
    curl  localhost:2112/metrics >> final.txt
    echo ' --- spacer --- ' >> final.txt
    echo $job
    kill $job
    sleep 30
done

echo 'start 3 more ssh sessions'

ssh -p 2222 localhost &
ssh -p 2222 localhost &
ssh -p 2222 localhost &A

for job in `jobs -p`
do
        curl  localhost:2112/metrics >> final.txt
        echo ' --- spacer --- ' >> final.txt
        echo $job
        kill $job
        sleep 30
done

echo 'all sessions dead'

curl localhost:2112/metrics >> final.txt

docker stop endlessh

echo '--- docker logs ---'

docker logs endlessh >> final.txt 2>&1

This results in the following output

--start--
 --- spacer --- 
# HELP endlessh_client_closed_count_total Total number of clients that stopped connecting to this host.
# TYPE endlessh_client_closed_count_total counter
endlessh_client_closed_count_total{local_port="2222"} 1
# HELP endlessh_client_open_count Number of connections of clients.
# TYPE endlessh_client_open_count counter
endlessh_client_open_count{country="Geohash off",geohash="s000",ip="172.17.0.1",local_port="2222",location="Geohash off"} 4
# HELP endlessh_client_open_count_total Total number of clients that tried to connect to this host.
# TYPE endlessh_client_open_count_total counter
endlessh_client_open_count_total{local_port="2222"} 4
# HELP endlessh_client_trapped_time_seconds Seconds a client spends on endlessh.
# TYPE endlessh_client_trapped_time_seconds counter
endlessh_client_trapped_time_seconds{ip="172.17.0.1",local_port="2222"} 91.01599999999999
# HELP endlessh_sent_bytes_total Total bytes sent to clients that tried to connect to this host.
# TYPE endlessh_sent_bytes_total counter
endlessh_sent_bytes_total{local_port="2222"} 1587
# HELP endlessh_trapped_time_seconds_total Total seconds clients spent on endlessh.
# TYPE endlessh_trapped_time_seconds_total counter
endlessh_trapped_time_seconds_total{local_port="2222"} 91.01599999999999
 --- spacer --- 
# HELP endlessh_client_closed_count_total Total number of clients that stopped connecting to this host.
# TYPE endlessh_client_closed_count_total counter
endlessh_client_closed_count_total{local_port="2222"} 2
# HELP endlessh_client_open_count Number of connections of clients.
# TYPE endlessh_client_open_count counter
endlessh_client_open_count{country="Geohash off",geohash="s000",ip="172.17.0.1",local_port="2222",location="Geohash off"} 4
# HELP endlessh_client_open_count_total Total number of clients that tried to connect to this host.
# TYPE endlessh_client_open_count_total counter
endlessh_client_open_count_total{local_port="2222"} 4
# HELP endlessh_client_trapped_time_seconds Seconds a client spends on endlessh.
# TYPE endlessh_client_trapped_time_seconds counter
endlessh_client_trapped_time_seconds{ip="172.17.0.1",local_port="2222"} 155.02
# HELP endlessh_sent_bytes_total Total bytes sent to clients that tried to connect to this host.
# TYPE endlessh_sent_bytes_total counter
endlessh_sent_bytes_total{local_port="2222"} 2631
# HELP endlessh_trapped_time_seconds_total Total seconds clients spent on endlessh.
# TYPE endlessh_trapped_time_seconds_total counter
endlessh_trapped_time_seconds_total{local_port="2222"} 155.02
 --- spacer --- 
# HELP endlessh_client_closed_count_total Total number of clients that stopped connecting to this host.
# TYPE endlessh_client_closed_count_total counter
endlessh_client_closed_count_total{local_port="2222"} 3
# HELP endlessh_client_open_count Number of connections of clients.
# TYPE endlessh_client_open_count counter
endlessh_client_open_count{country="Geohash off",geohash="s000",ip="172.17.0.1",local_port="2222",location="Geohash off"} 4
# HELP endlessh_client_open_count_total Total number of clients that tried to connect to this host.
# TYPE endlessh_client_open_count_total counter
endlessh_client_open_count_total{local_port="2222"} 4
# HELP endlessh_client_trapped_time_seconds Seconds a client spends on endlessh.
# TYPE endlessh_client_trapped_time_seconds counter
endlessh_client_trapped_time_seconds{ip="172.17.0.1",local_port="2222"} 188.019
# HELP endlessh_sent_bytes_total Total bytes sent to clients that tried to connect to this host.
# TYPE endlessh_sent_bytes_total counter
endlessh_sent_bytes_total{local_port="2222"} 3251
# HELP endlessh_trapped_time_seconds_total Total seconds clients spent on endlessh.
# TYPE endlessh_trapped_time_seconds_total counter
endlessh_trapped_time_seconds_total{local_port="2222"} 188.019
 --- spacer --- 
# HELP endlessh_client_closed_count_total Total number of clients that stopped connecting to this host.
# TYPE endlessh_client_closed_count_total counter
endlessh_client_closed_count_total{local_port="2222"} 4
# HELP endlessh_client_open_count Number of connections of clients.
# TYPE endlessh_client_open_count counter
endlessh_client_open_count{country="Geohash off",geohash="s000",ip="172.17.0.1",local_port="2222",location="Geohash off"} 7
# HELP endlessh_client_open_count_total Total number of clients that tried to connect to this host.
# TYPE endlessh_client_open_count_total counter
endlessh_client_open_count_total{local_port="2222"} 7
# HELP endlessh_client_trapped_time_seconds Seconds a client spends on endlessh.
# TYPE endlessh_client_trapped_time_seconds counter
endlessh_client_trapped_time_seconds{ip="172.17.0.1",local_port="2222"} 191.019
# HELP endlessh_sent_bytes_total Total bytes sent to clients that tried to connect to this host.
# TYPE endlessh_sent_bytes_total counter
endlessh_sent_bytes_total{local_port="2222"} 3303
# HELP endlessh_trapped_time_seconds_total Total seconds clients spent on endlessh.
# TYPE endlessh_trapped_time_seconds_total counter
endlessh_trapped_time_seconds_total{local_port="2222"} 191.019
 --- spacer --- 
# HELP endlessh_client_closed_count_total Total number of clients that stopped connecting to this host.
# TYPE endlessh_client_closed_count_total counter
endlessh_client_closed_count_total{local_port="2222"} 5
# HELP endlessh_client_open_count Number of connections of clients.
# TYPE endlessh_client_open_count counter
endlessh_client_open_count{country="Geohash off",geohash="s000",ip="172.17.0.1",local_port="2222",location="Geohash off"} 7
# HELP endlessh_client_open_count_total Total number of clients that tried to connect to this host.
# TYPE endlessh_client_open_count_total counter
endlessh_client_open_count_total{local_port="2222"} 7
# HELP endlessh_client_trapped_time_seconds Seconds a client spends on endlessh.
# TYPE endlessh_client_trapped_time_seconds counter
endlessh_client_trapped_time_seconds{ip="172.17.0.1",local_port="2222"} 254.019
# HELP endlessh_sent_bytes_total Total bytes sent to clients that tried to connect to this host.
# TYPE endlessh_sent_bytes_total counter
endlessh_sent_bytes_total{local_port="2222"} 4371
# HELP endlessh_trapped_time_seconds_total Total seconds clients spent on endlessh.
# TYPE endlessh_trapped_time_seconds_total counter
endlessh_trapped_time_seconds_total{local_port="2222"} 254.019
 --- spacer --- 
# HELP endlessh_client_closed_count_total Total number of clients that stopped connecting to this host.
# TYPE endlessh_client_closed_count_total counter
endlessh_client_closed_count_total{local_port="2222"} 6
# HELP endlessh_client_open_count Number of connections of clients.
# TYPE endlessh_client_open_count counter
endlessh_client_open_count{country="Geohash off",geohash="s000",ip="172.17.0.1",local_port="2222",location="Geohash off"} 7
# HELP endlessh_client_open_count_total Total number of clients that tried to connect to this host.
# TYPE endlessh_client_open_count_total counter
endlessh_client_open_count_total{local_port="2222"} 7
# HELP endlessh_client_trapped_time_seconds Seconds a client spends on endlessh.
# TYPE endlessh_client_trapped_time_seconds counter
endlessh_client_trapped_time_seconds{ip="172.17.0.1",local_port="2222"} 287.019
# HELP endlessh_sent_bytes_total Total bytes sent to clients that tried to connect to this host.
# TYPE endlessh_sent_bytes_total counter
endlessh_sent_bytes_total{local_port="2222"} 4930
# HELP endlessh_trapped_time_seconds_total Total seconds clients spent on endlessh.
# TYPE endlessh_trapped_time_seconds_total counter
endlessh_trapped_time_seconds_total{local_port="2222"} 287.019
 --- spacer --- 
# HELP endlessh_client_closed_count_total Total number of clients that stopped connecting to this host.
# TYPE endlessh_client_closed_count_total counter
endlessh_client_closed_count_total{local_port="2222"} 7
# HELP endlessh_client_open_count Number of connections of clients.
# TYPE endlessh_client_open_count counter
endlessh_client_open_count{country="Geohash off",geohash="s000",ip="172.17.0.1",local_port="2222",location="Geohash off"} 7
# HELP endlessh_client_open_count_total Total number of clients that tried to connect to this host.
# TYPE endlessh_client_open_count_total counter
endlessh_client_open_count_total{local_port="2222"} 7
# HELP endlessh_client_trapped_time_seconds Seconds a client spends on endlessh.
# TYPE endlessh_client_trapped_time_seconds counter
endlessh_client_trapped_time_seconds{ip="172.17.0.1",local_port="2222"} 290.019
# HELP endlessh_sent_bytes_total Total bytes sent to clients that tried to connect to this host.
# TYPE endlessh_sent_bytes_total counter
endlessh_sent_bytes_total{local_port="2222"} 5002
# HELP endlessh_trapped_time_seconds_total Total seconds clients spent on endlessh.
# TYPE endlessh_trapped_time_seconds_total counter
endlessh_trapped_time_seconds_total{local_port="2222"} 290.019
I0122 16:15:54.328556       1 metrics.go:90] Starting Prometheus on 0.0.0.0:2112, entry point is /metrics
I0122 16:15:54.329487       1 main.go:78] Listening on 0.0.0.0:2222
I0122 16:15:54.365834       1 client.go:58] ACCEPT host=172.17.0.1 port=37034 n=1/4096
I0122 16:15:54.365934       1 client.go:58] ACCEPT host=172.17.0.1 port=37032 n=2/4096
I0122 16:15:54.369398       1 client.go:58] ACCEPT host=172.17.0.1 port=37050 n=3/4096
I0122 16:15:54.372445       1 client.go:58] ACCEPT host=172.17.0.1 port=37064 n=4/4096
I0122 16:15:58.368606       1 client.go:99] CLOSE host=172.17.0.1 port=37032 time=4.002658529 bytes=45
I0122 16:16:28.383121       1 client.go:99] CLOSE host=172.17.0.1 port=37050 time=34.013676289 bytes=637
I0122 16:16:57.405482       1 client.go:99] CLOSE host=172.17.0.1 port=37064 time=63.032960703 bytes=933
I0122 16:17:28.391431       1 client.go:99] CLOSE host=172.17.0.1 port=37034 time=94.025521497 bytes=1688
I0122 16:17:54.440579       1 client.go:58] ACCEPT host=172.17.0.1 port=54456 n=1/4096
I0122 16:17:54.447693       1 client.go:58] ACCEPT host=172.17.0.1 port=54470 n=2/4096
I0122 16:17:54.452074       1 client.go:58] ACCEPT host=172.17.0.1 port=54476 n=3/4096
I0122 16:17:58.442081       1 client.go:99] CLOSE host=172.17.0.1 port=54456 time=4.001457806 bytes=79
I0122 16:18:28.466277       1 client.go:99] CLOSE host=172.17.0.1 port=54476 time=34.014167819 bytes=567
I0122 16:18:58.463809       1 client.go:99] CLOSE host=172.17.0.1 port=54470 time=64.016063243 bytes=1053
andrewlow commented 9 months ago

Let's walk through the output. A total of 7 ssh connections are made to the docker endlessh.

The docker logs are a nice concise view of this - here are the time each ssh session is connected for

  1. 4 seconds
  2. 34 seconds
  3. 63 seconds
  4. 94 seconds
  5. 4 seconds
  6. 34 seconds
  7. 64 seconds

This is a total of 297 seconds of 'trapped' time. This closely agrees with the last prometheus data pull

endlessh_trapped_time_seconds_total{local_port="2222"} 290.019

This is probably sufficient to prove that concurrent connections from a single IP are generating the 'right' data for Prometheus.

I guess I was just shocked as to how much traffic my endlessh container was actually getting..and the math quickly exceeded my human brain.

I'll close this - maybe others will find this useful if they have the same dis-belief.