rarescosma opened 5 months ago
Is there a proof this is the correct way to resolve the problem?
And is it a problem we need to resolve?
Hey, thanks for checking in - I guess we could collect a few weeks of data with the feature enabled/disabled and compare, but it would hardly qualify as a controlled experiment.
If you read the original issue, they're claiming clients disconnect after exactly two intervals without the random delay.
Are we collecting data about number of intervals as well, or is everything quantified in time?
As for the "need to solve" - I can't comment. The velocity of this project is low enough that it's OK to maintain a fork for the foreseeable future, so I'll leave the decision to you :)
If you read https://github.com/skeeto/endlessh/issues/71, they're claiming clients disconnect after exactly two intervals without the random delay.
There is nothing saying that clients keep the connection for more than two intervals with a random delay.
The two intervals are quite easy to explain. endlessh can only report the connection time as a multiple of the interval, rounded up. If the connection dies before the first tick, endlessh (sorry, I don't remember exactly) probably won't report it. If the connection dies between the first tick and the 2nd tick, it reports 2x the interval. E.g. assume the interval is 5 seconds: endlessh will report 10 seconds for all connections lasting between 5 and 10 seconds, including 6s, 7s, 8s, and 9s. We will never know the exact connection time.
Actually, before hitting the 2nd tick, the client can't know that the server periodically sends garbage, as the client never sees the 2nd response. The client knows that it received data at 5s, but if the connection breaks before 10s, the client won't know whether it would have received data at 10s. I don't think adding jitter improves this.
And I believe attackers won't spend time developing and deploying advanced tarpit-detecting logic. If you were an attacker, would you spend time studying whether a server periodically sends garbage data, and then take the corresponding actions? Even so, that could only happen after the 2nd tick, or you wouldn't know the interval, so we should see the connection last at least 3x the interval. I don't think they will waste resources on this. If I were the attacker, I would kill the connection with a simple timeout.
If you can find the exact connection times, that would definitely help produce a better report.
Hey 👋🏼
I could run some tests with and without the patch applied and check if it has any real impact on the mean trapped time. The problem is I introduced a bunch of other patches on my fork to deal with the slow Grafana dashboard.
It's fine to close this for now, I'll just stay in sync with upstream.
~R
Logically, I don't think jitter changes the fact that endlessh sees clients disconnect at the 2nd tick.
From https://github.com/skeeto/endlessh/issues/71
delay@10000 = disconnect@20.022
delay@12543 = disconnect@25.106
I believe that means
client saw the 1st tick @10000 = disconnect@20.022
client saw the 1st tick @12543 = disconnect@25.106
Now with jitter, what would you fill into the blanks?
client sees the 1st tick @9000 = disconnect@
client sees the 1st tick @11000 = disconnect@
Adds a new argument called `interval_jitter`, expressed as a percentage number (range 0-100) that has the following effects:

- If 0 => no effect.
- If >0 => choose a random integer in the `[-(interval * jitter / 100), +(interval * jitter / 100)]` range and use it as an offset for the sleep interval.

So for example, if our interval is `1000ms` and we pick `interval_jitter=20`, then the sleep values will be randomized in the range `[800ms, 1200ms]`.
This has been a requested feature in the original implementation, and is useful to thwart SSH scanners that have tarpit-detecting logic.
Link: https://github.com/skeeto/endlessh/issues/71