shizunge / endlessh-go

A golang implementation of endlessh exporting Prometheus metrics, visualized by a Grafana dashboard.
GNU General Public License v3.0
977 stars 31 forks

feat: implement interval jitter #109

Open rarescosma opened 5 months ago

rarescosma commented 5 months ago

Adds a new argument called interval_jitter, expressed as a percentage (range 0-100), with the following effects:

If 0 => no effect.

If >0 => choose a random integer in the [-(interval * jitter / 100), +(interval * jitter / 100)] range and use it as an offset for the sleep interval.

So for example, if our interval is 1000ms and we pick interval_jitter=20, the sleep values will be randomized in the range [800ms, 1200ms].

This has been a requested feature in the original implementation, and is useful to thwart SSH scanners that have tarpit-detecting logic.

Link: https://github.com/skeeto/endlessh/issues/71

shizunge commented 5 months ago

Is there proof that this is the correct way to resolve the problem?

shizunge commented 5 months ago

And is it a problem we need to resolve?

rarescosma commented 5 months ago

Hey, thanks for checking in - I guess we could collect a few weeks of data with the feature enabled/disabled and compare, but it would hardly qualify as a controlled experiment.

If you read the original issue they're claiming clients disconnect after exactly two intervals without the random delay.

Are we collecting data about number of intervals as well, or is everything quantified in time?

As for the "need to solve" - I can't comment. The velocity of this project is low enough to be OK to maintain a fork for the foreseeable future, so I'll leave the decision to you :)

shizunge commented 3 weeks ago

If you read the https://github.com/skeeto/endlessh/issues/71 they're claiming clients disconnect after exactly two intervals without the random delay.

Nothing there says that clients stay connected for more than two intervals when a random delay is used.

The two-interval figure is quite easy to explain. endlessh can only report the connection time as a multiple of the interval, rounded up. If the connection dies before the first tick, endlessh (sorry, I don't remember exactly) probably won't report it. If the connection dies between the first tick and the 2nd tick, it reports 2x the interval. E.g., assuming the interval is 5 seconds, endlessh will report 10 seconds for all connections that last between 5 and 10 seconds, including 6s, 7s, 8s, and 9s. We will never know the exact connection time.

Actually, before hitting the 2nd tick, the client can't know that the server periodically sends garbage, as the client never sees the 2nd response. The client knows that it received data at 5s, but the connection is already broken before 10s, so the client won't know whether it would have received data at 10s. I don't think adding jitter improves this.

And I believe attackers won't spend time developing and deploying advanced tarpit-detecting logic. If you were an attacker, would you spend time studying whether a server periodically sends garbage data, and then take the corresponding action? Even so, detection could only happen after the 2nd tick, or you wouldn't know the interval, so we should see connections lasting at least 3x the interval. I think they won't waste resources on this. If I were the attacker, I would kill the connection with a simple timeout.

shizunge commented 3 weeks ago

If you can find the exact connection time, it would definitely help produce a better report.

rarescosma commented 3 weeks ago

Hey 👋🏼

I could run some tests with and without the patch applied and check whether it has any real impact on the mean trapped time. The problem is that I introduced a bunch of other patches on my fork to deal with the slow Grafana dashboard.

It's fine to close this for now, I'll just stay in sync with upstream.

~R

shizunge commented 3 weeks ago

Logically, I don't think jitter changes the fact that endlessh sees clients disconnect at the 2nd tick.

From https://github.com/skeeto/endlessh/issues/71

delay@10000 = disconnect@20.022
delay@12543 = disconnect@25.106

I believe that means

client saw the 1st tick @10000 = disconnect@20.022
client saw the 1st tick @12543 = disconnect@25.106

Now, with jitter, what would you fill in the blanks?

client sees the 1st tick @9000 = disconnect@
client sees the 1st tick @11000 = disconnect@