vmonaco / kloak

Keystroke-level online anonymization kernel: obfuscates typing behavior at the device level.
BSD 3-Clause "New" or "Revised" License
491 stars 36 forks source link

keyboard randomly sometimes stops working #31

Closed adrelanos closed 1 week ago

adrelanos commented 4 years ago

I cannot provide instructions on how to reproduce this. However the keyboard randomly sometimes stops working. Experienced this myself and also got user reports of this happening.

Let me know if there is anything I can do to help with debugging.

adrelanos commented 1 year ago

Sometimes when typing a letter in a terminal such as n, kloak starts repeating the name letter over and over again such as for example nnnnnnn at infinite. The keyboard is then effectively broken until reboot.

Quite likely the same bug.

adrelanos commented 1 week ago

https://forums.whonix.org/t/sdwdate-can-cause-system-time-to-jump-backwards-causing-issue-with-kloak/20433

At boot of Whonix-Workstation sdwdate attempts sync of system time. If the VM is interacted with by keyboard while sdwdate is setting the system time and it jumps backwards kloak will repeat the previously recorded input indefinitely.

ArrayBolt3 commented 1 week ago

I have a reliable way of reproducing this that should work on any Linux distro with bash and date (which I think is probably all of them).

The main issue lies in this bit of code in void main_loop():

current_time = current_time_ms();
while ((np = TAILQ_FIRST(&head)) && (current_time >= np->time)) {
    emit_event(np);
    TAILQ_REMOVE(&head, np, entries);
    free(np);
}

This assumes that current_time will always be a larger value than it was previously, ensuring that key events eventually get through, even if it takes a while. If the clock changes backwards, this assumption no longer holds true, and if the clock goes back while the user is actively typing, it's possible (and even easy) to end up with some events in the queue with a value significantly higher than current_time. This naturally results in the events no longer being sent until current_time does manage to exceed the time value in one of the events, which can take a while. Why exactly this results in a key repeating until the queue gets unstuck, I don't know, though I assume it has something to do with how libev works, but it doesn't really matter.

There are a few possible solutions here:

The first solution is probably the best as it simply sidesteps the system clock entirely. We don't care about what time it actually is, we just care about relative times for the sake of inserting delays. The following patch to main.c works for me:

@@ -77,7 +77,7 @@ void sleep_ms(long milliseconds) {

 long current_time_ms(void) {
     struct timespec spec;
-    clock_gettime(CLOCK_REALTIME, &spec);
+    clock_gettime(CLOCK_MONOTONIC, &spec);
     return (spec.tv_sec) * 1000 + (spec.tv_nsec) / 1000000;
 }

One potential worry with this though is the following note from the clock_gettime manpage:

All variants guarantee that the time returned by consecutive calls will not go backwards, but successive calls may—depending on the architecture—return identical (not-increased) time values.

So this may cause issues with anonymity on some hardware - hardware with a not-too-fine-grained timer could interfere with the precise delays kloak inserts.

The other options are to just work around system time changes. The issue with those two solutions is that they will cause a recognizable "bump" in the obfuscated typing pattern, allowing someone observing a user's keystroke rhythm to determine when their clock jumped backwards. The former solution causes many events to be sent at once, while the latter one will cause a bit of a delay that may be perceivable against the noise of randomized delays (this would probably be more noticeable with a short delay than with a long one). Ultimately of the two, the latter is probably better for security, but neither are perfect, and I'm not sure there is a perfect solution here if we're depending on the system clock.

I've implemented the first of these two solutions (flushing events when the clock goes backwards). Like described above, it may not be all that great from a security standpoint, but it was my first attempt to fix the issue and it worked. The patch to main.c looks like this:

@@ -236,6 +236,7 @@ void main_loop() {
     int err;
     long prev_release_time = 0;
     long current_time = 0;
+    long prev_current_time = 0;
     long lower_bound = 0;
     long random_delay = 0;
     struct input_event ev;
@@ -265,14 +266,23 @@ void main_loop() {
     // so that events are always scheduled in the order they
     // arrive (FIFO).
     while (!interrupt) {
-        // Emit any events exceeding the current time
         current_time = current_time_ms();
-        while ((np = TAILQ_FIRST(&head)) && (current_time >= np->time)) {
-            emit_event(np);
-            TAILQ_REMOVE(&head, np, entries);
-            free(np);
+        if (prev_current_time == 0)
+            prev_current_time = current_time;
+
+        // Emit any events exceeding the current time, or emit everything if
+        // the clock has gone backwards
+        while (np = TAILQ_FIRST(&head)) {
+            if ((current_time < prev_current_time) || (current_time >= np->time)) {
+                emit_event(np);
+                TAILQ_REMOVE(&head, np, entries);
+                free(np);
+            } else
+                break;
         }

+        prev_current_time = current_time;
+
         // Wait for next input event
         if ((err = poll(pfds, device_count, POLL_TIMEOUT_MS)) < 0)
             panic("poll() failed: %s\n", strerror(errno));

@adrelanos which solution sounds best to you?

adrelanos commented 1 week ago

monotonic time seems best indeed.

but successive calls may—depending on the architecture—return identical (not-increased) time values.

If that happens, add some extra delay?

In other words, once clock jumps are detected, add extra delay?

ArrayBolt3 commented 1 week ago

With monotonic time it won't be possible to detect a clock jump since the clock won't jump. The issue I was thinking of was the possibility that the monotonic timer might not be super fine-grained on some architectures and as a result of the lower resolution the randomized delay effect wouldn't work as effectively. This does not seem to be the case however at least on my x86_64 laptop, and even if it is the case on some systems it seems unlikely to me that realtime would be more precise than monotonic time.

adrelanos commented 1 week ago

Great!

So this should be resolved thanks to https://github.com/Whonix/kloak/pull/1