vmonaco / keystroke-obfuscation

Obfuscate keystroke timings to help protect privacy
BSD 3-Clause "New" or "Revised" License

Protection tool for Whonix, Tails #1

Open HulaHoopWhonix opened 7 years ago

HulaHoopWhonix commented 7 years ago

Hi. I've been reading up on the privacy implications of keystroke dynamics and came across your (excellent) past research and now this.

I am affiliated with the Whonix project, a Tor-centric privacy OS similar to Tails that uses a VM-based anonymizing middlebox architecture.

We've been interested in a countermeasure for this deanonymization vector for the longest time. Unfortunately none of us knows C or enough about the guts of the kernel to write such a tool. Only recently I learned that the uinput API (maybe even python-uinput) can provide a way to influence keystroke timings but there is no program readily available to set this up AFAIK.

Can you please consider writing something we can include?

vmonaco commented 7 years ago

Thanks for your interest. I agree that some type of kernel-level keystroke timing obfuscation would make a nice addition to Whonix and other privacy-preserving tools.

I think the right way to go about this is to write a custom keyboard device driver. This would create a special device file for obfuscated keystroke input, available system-wide.

Another option is to write a browser plugin, but then this functionality won't be available in other programs (SSH in interactive mode is especially vulnerable).

Unfortunately, I too am not well versed in the kernel. I could probably hack something together, but there are no guarantees that this wouldn't open up other vulnerabilities. If there is someone who knows enough about writing device drivers, I would be happy to assist in writing the obfuscation mechanism in C.

HulaHoopWhonix commented 7 years ago

I've had an idea about an alternative to writing kernel code that directly does this. It was inspired by this answer on SE: https://stackoverflow.com/a/33134735

Basically, funnel all system input events through a local network interface into which you inject random latency. Doing this on the host makes it system-wide.

Options for adding the network latency: the iperf stress tool, or the kernel's netfilter_queue to delay packets randomly.

Fortunately there is a tool out there that can redirect all the host input to some destination. Netevent: https://github.com/Blub/netevent/wiki/Share-devices-over-the-net

Netevent cobbles a netcat host/client together. We can run it as a service and have it send over the loopback interface, so the client/server communication never leaves the machine. Pros: kernel solution, display-server agnostic. (It uses the uinput interface to capture all events.)


This sounds very convoluted but it would be great if it does indeed work.

vmonaco commented 7 years ago

Interesting. How well is the loopback interface protected? Would it be easier for malware to listen to network traffic than register a system-wide hook? I'm guessing this depends partly on the permissions of the device files. Although, even if just the timings are observed, the keys pressed can be reconstructed with a fair amount of confidence.

This seems like it could work using netfilter_queue. I could write the buffering mechanism as a library that's fairly self contained, which could be used in this or other solutions.

HulaHoopWhonix commented 7 years ago

> Interesting. How well is the loopback interface protected? Would it be easier for malware to listen to network traffic than register a system-wide hook?

Looks like sniffing a network interface (including loopback) needs root and net capabilities: https://security.stackexchange.com/a/58031

I also recall that using tools like tshark for network leak tests required root so it wouldn't be any easier than the privileges needed for system hooks.

> This seems like it could work using netfilter_queue. I could write the buffering mechanism as a library that's fairly self contained, which could be used in this or other solutions.

That would be great. Thanks for offering to help. Also please feel free to drop by our bugtracker: https://phabricator.whonix.org/T542

There is some netfilter_queue code a researcher has written to foil a network latency covert channel: https://gist.github.com/ethan2-0/2c8505049c991fe0aac3d303dddb6075

Maybe there are some parts of it that can be repurposed so you don't have to start from scratch?

vmonaco commented 7 years ago

Thanks for the additional info. I've been actively planning how to go about this, and I'm going to take this approach:

I'm in the process of extending the existing work. There's a key connection to queuing processes that hasn't yet been made. The system is essentially an MMPP/MMPP/1 queue: https://en.wikipedia.org/wiki/Kendall%27s_notation

I'm also looking at how this technique, or something similar, could be applied to mouse pointer motion. Mouse biometrics are less studied, but the consensus is that they can also be effective for identification/verification.

HulaHoopWhonix commented 7 years ago

Excellent. We really appreciate your work. Please feel free to ping this thread when it's ready.

> I'm also looking at how this technique, or something similar, could be applied to mouse pointer motion. Mouse biometrics are less studied, but the consensus is that they can also be effective for identification/verification.

Indeed. There are some successful attack prototypes for this too [1]. Does mitigating mouse motion fingerprinting need more than just delaying input events?

[1] http://jcarlosnorte.com/security/2016/03/06/advanced-tor-browser-fingerprinting.html

[2] http://www.cs.wm.edu/~hnw/paper/ccs11.pdf

vmonaco commented 7 years ago

So, here's a first attempt at something that could be used for keystroke privacy protection: https://github.com/vmonaco/kloak

The above application grabs the input device, randomly delays the key events, and writes the events to a user-level input device via uinput. I thought this approach was less intrusive and more portable than a kernel module or device driver. It could be run in a startup script run by root, or turned on/off as needed.
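
To make the delay-and-release idea above concrete, here is a minimal sketch in Python rather than kloak's C. It only simulates the scheduling on a list of timestamps and does not touch any input device. The function name `schedule_releases`, its parameters, and the uniform delay distribution are assumptions made for illustration, not a description of kloak's actual code.

```python
import random


def schedule_releases(arrival_ms, max_delay_ms, rng=None):
    """Return a release time (ms) for each event arrival time (ms).

    Assumes arrival_ms holds non-negative integers in non-decreasing order.
    Each event is held for a random delay of at most max_delay_ms. The lower
    bound of that delay is raised when needed so that an event is never
    released before the event that arrived ahead of it.
    """
    rng = rng or random.Random()
    releases = []
    prev_release = 0
    for t in arrival_ms:
        # Smallest delay that keeps this event behind the previous release.
        lower = min(max(prev_release - t, 0), max_delay_ms)
        delay = rng.randint(lower, max_delay_ms)
        prev_release = t + delay
        releases.append(prev_release)
    return releases
```

Raising the lower bound of the delay, instead of drawing every delay independently, keeps keys from being reordered while still bounding each delay by the maximum.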

And to answer your question above (sorry for the delay...), I'm not really sure. I think the delays might help, but most mouse motion biometrics are based primarily on the shape of the trajectory. Fortunately, the OS introduces an artificial acceleration, and this varies greatly by OS, so what ends up being measured is the pointer motion and not the physical mouse motion.

Lastly, I forgot to mention that similar to keystroke biometrics, the packet inter-arrival times of a wireless device can be used to identify the device type (and sometimes even a particular device) in a passive analysis, e.g., http://www2.ece.gatech.edu/cap/papers/1569740227-3.pdf. I don't know if anything like this has been deployed, but a similar obfuscation strategy should make techniques like that less effective.

HulaHoopWhonix commented 7 years ago

Thank you so much and Happy Holidays. Seems Christmas came a little earlier this year :D We will test and deploy this ASAP.

> And to answer your question above (sorry for the delay...), I'm not really sure. I think the delays might help, but most mouse motion biometrics are based primarily on the shape of the trajectory. Fortunately, the OS introduces an artificial acceleration, and this varies greatly by OS, so what ends up being measured is the pointer motion and not the physical mouse motion.

Your solutions are much more effective than the network latency suggestion - that was really a (desperate) hack in the absence of a better way.

You have been very generous with us and I don't want to ask too much - if you feel like it and find the time to write something similar to obfuscate pointer motion, we would appreciate it a lot. Combined with the anti-stylometry tool Anonymouth (whenever they finish migrating to OpenJDK), this would shut the door on all major ways behavior can be tracked and give users a powerful toolbox.

> Lastly, I forgot to mention that similar to keystroke biometrics, the packet inter-arrival times of a wireless device can be used to identify the device type (and sometimes even a particular device) in a passive analysis

Very interesting, and scary! I always had a hunch something like this was possible. I'll look into the ramifications of this for user anonymity. I hope there is some easy way to mitigate it.

> Higher level cognitive behavior, such as editing and application usage, are still apparent. These lower-frequency actions are less understood at this point, but could potentially be used to reveal identity.

Is there some literature on this? I'd like to know more.

vmonaco commented 7 years ago

I'm happy to work on a solution for mouse biometrics. Implementation can be done similarly to kloak, by modifying mouse events before they're written back to the user device. The hard part is developing an obfuscation model that doesn't affect the user experience too much and doesn't defeat its own purpose. The relative mouse motion events are usually generated up to some maximum frequency (e.g., 1 event/8 ms), which decreases when velocity decreases. Introducing a random delay may do more harm than good, allowing users with the tool running to be identified.

Re. higher level behavior: For example, see this paper. I think that most higher level "cognitive" behavioral biometrics will be pretty application specific. That paper uses descriptive statistics for actions that are very specific to the game and don't really apply anywhere else.

[Edit] See also "Identifying Users with Application-Specific Command Streams" and references therein. This work used an older dataset containing MS Word actions: http://www.research.rutgers.edu/~sofmac/ml4um/

adrelanos commented 7 years ago

Happy New Year!

Thank you for creating kloak!

The kloak compilation and usage instructions are super simple to follow. (Tested in a VirtualBox Whonix VM.) Was running:

sudo ./kloak -r /dev/input/event0 -w /dev/uinput -v

In my first test using iceweasel, my keytrac detection scores dropped: once to 8% and once to 82%. So we might have to fine-tune the delays?

In my second test using Tor Browser I bumped into a bug:

pthread_create() failed: cannot allocate memory

And the last key pressed (o) kept being sent over and over again. (Very unlikely that my VM was really out of memory.)

The emergency key combination Right Shift + Right Ctrl is not ideal, since VirtualBox's default host key is Right Ctrl. It would be great if we could change that.

vmonaco commented 7 years ago

Thank you for the feedback, and happy new year!

The key combo is an easy fix. How about defaulting to something else and letting the user specify the combo as command line params?

I was able to reproduce the repeating key bug a few times, typically with longer delays. Still investigating the cause... See updates here: https://github.com/vmonaco/kloak/issues/1

Re. choosing a delay, were those results the train kloak/test kloak scenario? Some fine tuning might be required. You can try something like ~500 ms and work your way down until the delay becomes tolerable or not noticeable. I'm also looking into a variable maximum delay that depends on typing speed. This would avoid having to choose a delay, automatically setting a sensible max delay according to typing speed (I think slower typists can tolerate a larger delay than faster typists).
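
One way the typing-speed-dependent maximum delay could look, as a rough sketch only: scale the median interval between recent key presses and clamp the result. The function `adaptive_max_delay`, the scaling factor, and the clamp bounds are all hypothetical; only the 100 ms fallback mirrors the default delay reported later in this thread.

```python
from statistics import median


def adaptive_max_delay(key_times_ms, factor=1.0, floor_ms=20, ceil_ms=500,
                       default_ms=100):
    """Pick a maximum delay (ms) from recent key-press timestamps (ms).

    Slower typists (longer intervals between keys) get a larger maximum
    delay, faster typists a smaller one. All parameters are illustrative.
    """
    if len(key_times_ms) < 2:
        # Not enough history to estimate typing speed yet.
        return default_ms
    intervals = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
    # Clamp so the delay is never negligible and never intolerably long.
    return max(floor_ms, min(ceil_ms, factor * median(intervals)))
```

Using the median rather than the mean keeps a single long pause from inflating the delay.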

HulaHoopWhonix commented 7 years ago

Thanks a lot and Happy New Year :)

I've been looking at mouse click dynamics (as opposed to movements) and this study [1] proposes a system based on just that for keyboardless devices like tablets. Its success rate is not high enough to be used on its own yet, so the authors recommend it as a backup to keyboard fingerprinting. In your opinion, is there a similar practical solution for click duration, like what you thought of for movements?

[1] http://www.ijicic.org/ijicic-ksi-03.pdf - User Authentication using Rhythm Click Characteristics for Non-Keyboard Devices

[2] https://www.ibm.com/developerworks/library/os-userauth-mouse/index.html - IBM perl guide for fingerprinting mouse click-hold times

adrelanos commented 7 years ago

Since keytrac isn't Open Source, I guess there is no way to know how bad 1% is vs 2% or 0%?

> The key combo is an easy fix. How about defaulting to something else and letting the user specify the combo as command line params?

Yes, that would be great!

> Re. choosing a delay, were those results the train kloak/test kloak scenario?

Yes.


My typing speed is above 500 CPM. 10 finger and "untrained". Well, many years ago I learned 10 finger typing but didn't make an effort for years now to improve that since typing is probably not my productivity bottleneck. Just tried in a 1 minute typing speed test (for whatever that's worth). And I doubt I could keep doing that speed for long times, but it is probably the speed with which I am typing usernames / passwords at keytrac.


Default (100 ms):
Train normal, test normal: 94% / 97%
Train normal, test kloak: 33% / 99% / 82%

300 ms:
Train normal, test kloak: 1%

200 ms:
Train normal, test kloak: 26%

300 ms:
Train kloak, test kloak: 85%

HulaHoopWhonix commented 7 years ago

@vmonaco Is it okay to stack kloak instances, i.e., run it on the host and in the VM at the same time?

vmonaco commented 7 years ago

@HulaHoopWhonix thanks for the links. Yes, the same exact techniques could be applied to mouse clicks. It would be effective against the "higher frequency" click actions (mainly the duration of a single click and the various double click time intervals). I agree that mouse clicks alone aren't particularly effective, except possibly in applications with a high volume of clicks.

We did a study a few years ago, had ~20 users play Solitaire and Star Bubbles (both online games, the latter requires many clicks), and could identify users by mouse click behavior with 37% accuracy. That's using a pretty simple classifier, training on the first session, and using the remaining sessions for testing (See https://gist.github.com/vmonaco/209647bc6438b1d045d738156179367f)

@adrelanos Correct - with only a few scores, it's difficult to say how they relate to each other. With the scores from many users and sessions per user, it would be possible to determine the accuracy of their system. Since keytrac gives a numeric value (instead of just an accept/reject decision), these can be used to derive an ROC curve and estimate system performance by obtaining many genuine and impostor scores. This would require a bunch of volunteers to obtain the scores and simulate the impostor scores by swapping credentials.
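
As a sketch of that evaluation, assuming a higher keytrac score means a closer match and that a sample is accepted when its score meets a threshold: the hypothetical helpers below turn lists of genuine and impostor scores into ROC operating points and an approximate equal error rate. No real scores are included; they would have to come from the volunteers.

```python
def roc_points(genuine, impostor):
    """Return (threshold, FAR, FRR) using every observed score as a threshold.

    A sample is accepted when score >= threshold.
    FAR: fraction of impostor scores accepted.
    FRR: fraction of genuine scores rejected.
    """
    points = []
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)
        frr = sum(s < thr for s in genuine) / len(genuine)
        points.append((thr, far, frr))
    return points


def equal_error_rate(points):
    """Approximate the EER as the point where FAR and FRR are closest."""
    thr, far, frr = min(points, key=lambda p: abs(p[1] - p[2]))
    return thr, (far + frr) / 2
```

Plotting 1 - FRR against FAR over all thresholds gives the ROC curve; the equal error rate summarizes it in a single number.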

vmonaco commented 7 years ago

@HulaHoopWhonix and yes, stacking the kloaks should be fine. Though, the maximum delay on the VM will be the sum of the maximum delays in each instance, so you might experience more lag there.
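
A quick way to check the stacking claim, as a simulation sketch: pass timestamps through two delay stages in sequence and look at the total delay. The stage function and its uniform delay are assumptions for illustration, not kloak's code.

```python
import random


def delay_stage(times_ms, max_delay_ms, rng):
    """Delay each timestamp by at most max_delay_ms without reordering.

    Assumes times_ms holds non-negative integers in non-decreasing order.
    """
    out, prev = [], 0
    for t in times_ms:
        # Keep this event at or after the previously released one.
        lower = min(max(prev - t, 0), max_delay_ms)
        prev = t + rng.randint(lower, max_delay_ms)
        out.append(prev)
    return out


def stacked(times_ms, host_max_ms, vm_max_ms, seed=0):
    """Run events through a host stage, then a VM stage."""
    rng = random.Random(seed)
    after_host = delay_stage(times_ms, host_max_ms, rng)
    return delay_stage(after_host, vm_max_ms, rng)
```

With `stacked(times, 100, 50)`, every event comes out between 0 and 150 ms after it went in, which is the sum of the two per-stage maximums.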

adrelanos commented 7 years ago

I found an overview with special keys used by various virtualizers. It would be great if the default emergency key of kloak would not use any of these.

http://vmetc.com/2008/10/02/stuck-in-a-vm-%E2%80%93-to-release-the-mouse-press-the-host-key/

vmonaco commented 7 years ago

@adrelanos thanks! How about "Left Shift + Right Shift + Escape"? This should be pretty hard to press accidentally - a situation we definitely want to avoid.

vmonaco commented 7 years ago

@adrelanos Key combo fixed in the latest commit, can now be specified on the command line and has the above default.

adrelanos commented 7 years ago

That's great!

adrelanos commented 7 years ago

keycodes.c introduced a compilation error. Reported it here: https://github.com/vmonaco/kloak/issues/2

adrelanos commented 7 years ago

As per debian-mentors mailing list - Mixed kloak anti keystroke / mice deanonymization tool package or better two separate packages?...

If you were to provide a mouse anti-fingerprinting tool as well, please add the sources to your existing kloak source code repository. Of course this is just a friendly suggestion. After all, it is distributions that have to wrap their heads around packaging, not upstream around distribution policies.

vmonaco commented 7 years ago

Thanks, that's what I'll do. The timing delays could be applied to clicks and other discrete events, so it makes sense to share some of this code. The pointer "shape" is more difficult. I created https://github.com/vmonaco/kloak/issues/7 to track progress.