Add support for `seccomp` for sandboxing untrusted inputs or filters

cipriancraciun commented 4 years ago

This is somewhat related to #1215; however, my main use case for seccomp is different.

Given that jq is a "classical" pipe filter (read some input, apply some code, write some output), it usually just does read / write and, at the end, an exit. Moreover, sometimes either the input or, especially, the filter is under the control of an "attacker" (i.e. a user), and sanitization is not an option.

Therefore Linux seccomp (in its plain, "strict" mode) is the perfect solution for this: it allows only the read / write, exit and sigreturn syscalls. To be effective, it should be enabled just before compiling the filter or reading the input.
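A minimal sketch of the strict mode described above, assuming Linux on x86_64 or aarch64 with glibc; `run_strict_child` and `raw_exit` are hypothetical helpers, not code from jq. A forked child enables strict mode with `prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT)`, copies its input to its output with plain read / write, and leaves through the bare exit syscall, because glibc's `exit()` and (since glibc 2.3) even `_exit()` go through exit_group, which strict mode answers with SIGKILL.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>   /* SECCOMP_MODE_STRICT */
#include <signal.h>          /* SIGKILL */
#include <sys/prctl.h>       /* prctl, PR_SET_SECCOMP */
#include <sys/syscall.h>     /* SYS_exit, SYS_getpid */
#include <sys/types.h>
#include <sys/wait.h>        /* waitpid, WIFEXITED, ... */
#include <unistd.h>          /* fork, read, write, syscall */

/* Leave via the bare exit syscall: glibc's exit() and _exit() both end in
   exit_group, which strict mode does not permit. */
static _Noreturn void raw_exit(int code) {
    syscall(SYS_exit, code);
    for (;;) {}
}

/* Fork a child that enables strict seccomp and then behaves like a pipe
   filter: copy in_fd to out_fd using only read/write (short writes are
   treated as errors for brevity). If touch_forbidden is non-zero it also
   calls getpid, which must get it killed with SIGKILL.
   Returns the child's wait status, or -1 if fork/waitpid fail.
   Exit code 3 means strict mode could not be enabled in this process. */
static int run_strict_child(int in_fd, int out_fd, int touch_forbidden) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
            raw_exit(3);
        /* From here on only read, write, exit and sigreturn are permitted. */
        char buf[4096];
        ssize_t n;
        while ((n = read(in_fd, buf, sizeof buf)) > 0)
            if (write(out_fd, buf, (size_t)n) != n)
                raw_exit(1);
        if (touch_forbidden)
            syscall(SYS_getpid);   /* not permitted: the kernel sends SIGKILL */
        raw_exit(n < 0 ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return status;
}
```

Running the sandboxed work in a child keeps the caller unaffected. Note that the kernel refuses to switch to strict mode when a seccomp filter is already installed (as in many container runtimes), which is the case exit code 3 reports here.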

However, while experimenting with a patch, I ran into several issues:

fread / fwrite and friends also call fstat (glibc uses it to size the stream buffer); therefore I was forced to re-implement all this functionality (without buffering) in terms of read / write;
on Linux, glibc implements exit with the exit_group syscall, which plain seccomp does not allow; I had to patch that by replacing every call to exit with a custom function that invokes the exit syscall directly;
malloc uses brk and mmap to obtain memory; unfortunately, this is where I got stuck; basically I would have to replace the allocator with one that preallocates a large chunk up front (however, preallocating memory is a plus, as the attacker then can't exhaust the memory; moreover, on Linux, allocating memory doesn't actually assign physical pages to the process until they are first touched);
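A minimal sketch of the "preallocate a large chunk" idea, assuming Linux; `arena_t`, `arena_init`, `arena_alloc` and `arena_destroy` are hypothetical names, and wiring such an arena into jq's allocation path is not shown. The region is reserved with a single `mmap` before the sandbox starts; afterwards every allocation only bumps an offset, so no brk or mmap call is needed, and the arena size doubles as a hard memory cap.

```c
#define _GNU_SOURCE
#include <stddef.h>     /* size_t, max_align_t */
#include <sys/mman.h>   /* mmap, munmap */

/* A fixed-size bump allocator over one region reserved before the sandbox
   is enabled. Blocks are never freed individually; the arena is released
   as a whole. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} arena_t;

/* Reserve `size` bytes up front. Linux backs anonymous pages lazily, so
   untouched parts of the arena cost address space, not physical memory;
   MAP_NORESERVE also asks the kernel not to reserve swap for it.
   Returns 0 on success, -1 on failure. */
static int arena_init(arena_t *a, size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Hand out `n` bytes aligned for any object type, or NULL once the arena
   is exhausted. No system call is made here. */
static void *arena_alloc(arena_t *a, size_t n) {
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || n > a->size - start) return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Release the whole region (call this outside the sandbox or at exit). */
static void arena_destroy(arena_t *a) {
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
}
```

The trade-off is that freed blocks are never reused; for a short-lived process that reads one input, runs one filter and exits, that may be acceptable.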

Would anyone be interested in such a feature?

layderv commented 4 years ago

Hi. First: I'm probably not the best person to comment on additions to jq.

From this page,

Seccomp-BPF is a more recent extension to seccomp, which allows filtering system calls with BPF (Berkeley Packet Filter) programs. These filters can be used to allow or deny an arbitrary set of system calls, as well as filter on system call arguments (numeric values only; pointer arguments can't be dereferenced). Additionally, instead of simply terminating the process, the filter can raise a signal, which allows the signal handler to simulate the effect of a disallowed system call (or simply gather more information on the failure for debugging purposes). Seccomp-bpf is available since Linux version 3.5 and is usable on the ARM architecture since Linux version 3.10. Several backports are available for earlier kernel versions.

This might undermine your efforts, though. If it helps, and others are in favor of this, I'm happy to help with the implementation.

cipriancraciun commented 4 years ago

@layderv even SecComp-BPF support would be welcome.

However, plain SecComp and SecComp-BPF have (in my opinion) two different use cases:

SecComp (plain) can guarantee that the application only does "processing" and can't touch any part of the OS, file system, other processes, etc.; thus it can be used when you can trust neither the VM code (in this case the jq VM) nor the "user" code (the jq script in this case);
SecComp-BPF can be used to guarantee that the jq VM doesn't step outside well-defined syscall boundaries;

For example, if we know that jq should only interact with the file system, then we can add a BPF program that disallows any socket-related syscalls.
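Such a denylist could look like the following sketch, assuming Linux on x86_64 or aarch64 and hand-written classic BPF from `<linux/filter.h>` (libseccomp would be a more convenient alternative); `deny_sockets` and `SANDBOX_AUDIT_ARCH` are hypothetical names. The filter first checks the architecture field (and, on x86_64, rejects x32-numbered calls, which could otherwise slip past a denylist), then makes socket(2) fail with EACCES and allows everything else.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>            /* offsetof */
#include <linux/audit.h>       /* AUDIT_ARCH_* */
#include <linux/filter.h>      /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>     /* struct seccomp_data, SECCOMP_RET_* */
#include <sys/prctl.h>
#include <sys/socket.h>        /* socket(), the call being denied */
#include <sys/syscall.h>       /* SYS_socket */

#if defined(__x86_64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

/* Install a filter that makes socket(2) fail with EACCES and allows every
   other syscall. Returns 0 on success, -1 on failure. */
static int deny_sockets(void) {
    struct sock_filter filter[] = {
        /* Refuse syscalls made under an unexpected ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SANDBOX_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#if defined(__x86_64__)
        /* x32 syscall numbers have bit 30 set and would bypass the check below. */
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
#endif
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_socket, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EACCES & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    /* Required before an unprivileged process may install a filter; it also
       stops later execve() calls from granting privileges (e.g. setuid). */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}
```

Returning an errno instead of killing lets jq report a clean error if some code path unexpectedly tries to open a socket.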

However, what I am after is something even stronger: with plain SecComp enabled, I want to be sure that, no matter what, jq will only read from stdin and write to stdout (or exit), with no other interaction with the environment.

What could be a use case for such a feature? For example, allowing someone from the internet to run an arbitrary query against a JSON file.

What I would like to achieve:

once the jq VM is initialized (but before parsing anything, either input or user code) and a maximum amount of memory has been reserved (not actually committed), prepare stdin and stdout;
then the SecComp filter is installed;
now the code is compiled and run to completion.

As you can see, I want the SecComp filter to also cover the compilation of the user code.

I did experiment with the SecComp feature on a personal jq branch, and I hit a wall with memory management, as malloc can't be told not to call brk... I think this is the only remaining hurdle.

If you want, I can share my branch.

layderv commented 4 years ago

To understand better: do you suggest enabling seccomp by default or with a flag? I'm assuming the latter.

SecComp-BPF can also filter on syscall arguments, so (if I'm not mistaken) you could inspect what a syscall is being passed. I'm happy to look at your branch and see if I get some ideas. It would be nice to get some other opinions on this as well.
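As an illustration of argument filtering, here is a sketch assuming Linux on x86_64 or aarch64 (both little-endian); `restrict_io_fds`, `ARG0_LO` and `SANDBOX_AUDIT_ARCH` are hypothetical names. Because a filter only sees numeric arguments, it compares the low 32 bits of the first argument, i.e. the file descriptor, which the kernel treats as an unsigned int for read and write: read may only use fd 0, write only fds 1 and 2, and any other read or write fails with EPERM. All other syscalls are allowed to keep the example focused; a real sandbox would combine this with an allowlist.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>            /* offsetof */
#include <linux/audit.h>       /* AUDIT_ARCH_* */
#include <linux/filter.h>      /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>     /* struct seccomp_data, SECCOMP_RET_* */
#include <sys/prctl.h>
#include <sys/syscall.h>       /* SYS_read, SYS_write */
#include <unistd.h>            /* read(), write() */

#if defined(__x86_64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

/* Low 32 bits of the first syscall argument on a little-endian machine. */
#define ARG0_LO offsetof(struct seccomp_data, args[0])

/* read only from fd 0, write only to fds 1 and 2; other read/write calls
   get EPERM; every other syscall is allowed. Returns 0 or -1. */
static int restrict_io_fds(void) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SANDBOX_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#if defined(__x86_64__)
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1),  /* x32 ABI */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
#endif
        /* read: the fd must be 0 (not read -> go to the write check) */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 0, 2),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG0_LO),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 5, 4),          /* allow : deny */
        /* write: the fd must be 1 or 2 (the accumulator still holds nr) */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 4),  /* other -> allow */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG0_LO),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 2, 0),          /* fd 1 -> allow */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 1, 0),          /* fd 2 -> allow */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}
```

Jump offsets in classic BPF are relative to the next instruction, which is why they have to be recounted whenever a rule is added.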

cipriancraciun commented 4 years ago

@layderv yes, I thought a flag would enable it on demand.

Regarding SecComp-BPF, it can indeed be used to provide a "tight" confinement, but for the use case I'm proposing it would be equivalent to plain SecComp (i.e. there are no exceptions).

I'll try to prepare my branch and push it today.

cipriancraciun commented 4 years ago

OK, I've created a branch in my own repository where I did the following:

re-implemented all stdio functions that touch file descriptors so that only read and write are used (these are plain implementations without any buffering, just so we can test the seccomp feature; later on they can be expanded); I had to do this because the glibc implementation uses other syscalls besides read / write;
replaced all printf calls with explicit fprintf calls (mainly to be specific about where the output should go);
replaced every exit with jq_exit, which calls _exit, i.e. the exit syscall (I know it sounds complicated, but plain exit also does other things, such as running atexit handlers and flushing stdio);
replaced isatty with a custom variant that always returns false;
added a new --seccomp flag that enables SecComp based on BPF;
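The first and fourth points above could look roughly like this; a sketch assuming POSIX read / write, where `raw_write_all`, `raw_read_full` and `sandbox_isatty` are hypothetical names rather than the branch's code. The helpers call nothing but read and write, avoiding the fstat that glibc's stdio issues when it sets up a stream buffer; `sandbox_isatty` exists because glibc's isatty probes the descriptor with an ioctl, which a read / write-only filter rejects.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `len` bytes with write(2) only: no FILE buffer, no fstat.
   Retries on EINTR and short writes. Returns 0 on success, -1 on error. */
static int raw_write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            if (n < 0 && errno == EINTR) continue;
            return -1;   /* error, or a zero-length write that would spin */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read with read(2) only until `cap` bytes are in `buf` or EOF is reached.
   Returns the number of bytes read, or -1 on error. */
static ssize_t raw_read_full(int fd, void *buf, size_t cap) {
    char *p = buf;
    size_t got = 0;
    while (got < cap) {
        ssize_t n = read(fd, p + got, cap - got);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) break;   /* EOF */
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Stand-in for isatty(): never touches the descriptor, so no ioctl is
   issued; every stream is treated as "not a terminal". */
static int sandbox_isatty(int fd) {
    (void)fd;
    return 0;
}
```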

As noted in the last point, I had to use SecComp-BPF just to whitelist the brk syscall, which malloc uses for memory management.
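An allowlist along those lines could look like the sketch below, assuming Linux on x86_64 or aarch64; `install_allowlist`, `run_allowlisted_child` and the `ALLOW_SYSCALL` macro are hypothetical, and the branch's actual list may differ. It mirrors strict mode (read, write, exit, rt_sigreturn), adds brk for malloc plus exit_group so that the ordinary `_exit()` works, and kills the process on anything else.

```c
#define _GNU_SOURCE
#include <stddef.h>            /* offsetof */
#include <linux/audit.h>       /* AUDIT_ARCH_* */
#include <linux/filter.h>      /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>     /* struct seccomp_data, SECCOMP_RET_* */
#include <sys/prctl.h>
#include <sys/syscall.h>       /* SYS_* numbers */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

/* Two instructions per entry: return ALLOW if the loaded number matches,
   otherwise fall through to the next entry. */
#define ALLOW_SYSCALL(name)                                \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_##name, 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

static int install_allowlist(void) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SANDBOX_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        ALLOW_SYSCALL(read),
        ALLOW_SYSCALL(write),
        ALLOW_SYSCALL(exit),
        ALLOW_SYSCALL(exit_group),
        ALLOW_SYSCALL(rt_sigreturn),
        ALLOW_SYSCALL(brk),   /* lets malloc grow the heap */
        /* Anything else, including x32-numbered calls, ends the process. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}

/* Fork a child that installs the allowlist, moves the program break,
   optionally calls getpid, then exits. Returns the wait status or -1.
   Exit code 3 means the filter could not be installed. */
static int run_allowlisted_child(int touch_forbidden) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (install_allowlist() != 0) _exit(3);
        long brk_now = syscall(SYS_brk, 0);   /* query the break: allowed */
        syscall(SYS_brk, brk_now + 4096);     /* grow it by a page: allowed */
        if (touch_forbidden)
            syscall(SYS_getpid);              /* not listed: process is killed */
        _exit(0);                             /* glibc uses exit_group: allowed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return status;
}
```

brk alone only covers heap growth: glibc's malloc serves large requests with mmap (and also falls back to mmap when brk fails), which this list would still kill.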

I also tried to enable SecComp as the very first thing in main, but getcwd is called somewhere, and that call would have to be eliminated. At the moment, SecComp is activated just before compiling the user program.
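To locate such stray calls (getcwd here, or fstat from stdio), a filter can trap instead of kill: with SECCOMP_RET_TRAP the kernel skips the syscall and raises SIGSYS, and `si_syscall` in the handler's `siginfo_t` names the blocked call. A sketch assuming Linux on x86_64 or aarch64 with glibc; `trap_syscall`, `on_sigsys` and `last_trapped_nr` are hypothetical names. After the handler returns, the trapped call's return value is architecture-dependent, so this is a debugging aid (running under gdb with a breakpoint on the handler also yields the caller's backtrace), not a way to emulate the call.

```c
#define _GNU_SOURCE
#include <signal.h>            /* sigaction, siginfo_t, SIGSYS */
#include <stddef.h>            /* offsetof */
#include <string.h>            /* memset */
#include <linux/audit.h>       /* AUDIT_ARCH_* */
#include <linux/filter.h>      /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>     /* struct seccomp_data, SECCOMP_RET_* */
#include <sys/prctl.h>
#include <sys/syscall.h>       /* SYS_* numbers */
#include <unistd.h>            /* syscall() */

#if defined(__x86_64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

/* Number of the most recently trapped syscall, or -1 if none yet. */
static volatile sig_atomic_t last_trapped_nr = -1;

static void on_sigsys(int sig, siginfo_t *info, void *uctx) {
    (void)sig;
    (void)uctx;
    /* A debugging build could also write() the number to stderr here. */
    last_trapped_nr = info->si_syscall;
}

/* Make syscall `nr` raise SIGSYS instead of executing; allow all others.
   Returns 0 on success, -1 on failure. */
static int trap_syscall(unsigned int nr) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigsys;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSYS, &sa, NULL) != 0) return -1;

    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SANDBOX_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}
```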

This is not intended for production use, but as a starting point for a proper SecComp implementation! (I have not tested it thoroughly, and I'm sure that at some point malloc will fail or some other syscall will be triggered...)

https://github.com/cipriancraciun/jq/compare/patches/1.6/fixups...cipriancraciun:patches/1.6/seccomp

layderv commented 4 years ago

I read all your commits and left one comment there. I can see some TODOs you've left around. What's left (other than the lines still marked TODO), and what's next?

cipriancraciun commented 4 years ago

@layderv my branch is just a proof of concept and has only been tested with a few inputs.

So, going forward, I think we need:

a clear decision from the jq maintainer that he will eventually merge a seccomp-based feature, if the submitted code is to his liking (otherwise our efforts are pointless);
a decision on what is and isn't covered by the seccomp filter (just the execution, the code parsing, or everything from the first line of main when the seccomp flag is given); if we want to cover the code parsing, which I see as very important, how do we treat library files?
a decision on how tight the seccomp filter should be: only read / write as in my use case, or also other syscalls that might be needed?
an enhanced test harness for this feature (the built-in one doesn't seem to work, as it requires other syscalls).

So basically there are many unknowns...

layderv commented 4 years ago

I agree with your thoughts. Let's wait for the main maintainers to tell us if they are happy for this change and talk about the next steps afterwards. Happy to collaborate.

allanlw commented 4 years ago

Just found this issue. I have an old (crude and buggy) patch sitting around that adds a sandbox flag to jq. It's at https://gist.github.com/allanlw/c0d7166d2341d1aff6fae59c73443e2a

Happy to rebase, clean it up and get it upstreamed if there's interest.

jqlang / jq #2096