Open cipriancraciun opened 4 years ago
Hi. First: I'm probably not the best person to comment on additions to jq.
From this page,
Seccomp-BPF is a more recent extension to seccomp, which allows filtering system calls with BPF (Berkeley Packet Filter) programs. These filters can be used to allow or deny an arbitrary set of system calls, as well as filter on system call arguments (numeric values only; pointer arguments can't be dereferenced). Additionally, instead of simply terminating the process, the filter can raise a signal, which allows the signal handler to simulate the effect of a disallowed system call (or simply gather more information on the failure for debugging purposes). Seccomp-bpf is available since Linux version 3.5 and is usable on the ARM architecture since Linux version 3.10. Several backports are available for earlier kernel versions.
Although this might undermine your efforts. If that helps and others are in favor of this, I'm happy to help with the implementation.
@layderv even SecComp-BPF support would be welcomed.
However, plain SecComp and SecComp-BPF have (in my opinion) two different use-cases:
- [...] `jq` VM) or the "user" code (the `jq` script in this case);
- [...] `jq` VM doesn't step outside well-defined syscall boundaries;

For example, if we know that `jq` should only interact with the file-system, then we can add a BPF program to disallow any socket-related syscalls.
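To make the socket example concrete, here is a minimal sketch of such a BPF program using the raw kernel interface (no libseccomp). It is only an illustration under stated assumptions: the function names are hypothetical and not part of jq, only `socket(2)` itself is denied (a real filter would also list `connect`, `bind` and friends, and handle the x32 ABI), and only x86_64 and aarch64 are covered.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* Hypothetical helper (not a jq function): make socket(2) fail with EPERM. */
static int deny_socket_syscall(void)
{
    struct sock_filter filter[] = {
        /* Load the architecture; kill on a mismatch. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Load the syscall number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* socket(2) returns EPERM; every other syscall is allowed. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    /* Unprivileged processes must set no_new_privs before loading a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Demo in a forked child, so the caller itself stays unfiltered.
 * Returns 0 if socket(2) was denied with EPERM, 42 if the filter could
 * not be installed in this environment, 1 for anything else. */
static int socket_denied_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0) {
        if (deny_socket_syscall() != 0)
            _exit(42);
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        _exit((fd == -1 && errno == EPERM) ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return 1;
    return WEXITSTATUS(status);
}
```

Returning an errno instead of killing the process is a choice made for this sketch; `SECCOMP_RET_KILL` in place of the `SECCOMP_RET_ERRNO` line would give the stricter behaviour.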
However, what I am after is something even stronger: with plain SecComp enabled I want to be sure that, no matter what, `jq` will only read from `stdin` and write to `stdout` (or `exit`), with no other interaction with the environment.
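For reference, plain (strict) SecComp is enabled with a single `prctl` call. The sketch below is an assumption-laden illustration, not code from any jq branch: the function names are hypothetical, and the demo runs in a forked child so that the calling process stays unconfined.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Enter strict mode: afterwards only read, write, exit and sigreturn
 * are permitted; any other syscall gets the process killed. */
static int enter_strict_seccomp(void)
{
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0);
}

/* Hypothetical demo. Returns 0 if the confined child could still write
 * to a pipe and was then killed by a forbidden syscall, 42 if strict
 * mode is unavailable in this environment, 1 for anything else. */
static int strict_seccomp_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return 1;
    pid_t pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0) {
        if (enter_strict_seccomp() != 0)
            _exit(42);
        if (write(fds[1], "ok", 2) != 2)
            syscall(SYS_exit, 1);
        syscall(SYS_getpid);  /* forbidden: the kernel sends SIGKILL */
        syscall(SYS_exit, 2); /* not reached when strict mode is active */
    }
    close(fds[1]);
    char buf[2] = {0, 0};
    ssize_t n = read(fds[0], buf, sizeof buf);
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 42)
        return 42;
    if (n != 2 || buf[0] != 'o' || buf[1] != 'k')
        return 1;
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : 1;
}
```

Note that the child leaves through the raw `SYS_exit` syscall: the usual `exit` / `_exit` wrappers end up in `exit_group`, which strict mode does not allow.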
What could be a use-case for such a feature? For example allowing someone from the internet to run an arbitrary query against a JSON file.
What would I like to achieve:
- [...] `jq` VM is initialized (but before parsing anything, either input or user code) and a maximum amount of memory was reserved (not actually committed), prepare `stdin` and `stdout`, [...]

As you see, I want the SecComp filter to also cover the compilation of the user code.
I did experiment with the SecComp feature on a personal `jq` branch, and I hit a wall regarding memory management, as `malloc` can't be told not to `brk`... Therefore I think this is the only remaining hurdle.
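One possible way around the `brk` problem, sketched under heavy assumptions: reserve a fixed region with `mmap` before the filter is enabled and hand out memory from it with a bump allocator. Touching the reserved pages later causes page faults, not syscalls, so it should keep working under the filter. All names below are hypothetical, there is no `free`, and wiring this into jq's allocation functions is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical bump allocator; none of these names exist in jq. */
struct arena {
    unsigned char *base; /* start of the reserved region */
    size_t size;         /* total bytes reserved */
    size_t used;         /* bytes handed out so far */
};

/* Reserve `size` bytes of address space. This must run before the
 * SecComp filter is enabled, because mmap would be rejected afterwards.
 * MAP_NORESERVE asks the kernel not to commit the memory up front. */
static int arena_init(struct arena *a, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Hand out 16-byte aligned chunks; returns NULL once the reservation
 * is exhausted, which also caps the memory an attacker can consume. */
static void *arena_alloc(struct arena *a, size_t n)
{
    assert(a != NULL && a->base != NULL);
    size_t start = (a->used + 15u) & ~(size_t)15u;
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}
```

A real replacement would also need `free` and `realloc` semantics; the sketch only shows that allocation itself needs no syscall once the region exists.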
If you want I can share my branch.
To understand better: do you suggest to enable seccomp by default or with a flag? I'm assuming the latter.
SecComp-BPF can filter on syscall arguments too, so you could inspect what a syscall is getting (if I'm not wrong). I'm happy to look at your branch and see if I get some ideas. It would be nice if we could get some other opinions on this as well.
@layderv yes, I thought a flag would enable it on demand.
Regarding SecComp-BPF, indeed it can be used to provide a "tight" confinement, but for the use-case I'm proposing it would be equivalent to plain SecComp (i.e. there are no exceptions).
I'll try to prepare my branch and push it today.
OK, I've created a branch in my own repository where I did the following:
- re-implemented the `stdio` functions that touch file descriptors, so that only `read` and `write` are used; (these are plain implementations without any buffering, so we can just test out the `seccomp` feature; later on they can be expanded;) (I had to do this, because the `glibc` implementation doesn't just use `read` / `write` but other syscalls;)
- replaced `printf` with qualified `fprintf` (mainly to be specific where that output should go);
- replaced `exit` with `jq_exit` that calls `_exit` which is in fact the `exit` syscall; (I know it sounds complicated, but the plain `exit` actually does other stuff;)
- replaced `isatty` with a custom variant that always returns false;
- added a `--seccomp` flag that enables SecComp based on BPF;

As noted in the last point I had to use SecComp BPF just to whitelist the `brk` syscall that is used by memory management.
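As an illustration of the unbuffered `read` / `write` replacements mentioned above, an output helper could look roughly like this. It is a sketch, not the code from the branch: `write_all` is a hypothetical name, and the only syscall it issues is `write(2)`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer using nothing but
 * write(2), retrying on short writes and on EINTR.
 * Returns 0 on success and -1 on error (errno is left set). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Without buffering every call is a syscall, so a later version would probably want a small user-space buffer in front of this, as the post suggests.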
I have also tried to enable SecComp as the first thing in `main`, but somewhere `getcwd` is called and should be eliminated. At the moment SecComp is activated just before compiling the user program.
This is not intended to be used in production, but as a starting point for a proper SecComp implementation! (I have not thoroughly tested, and I'm sure at some point `malloc` will fail, or some other syscall will be triggered...)
https://github.com/cipriancraciun/jq/compare/patches/1.6/fixups...cipriancraciun:patches/1.6/seccomp
I read all your commits and posted one comment there. I can see some TODOs you've left around. Other than those TODO lines, what's left and what's next?
@layderv my branch is just a proof of concept and it wasn't tested except with a few items.
So going further I think we need:
- [...] `jq` maintainer that he will eventually merge a `seccomp`-based feature; (if the submitted code is to his liking;) (else our efforts are pointless...)
- [...] `seccomp` filter; (just the execution, the code parsing, or everything from the first line of `main` if a `seccomp` flag is found;) (if we want to cover the code parsing, which I see as very important, how do we treat library files;)
- [...] `seccomp` filter should be; only `read` / `write` as in my use-case, or other syscalls that might be used?

So basically there are many unknowns...
I agree with your thoughts. Let's wait for the main maintainers to tell us if they are happy for this change and talk about the next steps afterwards. Happy to collaborate.
Just found this issue -- I have an old (crude and buggy) patch sitting around for adding a sandbox flag to `jq`. It's at https://gist.github.com/allanlw/c0d7166d2341d1aff6fae59c73443e2a
Happy to rebase, clean it up and get it upstreamed if there's interest.
This is somewhat related to #1215, however my main use-case for `seccomp` is somewhat different.

Given that `jq` is a "classical" pipe filter (read some input, apply some code, write some output) it usually just does `read` / `write` and at the end an `exit`. Moreover, sometimes one runs either input or especially filters that are under the control of an "attacker" (i.e. user) and sanitization is not an option.

Therefore the Linux `seccomp` is the perfect solution for this: it allows only `read` / `write` and `exit` syscalls. To be efficient it should be employed just before compiling the filter or reading the input.

However, while playing with a patch, I've stumbled into many issues:
- `fread` / `fwrite` and friends use (for some unknown reason) `fstat`; therefore I was forced to re-implement all this functionality (without buffering) in terms of `read` / `write`;
- `glibc` uses `exit_group` for `exit`; I had to patch that by replacing in all places `exit` with a custom function that calls the syscall directly;
- `malloc` uses `brk` and `mmap` to allocate; here unfortunately I was stuck; basically I'll have to replace the allocator with one that preallocates a large chunk; (however preallocating memory is a plus, as now the attacker can't exhaust the memory; moreover on Linux just `malloc` doesn't actually imply assigning that memory to the process;)

Would anyone be interested in such a feature?
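The `exit_group` point in the list above can be illustrated with a small sketch of an exit helper that calls the raw `exit` syscall. The names are hypothetical and the body is an assumption about how such a function might look, not the code from the patch. Unlike `exit`, it runs no `atexit` handlers and flushes no stdio buffers, and it only terminates the calling thread, which for a single-threaded process ends the process.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit helper: issue the raw exit syscall, because the
 * libc exit paths use exit_group, which strict SecComp rejects. */
static void jq_exit_sketch(int code)
{
    syscall(SYS_exit, code);
    for (;;) {
        /* not reached: SYS_exit does not return */
    }
}

/* Demo: fork a child that leaves through the helper and report the
 * exit status seen by the parent, or -1 on error. */
static int exit_status_via_raw_exit(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        jq_exit_sketch(code);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Any output that still sits in a user-space buffer has to be written out before calling such a helper, since nothing will flush it afterwards.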