metaparticle-io / package

Metaparticle/Package: Language Fluent Containerization and Deployment in Java, .NET and Javascript (and more coming soon)
https://metaparticle.io
MIT License

[proposal] go: generate at compile time seccomp filters for go applications #80

Open jessfraz opened 6 years ago

jessfraz commented 6 years ago

(preface: totally understand if this is out of scope but could be a cool feature)

Invisible Sandboxing of Applications

One of the fun and innovative things that could be done, since metaparticle is in charge of handling the user's code and running the specific function passed, is automatic generation of a seccomp profile for the application/function being run.

Background

Seccomp is "secure computing with filters." It allows developers to write BPF programs that determine whether a given system call will be allowed or not.

It has support in container runtimes as well as k8s.

Integration with metaparticle

Since metaparticle knows the go code that it needs to run, it could generate a list of syscalls required for that, and then automatically apply it to the kubernetes config.

In layman's terms, metaparticle would automatically sandbox your application, so even if a malicious individual compromised the running application, they would only be allowed to execute the syscalls required by the application in the container. This would reduce the attack surface substantially.

Go makes parsing the syscalls easy because of its design. I personally made a POC of this with the go compiler in the past :)

Just an idea. You could do it with the other languages as well, but I don't know enough about their runtime internals to know how difficult or complex it would be.

brendandburns commented 6 years ago

I like this idea! Do you have a pointer for how we can interrogate a golang program to find its syscalls?

jessfraz commented 6 years ago

So before when I did it as a POC I was hijacking at compile time (in the go compiler) the syscalls that were being assembled via: https://github.com/golang/go/blob/master/src/syscall/asm_linux_amd64.s

I think there are two ways to go (pun) about this here though since we can't really hijack at compile time, unless we run a fork of go in the container that builds, but ew gross.

The first two options could be built as a separate binary, actually, which would be nice since then whatever gross assembly parsing is required wouldn't have to live in the package itself.

The binary could run on the code between when it is compiled in the container and when it is deployed to an executor.

Option 1: It would do all the bits I had from the go compiler that looked at the syscalls as they were being assembled. This might be gross though since I was leaning heavily on the go compiler for this so it might just wind up being a smaller version of the go compiler... idk but kinda icky.

Option 2: This one, which @erikstmartin mentioned in slack and which could work for other languages too, would be static analysis of the binary's assembly: based on the instructions, map each one back to the syscall it makes. go's are laid out really nicely so I think it's doable.

Option 3: trying to get a feature into go so that the runtime package can be introspected for which syscalls are used. Then it could be done in the compiler itself and called from metaparticle or any library seamlessly.

Going to try out some things and see :)

nwmcsween commented 6 years ago

I was working on something like this but was hoping to do it for any binary (shared or static), as a sort of inline 'containerization'. For seccomp in the shared case, we would have to disassemble the binary and scan for syscalls, as well as chase plt, dlopen, dlsym etc. entries and do the same for each function. The static case is still a pain, as we still have to chase dlopen, dlsym etc. calls, since glibc uses dlopen on static binaries and others might as well. Radare2 recently gained support for listing syscalls from regions, so it might be possible to script something up to get a listing of needed syscalls.

The tool would have to run on every compilation, as the underlying implementations can change and produce different syscalls. It would also need something like google kafel, so that it works with syscall names and not actual syscall NRs, which differ between arches.

It might be possible to objcopy --add-section .seccomp=kafel-dumped-bpf-filter and have a constructor as such:

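#include <stddef.h>
#include <linux/filter.h>   /* struct sock_filter, struct sock_fprog */
#include <linux/seccomp.h>  /* SECCOMP_MODE_FILTER */
#include <sys/prctl.h>      /* prctl, PR_SET_NO_NEW_PRIVS, PR_SET_SECCOMP */

/* Assumed to mark the bounds of the .seccomp section (e.g. via a linker script). */
extern const unsigned char seccomp_start[];
extern const unsigned char seccomp_end[];

__attribute__((constructor))
void do_seccomp(void)
{
        size_t size = (size_t)(seccomp_end - seccomp_start);
        /* The section holds raw BPF instructions, so wrap them in a sock_fprog. */
        struct sock_fprog prog = {
                .len = (unsigned short)(size / sizeof(struct sock_filter)),
                .filter = (struct sock_filter *)seccomp_start,
        };

        if (size) {
                /* no_new_privs is required to install a filter unprivileged. */
                if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
                    || prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
                        __builtin_trap();
                }
        }
}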

The idea behind using a section instead of inline is that it can be updated, removed, etc and you don't actually know the syscalls until after compilation.

For unshare we could do something similar but I haven't really explored it too much.

Now the issues:

jessfraz commented 6 years ago

I think you are overthinking this a bit.

I have done what you are trying to do before, but this specific use case is unique in that you do not need the filter to be run with the binary; you lean on the implementation in the orchestrator to do that.

Plus metaparticle is more for running applications, and the possibility of someone using cgo and doing a dlopen is really quite low. That’s why this specific use case is so nice: it can be cleanly implemented.

SvenDowideit commented 6 years ago

Best thing for me is that it means I can attest (and sign) that my app only does / requires a small set of abilities, and any time code is merged that changes this, we can discuss if it's necessary...

yes! make us more accountable :)

xfernando commented 6 years ago

Hey @jessfraz, I saw your keynote at FOSDEM and remembered seeing this issue here, so I did a quick hack that generates a seccomp profile for go binaries (the same limitations you mentioned in your talk apply).

It's at github.com/xfernando/go2seccomp. :)

jessfraz commented 6 years ago

This is really dope, thanks @xfernando

Edznux commented 6 years ago

I'm very interested in helping on this, in particular on the "option 2" that @jessfraz pointed out! Can I get in touch? You mentioned slack?

I have done a very small POC with angr to do binary analysis but there is probably a better way to do it... It's painfully slow due to path explosion and not very accurate at the moment...

stealthybox commented 6 years ago

Looks like @xfernando's code implements Option 2. One other interesting note from @jessfraz is that, since this uses go tool, you need access to the source code. This works fine for metaparticle's use-case.

Generating the same filters with strace would allow you to diff between the pre-analyzed and runtime versions -- could be useful for the edge-cases.

Edznux commented 6 years ago

@xfernando's code works for Go binaries. This issue was specifically written for Go support, but a more generic solution may be interesting? (if it's even possible)

Maybe this is out of scope here.

xfernando commented 6 years ago

A generic solution that can analyze any ELF binary can certainly be constructed. I considered trying to do this, but I had never read about the ELF format before. After doing some research, it seemed like it would take a lot longer to get something ready, so I went the easier way and did it only for go programs.

It's harder because you'd have to trace where the syscall trap ID is coming from. This is easy in go because they provide a nice interface that allowed me to check for calls to functions in this interface and get the syscall ID passed to them.

But on programs that do not have conventions for how syscalls are made I guess it will be much harder.

If you look at the code for the Syscall function you'll see that it loads the syscall ID from its first argument, trap, which is addressed through the FP pseudo-register, into AX:

    MOVQ    trap+0(FP), AX  // syscall entry
    SYSCALL

A generic solution would have to go back through all possible call sites of the current function and figure out which value is being pushed on the call stack in order to get the syscall ID.

The best case scenario (which is probably the most common one) is that you track the syscall ID and it is a constant. But if it's calculated at runtime I'm pretty sure it's impossible to get its value in general (it should be easy to reduce the halting problem to this).

Edit: actually best case scenario would be finding this:

MOVQ   SYSCALL_ID,AX
SYSCALL

where SYSCALL_ID is a constant value.