lsds / sgx-lkl

SGX-LKL Library OS for running Linux applications inside of Intel SGX enclaves
MIT License
257 stars 89 forks source link

Remove libc dependencies from src/enclave #151

Open davidchisnall opened 4 years ago

davidchisnall commented 4 years ago

As part of the relayering work, we should ensure that nothing in src/enclave depends on libc init. Some of these changes are simple and mechanical:

Others are more complex.

We can't remove a small set of things that we expect to exist in a freestanding C environment (e.g. memset / memcpy), because the compiler will emit these even if we don't.

Checklist:

The full list of dependencies can be seen by running the following command:

$ cd build_musl/sgxlkl/enclave
$ for I in *.o ; do echo $I ; nm -g  $I | grep ' U ' ; done
davidchisnall commented 4 years ago

@wintersteiger , FYI: please keep this in mind when replacing the config data. I hope that we'll get a lot of the fixes in sgxlkl_app_config.c for free when we move to using JSON and @mikbras's JSON library.

prp commented 4 years ago

Note that we have SGXLKL_ASSERT for assertions.

wintersteiger commented 4 years ago

My work is mostly between OE and LKL, i.e. before LKL starts up. Can I use all of OE's musl at those points in the code or just the *alloc functions?

The new json library also depends on libc (e.g. string functions), but I think we can work around them within reasonable effort.

Is anybody emotionally attached to sgxlkl_app_config_t? It would simplify all of this quite a bit if we just removed it and kept only one config structure (e.g. half-initialized *disk_config_t's being passed around to mount filesystems etc).

prp commented 4 years ago

@wintersteiger conceptually it makes sense to me to have a struct that contains the full app_config, the way it has passed attestation. Or do I misunderstand your proposal?

wintersteiger commented 4 years ago

@prp basically, yes. I started with putting the entire configuration into the json configs, I think there will be very few settings that we don't want attested, so I don't think we'll need a separate config for that purpose. And all of the shared memory should go into a separate struct, not into sgxlkl_enclave.

prp commented 4 years ago

@wintersteiger ok, I see. So the question is if the parsed JSON app_config goes into a single app_config struct or into many separate structs/variables for the different SGX-LKL components.

wintersteiger commented 4 years ago

Yes, I think we should have a single struct that corresponds to the json config, otherwise it will be very tricky for users/reviewers to see what is in fact attested and what is not. I don't think we'll need an additional unattested config, but if it turns out we do, we should keep it separate (as opposed to the current merge of host, enclave, and app configs).

davidchisnall commented 4 years ago

I agree, having a separate attested config makes a lot of sense, I don't have strong opinions as to whether we want attested vs nonattested config or a load of foo_config_attested and bar_config_unattested things.

We don't want to depend on OE's libc, not least because it also pulls in a load of oesyscalls dependencies, which increase our attack surface, but also because (assuming we can make it work at all) we'll end up with two copies of a load of libc things.

I'm happy for us to depend on things that a freestanding C environment would provide.

davidchisnall commented 4 years ago

OE core provides a number of these functions that we can use:

Functions expected in any freestanding C implementation:

memcmp, memcpy, memmove memset.

oe_-prefixed versions of some useful things:

oe_strlen, oe_snprinf, oe_strchr, oe_strchrnul, oe_strcmp, oe_strcspn, oe_strdup, oe_strlcat, oe_strlcpy, oe_strlen, oe_strncmp, oe_strrchr, oe_strspn, oe_strstr, oe_strtok_r, oe_strtoul.

We should add a compatibility header that #defines their non-oe_ versions to call these and -include it when building all of the enclave files.

wintersteiger commented 4 years ago

These are used in the new json parser without an oe_* version: strcpy, strtoll, strtod, strtoull.

Would it simplify things if we built src/enclave without sgxlkl-musl and only with oe-musl?

SeanTAllen commented 4 years ago

malloc to oe_malloc is going to be a little tricky in a couple places. I'm still tracking it down.

When making some updates from calloc to oe_calloc, strdup to oe_strdup, and malloc to oe_malloc, some of the test suite segfaults. I spent a couple hours today looking at it and it's unclear to me at the moment why. I thought I had missed something important where perhaps memory allocated with oe_malloc was being freed using free either inside or outside of the src/enclave tree. However, that doesn't appear to be the case (although its possible I missed something that is non-obvious).

I'm opening a PR for the changes related to malloc/free/calloc/strdup that are definitely contained to src/enclave and don't cause any segfaults. Baby steps to closing the ticket. Baby steps.

SeanTAllen commented 4 years ago

Status update...

This didn't end up where I expected it to. There's a 151_2 branch for this repo, plus a corresponding one for sgx-lkl-musl. It contains the additional calloc, malloc, strdup changes for this issue. However, a couple applications in the test suite (for example dotnet) segfault.

They only segfault in HW mode. Never in SW mode.

The initial thought was "we don't have enough memory and are getting back NULL from a malloc/calloc and that is the issue". The code on the branch contains NULL checks after the new oe_malloc and oe_calloc calls (many of the replaced calls made no such checks). That didn't turn anything up.

The next thought was "we are running out of memory somewhere else because of this change". Given that could happen in a number of locations, I increased various memory limits. You can see these in the branches as well. Same problem.

What exists now on the branches also includes a lot of printf style debugging so we can trace through where things blow up.

Here is what that shows:

I tested this by changing https://github.com/lsds/sgx-lkl/blob/oe_port/src/enclave/enclave_init.c#L209

Setting

libc.user_tls_enabled = sgxlkl_enclave->mode == SW_DEBUG_MODE ? 1 : sgxlkl_enclave->fsgsbase;

so that it is either:

ON:

libc.user_tls_enabled = 1;

OFF:

libc.user_tls_enabled = 0;

results in no segfault when off, and segfault when on.

The printf debugging will show the different code paths that are taken. I've gotten to the point where I know that we get to https://github.com/lsds/sgx-lkl-musl/blob/oe_port/src/env/__init_tls.c#L50 and then segfault.

My working assumption is that assumptions about memory layout related to the app_config that is used for configuration are being made. However, that is just an assumption. There's a lot of nuance in the thread local storage code that is spread out over a number of files that would need to be investigated.

I'm setting aside work on this for now. @davidchisnall believes that when https://github.com/lsds/sgx-lkl/issues/155 is completed, we can do away with the problematic code entirely and replace with a user space implementation of pthreads. If that turns out to be correct then there's not a lot of point in figuring out the specific issue. For that reason, I'm pausing work on the additional malloc/calloc/strdup => oe_malloc/oe_calloc/oe_strdup changes for the time being.

See https://github.com/lsds/sgx-lkl/pull/234/files for additional context on this without having to check out the branches.

wintersteiger commented 4 years ago

(Just in case you haven't come across it before: thread locals may be reinitialized upon ecalls. For some reason this was the default behavior when I last looked at it, but it may not be anymore. It's called the "TCS policy"; see e.g. https://github.com/intel/linux-sgx/issues/283)

davidchisnall commented 4 years ago

@wintersteiger, I believe this is not an issue with the recent OE, which ensures that FS base is preserved across ecalls.

SeanTAllen commented 4 years ago

malloc changes are now passing CI. there was a fix in OE that addressed the problem.

davidchisnall commented 4 years ago

Yay, ticks!

SeanTAllen commented 4 years ago

__assert_fail is done but I won't tick until its merged.

SeanTAllen commented 4 years ago

fprintf/__stderr_FILE is done but I won't tick until its merged

SeanTAllen commented 4 years ago

I'm as assignee so I don't lose track of checking what of this has been done once #604 is merged.

SeanTAllen commented 4 years ago

Updated to reflect changes from #604 and associated work.