
Validation mechanism for user-supplied kernel object pointers #2025

Closed nashif closed 6 years ago

nashif commented 7 years ago

Reported by Andrew Boie:

Since we are using the same kernel APIs for userspace, we are going to need to validate any kernel object pointers passed in via system calls. When userspace passes in a pointer to a kernel object, we need to enforce that the pointer is valid, lives in a region of memory controlled by the kernel, and actually corresponds to the requested kernel object type.

Brief summary, from an email thread on the subject: {quote} We want to have the same kernel APIs with and without memory protection. This will mean that kernel APIs called from userland will still take pointers to kernel objects as arguments even though the kernel objects live in memory private to the kernel. When userspace passes in a kernel object pointer, we will need to verify that it indeed points to a kernel object of the expected type.

So we have 2 cases of pointers from userspace: pointers to buffers, and pointers to kernel objects. I think we need safe_memcpy for the latter case.

The proposed method (credit to Inaky) IIRC is as follows. Any given kernel object (let's use struct k_sem as an example), with memory protection turned on, will always have as its first member a struct:

struct kernel_object {
    uint32_t encrypted_ptr;
    /* ... other metadata as needed */
};

struct k_sem {
#ifdef CONFIG_MEMORY_PROTECTION
    struct kernel_object ko;
#endif
    /* ... regular struct k_sem members */
};

At boot for every kernel object type, the kernel will randomly generate some XOR keys, stored in memory only visible to the kernel. When a kernel object is created, the pointer value of that object is XOR'd with that key and the encrypted value stored in encrypted_ptr. When userspace passes the kernel an object with address A:

1) Validate that the pointer can be safely dereferenced by ensuring that it falls within the RAM range reserved for kernel objects.

2) Knowing that A points to 4 bytes of memory that we can read, the value of A XOR'd with the encryption key for k_sem objects should equal ((struct kernel_object *)A)->encrypted_ptr. This will prove that A is a valid instance of struct k_sem. {quote}
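
A minimal sketch of that validate/init pair, assuming the XOR scheme above; the key table, bounds-check helper, and function names here are only for illustration and are not actual Zephyr APIs:

#include <stdint.h>
#include <stdbool.h>

struct kernel_object {
    uint32_t encrypted_ptr;
    /* ... other metadata as needed */
};

extern uint32_t obj_keys[];  /* one random key per object type, generated at boot */
extern bool in_kernel_object_ram(const void *addr);  /* hypothetical bounds check */

static void kernel_object_init(void *obj, int otype)
{
    struct kernel_object *ko = obj;

    ko->encrypted_ptr = (uint32_t)(uintptr_t)obj ^ obj_keys[otype];
}

static bool kernel_object_validate(void *obj, int otype)
{
    struct kernel_object *ko = obj;

    /* 1) the pointer must fall within the RAM reserved for kernel objects */
    if (!in_kernel_object_ram(obj)) {
        return false;
    }

    /* 2) the address XOR'd with the per-type key must equal the stored value */
    return ko->encrypted_ptr == ((uint32_t)(uintptr_t)obj ^ obj_keys[otype]);
}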

(Imported from Jira ZEP-2187)

nashif commented 7 years ago

by Andrew Boie:

I started to look into an implementation of this. It could even be used by itself, outside of memory protection, if there is suspicion that garbage or uninitialized pointers are being passed in as kernel objects.

I've found a problem though. This mechanism depends on run-time initialization of kernel objects, since any kernel object will need its encrypted pointer value stored. For example:

enum k_objects {
    K_OBJ_THREAD,
    K_OBJ_MUTEX,
    K_OBJ_SEM,
    K_OBJ_ALERT,
    K_OBJ_MSGQ,
    K_OBJ_MBOX,
    K_OBJ_PIPE,
    K_OBJ_QUEUE,
    K_OBJ_LIFO,
    K_OBJ_STACK,
    K_OBJ_MEM_SLAB,
    K_OBJ_MEM_POOL,
    K_OBJ_TIMER,
    K_OBJ_POLL_EVENT,
    K_OBJ_POLL_SIGNAL,

    K_OBJ_LAST
};

struct k_object {
    u32_t enc_ptr;
};

extern void _k_object_validate(void *obj, enum k_objects otype);
extern void _k_object_init(void *obj, enum k_objects otype);

Any kernel object will have struct k_object as its first member, so you can just do a cast:

struct k_sem {
    struct k_object obj;
    _wait_q_t wait_q;
    unsigned int count;
    unsigned int limit;
    _POLL_EVENT;

    _OBJECT_TRACING_NEXT_PTR(k_sem);
};

This is fine for runtime initialization: you just stick a _k_object_init() call in the init function. However, for almost all kernel objects we also have static initializer macros:

#define K_SEM_INITIALIZER(obj, initial_count, count_limit) \
    { \
    .wait_q = SYS_DLIST_STATIC_INIT(&obj.wait_q), \
    .count = initial_count, \
    .limit = count_limit, \
    _POLL_EVENT_OBJ_INIT \
    _OBJECT_TRACING_INIT \
    }

It doesn't work for this case, since k_sem_init() is never called. In addition, these initializers can be embedded within other kernel objects; for example, a k_alert has a k_sem inside it:

#define K_ALERT_INITIALIZER(obj, alert_handler, max_num_pending_alerts) \
    { \
    .handler = (k_alert_handler_t)alert_handler, \
    .send_count = ATOMIC_INIT(0), \
    .work_item = K_WORK_INITIALIZER(_alert_deliver), \
    .sem = K_SEM_INITIALIZER(obj.sem, 0, max_num_pending_alerts), \
    _OBJECT_TRACING_INIT \
    }

I need to figure out some kind of preprocessor voodoo such that all objects of a particular type initialized this way, even ones embedded within other objects, will get their pointer values stuck in a special section that can be iterated over at boot to set the encrypted pointer value.

This would be a lot easier to do if the K_*_INITIALIZER() macros were private to the kernel and not public. If we can do that, and enforce that only the K_*_DEFINE() macros are public, then this is easy: the data structures already get put in a special section that can be iterated over.
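
A rough sketch of that boot-time iteration; the record layout and linker symbols below are assumptions for illustration, not real Zephyr names:

/* Each K_*_DEFINE() would also emit one of these records into a dedicated
 * linker section bounded by the (hypothetical) symbols below.
 */
struct obj_init_record {
    void *obj;               /* address of the statically defined kernel object */
    enum k_objects otype;    /* its type */
};

extern struct obj_init_record __obj_init_start[];
extern struct obj_init_record __obj_init_end[];

static void kernel_objects_boot_init(void)
{
    for (struct obj_init_record *r = __obj_init_start; r < __obj_init_end; r++) {
        _k_object_init(r->obj, r->otype);  /* store the encrypted pointer value */
    }
}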

nashif commented 7 years ago

by Inaky Perez-Gonzalez:

Hmm, good point

This might be a security issue if we do it at link time: if we know the location and the crypted key, we could guess the cookie.

We could, however, at link time record in a section the list of pointers that have to be crypted, and with which type cookie; then at kernel initialization time, go over that table, generate the cookies, and free up the table.

Would this work?

nashif commented 7 years ago

by Andrew Boie:

{quote} This might be a security issue if we do it at link time, as if we know the location and the crypted key we could guess the cookie. {quote} I was pursuing an approach where these keys were randomly generated at boot. I guess we could look into generating them at build time, but you could unfortunately just read them out of flash.

{quote} We could, however, at link time, record in a section the list of pointers that have to be crypted and with which type cookie and at kernel initialization time, go over that table, generate the cookies and then free up the table. {quote}

I'm planning on doing that. The trick is that it's easy if people use K_*_DEFINE(). I already have some code which does this. I think it's impossible to know about those pointers if people use K_*_INITIALIZER() for embedded kernel objects. I talked to Benjamin Walsh and it was not intentional for K_*_INITIALIZER() to be public. So I'm going to deprecate them and make them private APIs.

nashif commented 7 years ago

by Inaky Perez-Gonzalez:

Oh yes, I agree, they have to be randomly generated at boot--sorry I made it confusing.

Unless there is a need to keep K_*_INITIALIZER() around, this seems a very sensible solution to me.

nashif commented 7 years ago

by David Brown:

I think I'm ok with using this XOR'd marker to validate objects, but we really shouldn't use the term "encrypted". This isn't encryption in any sense. Perhaps "token" or something similar, as it is just a validator of the object.

nashif commented 7 years ago

by David Brown:

Also, 32-bits isn't all that big and may not offer all that much protection against rogue pointers passed in.

nashif commented 7 years ago

by Andrew Boie:

{quote}Perhaps "token" or something as just a validator of the object.{quote}

Fine with me.

{quote}Also, 32-bits isn't all that big and may not offer all that much protection against rogue pointers passed in.{quote}

You lost me here. To pass this check, a bad pointer would have to point to 4 bytes of memory that, XOR'd with the validator, produce the bad pointer value itself. Considering that these bad pointers will also be bounds-checked to be within the kernel's memory area, the odds of this happening by accident seem astronomical to me.

nashif commented 7 years ago

by David Brown:

{quote} You lost me here. To pass this check, a bad pointer would have to point to 4 bytes of memory that, XOR'd with the validator, produce the bad pointer value itself. Considering that these bad pointers will also be bounds-checked to be within the kernel's memory area, the odds of this happening by accident seem astronomical to me. {quote}

As a way of detecting bugs and such, 4 bytes would be perfectly adequate. I wouldn't consider this adequate for security, since that is a feasible number of attempts for a malicious agent to retry.

nashif commented 7 years ago

by Andrew Boie:

David Brown that's reasonable, thanks

nashif commented 7 years ago

by Inaky Perez-Gonzalez:

While 4G tries on an MCU might take quite a while, the risk is there. That said: what is your concern here?

Is the concern process A being able to find process B's pointer? It'd have to try a max of 2^64 combinations (assuming byte alignment) and then multiply that by the combinations of the type key (so another 2^32). I think hitting one out of 2^96 will provide good protection there (of course, this is assuming a proper RNG).

As well, as Andrew mentioned, there will be bounds checks, so we know we are not being directed to a hyperspace location for DoS.

What other concerns do you have? I'm curious to make sure we close all the possible holes or have rationales for them.

nashif commented 7 years ago

by David Brown:

I think my main concern is that we're effectively trying to do too much, and will end up moving ourselves away from a microcontroller OS. I think in light of both debugging use and protection against some malicious uses, the 32-bit value will be fine. It would not be adequate on a platform where untrusted users are able to run arbitrary code. But in our environment, the code is generally controlled, and this will reasonably protect these structures.

At least as long as we don't try to call it encryption.

nashif commented 7 years ago

by Andrew Boie:

{quote}I think my main concern is that we're effectively trying to do to much, and will end up moving ourselves away from a microcontroller OS{quote}

David Brown can you please elaborate here? The end-state goal we are pursuing is to augment Zephyr with optional functionality in the model of FreeRTOS-MPU, without significantly changing the kernel APIs or getting in the way of users that don't want/need thread protection. We want to introduce unprivileged threads that can't clobber other threads' stacks or corrupt the kernel itself. We want this to be something that can be turned on or off, not a required part of the kernel. Do you believe FreeRTOS-MPU is doing too much? How are we moving ourselves away from being a microcontroller OS? If you think what FreeRTOS-MPU is doing is a bad idea, I don't think there's much opportunity for consensus here, but I would at least like to know what specifically in the plans that have been made is concerning to you.

nashif commented 7 years ago

by David Brown:

I think looking at FreeRTOS-MPU is useful. I just fear we are implementing a bunch of stuff, and I'm not sure we know what our threat model even is. It seems easy to say "We have an MPU/MMU, what can we do with it", rather than figuring out what our use cases are, what threats we have, and prioritizing those threats. This is important as we have significantly limited resources here, and Zephyr wants to work across a fairly diverse range of device capabilities (from something with 16K of RAM and 8 MPU slots, to things much more capable).

Having a kernel/user-space separation is a fairly drastic step, doesn't fit particularly well with the current APIs, and, as I understand it, was something Zephyr explicitly decided not to support early in its life.

For example, it is perfectly reasonable to support an MPU/MMU with everything running in privileged mode. If our threats are accidental errors in code, and things like errors resulting in rogue reads that could leak data, we can still protect against a lot; even though truly malicious code could just reprogram the MPU/MMU, most code won't.

I'm not saying we shouldn't separate threads, or any of the things discussed, but we need to understand our use cases and what threats we have, and weigh them against the costs of implementing these features.

nashif commented 7 years ago

by Andrew Boie:

David Brown, unless I'm greatly misunderstanding something, nothing planned here will prevent Zephyr from continuing to run on the lowest-end devices. They may not be able to use all the new features we are bringing in, but we are not saying good-bye to any class of device with this effort. I would also contend that you are underestimating how much we have considered our needs for this feature; maybe you weren't part of the discussion or didn't see the mailing list threads, I don't know.

The scenario we (Intel) are primarily interested in is making it easier for developers to write complex multi-threaded applications and to debug them more effectively. Security for untrusted code on the system is a secondary goal, which we would like to also support well, but right now the outside requests we are getting are all to have thread protection, with FreeRTOS-MPU's design specifically called out as an example.

As I discussed on the mailing list, this effort has several layers which can be supported on devices depending on their capabilities, in increasing order of complexity:

1) Boot-time configuration of memory regions. We have this now on ARM and x86. Produce a CPU exception if we read/write pointers that don't map to real memory, try to execute RAM that doesn't have code, etc etc. The memory policy configuration is fixed at boot and doesn't change.

2) Simple stack protection. Throw an exception if a thread exceeds its stack space. Massively useful for debugging, normally when stack overflows you get all kinds of weird and unpredictable behavior. Requires some simple runtime reconfiguration of memory policy on context switch.

3) Thread protection. Introduce user vs supervisor threads. User threads can't touch any stack but theirs, and can't touch kernel memory. System calls to do privilege elevation. Reconfiguration of memory regions on context switch. For debugging complex multi-threaded applications that might otherwise stomp on each other's memory or crash the kernel itself.

4) "Secure" thread protection. Run untrusted code sandboxed in a user thread and have full confidence that it will not be able to attack the rest of the system. The technique described in this JIRA (GH-2025) would not be appropriate for this use-case, but we can drop in an alternative implementation (that might be slower or involve indirection handles) to support it.

5) Full virtual memory. Use MMU to implement processes with their own virtual memory. We don't have plans to do this at this time.

Right now the focus on our side for this effort is on No. 3. No. 4 is desirable as an iterative refinement.

{quote}doesn't fit particularly well with the current APIs{quote} Please be specific. For example, I sent a mail earlier this week discussing ISR callbacks (I don't think we really have a problem here). Can you at least reply to that? Other than defining some allocators for kernel objects, the general feeling we have is that we can leave the existing kernel APIs alone; this is not going to be a re-write, and users who are uninterested in this feature should not be impacted.

{quote} and as I understand, early in Zephyr's life was an explicit decision to not support. {quote} Zephyr (in its previous branding as Viper OS) had full virtualized userspace for a long time. If you look at the very first commit in the git history I believe the MMU enabling code is there. It was removed because full MMU virtualization was felt to be too heavyweight. However we have a lot of people clamoring for some method of not having threads stomp on each other.

nashif commented 7 years ago

by Anas Nashif:

David Brown Not sure you were present when Alex presented the outcome of the kernel dive-in we had. The outcome was captured in this slide:

!screenshot-1.png|thumbnail!

from this original slide set.

[^2017-01-12_1_Zephyr_Kernel_assets.pptx]

Thread isolation is the bare minimum we need to be able to call a system secure. I know Ruud Derwig was present in this discussion at least. So I mostly agree with Andrew Boie here; I still have an issue calling this a debugging feature, given the amount of work and resources we are investing to make this happen ;-)

nashif commented 7 years ago

by Andrew Boie:

I'm still debugging and enabling validation for various kernel objects, but a sneak preview of this work can be seen here:

https://github.com/andrewboie/zephyr/tree/kobject

Certain network stack data structures have kernel objects embedded within them, I need to figure out the best way to deal with those.

nashif commented 7 years ago

by Andrew Boie:

https://github.com/zephyrproject-rtos/zephyr/pull/834

nashif commented 7 years ago

by Andrew Boie:

Taking a different approach than the XOR method. I have an RFC drafted and will present to the memory protection working group tomorrow morning, and a revision after that discussion will be posted here.

nashif commented 7 years ago

by Andrew Boie:

Problem statement:

There should be no way for userspace threads, either through mistakes or malice, to corrupt the Zephyr kernel itself via misuse of kernel objects, such that the consequences of this misuse extend past the threads directly involved in the use of that kernel object.

Terminology:

When we speak of "kernel objects" here, this includes all the typical kernel objects defined in include/kernel.h that require a privilege elevation to interact with them, plus all driver 'struct device *' instances fetched either directly or via device_get_binding(). The validation of drivers will be done at the subsystem level; we want to touch the actual drivers themselves as little as possible.

Design goals:

We would like to have the same APIs for both userspace and kernel space, which would mean that userspace will be passing pointers to the kernel objects as part of making kernel API calls.

The scope of this proposal is to show how we intend to verify, once we do the privilege elevation from user to supervisor mode, that the kernel object pointers provided in an API call are valid kernel objects of the expected type, are in the proper initialization state, and are objects the calling thread is permitted to use.

We are NOT trying to protect against malfeasance in supervisor mode; by definition you cannot, although these validation features can be helpful for catching bugs in supervisor-mode threads.

We are only trying to limit the scope of kernel object misuse, not catch all potential issues. If user thread A holds a semaphore and then crashes, user thread B waiting on that semaphore will still wait forever. But the effect is limited to just threads A and B: the kernel does not explode, and other threads not involved with that semaphore are unaffected.

TLDR proposal:

At build time, parse the ELF binary to find all the kernel objects declared in the system and build up a list of their memory addresses. Use this list to construct a build-time perfect hashtable, such that if we want to test at runtime whether a particular memory address is indeed a kernel object, this hash table will confirm or deny this in O(1) time. The hash table will map to a compact array of object metadata which will, for each entry, indicate the thread permissions, type, and initialization state of the object address that maps to it.

The constraint of this approach is that, at minimum for objects referenced by user threads, all such objects need to be declared at build time; it will not be allowed to dynamically create objects on a kernel stack or heap. Augmented APIs for kernel object pools will be provided for dynamic object use-cases.

For objects referenced by supervisor threads, heap allocation can be allowed if either the validation mechanism is shut off for supervisor threads, or we create a supplemental runtime hash table for these dynamic objects with the understanding that lookup performance may not be O(1). Neither of these will be enabled by default, and users should know exactly what they are doing as many of the protections given by this mechanism are defeated by these.

Permission tracking of kernel objects will be done with bitfields, imposing a maximum number of threads in the system, tunable via Kconfig. Permission bits are only enforced for user threads.

Implementation Details

Creating the perfect hash table

We need to know where all the kernel objects are. These come in several flavors:

We can find all of these using the DWARF debugging data inside the ELF binaries created by the build. Using the Python elftools library, we can scan the entire kernel, find all instances of kernel objects, and build up a list of their memory addresses, type, and whether they have been statically initialized, and do some sanity checking to make sure that these are, for instance, actual kernel objects and not from some alternate definition with the same name.

This information will then be used to create a perfect hashtable:

https://en.wikipedia.org/wiki/Perfect_hash_function

Creating this table will be easy; we can just use GNU gperf to do it:

https://www.gnu.org/software/gperf/manual/gperf.html

This emits some C code which we can build and link into the kernel. Our build system already has a notion of a 2-pass build, we create a 'zephyr_prebuilt.elf' first and feed that to various build-time tools which create data structures that end up in the final binary. We do this for creating the MMU page tables, IDT, and GDT on x86, and also to create the IRQ vector tables for other architectures. Care must be taken that, when the kernel is re-linked with the code/data from the gperf output, this does not shift the location of any previously existing code/data in the system.

Object Metadata and Permissions

The hash function created by gperf will only tell us that a particular memory address is valid for a kernel object, but says nothing about that object's type, permissions, or initialization state. We are initially going to create an array of:

struct k_object __packed {
    char perms[CONFIG_NUM_THREAD_BITS]; /* Default 2, max of 16 threads */
    char type; /* Some value in an enumeration of all kernel objects */
    ...
};

We will need to implement this array such that given the address of a kernel object, we can fetch the index in this array in constant time using the hash function provided by gperf.

With a default of max 16 threads, this whole thing will fit in a u32_t. There will be a 1:1 mapping between instances of struct k_object and all the kernel objects instantiated at build time in the system. This data structure will be created at build time using a post-build step just like the gperf code/data.
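
For illustration, the runtime side of that lookup might be wrapped roughly like this; 'obj_hash' and 'obj_metadata' are placeholders for whatever the gperf post-build step actually emits:

extern struct k_object obj_metadata[];    /* generated, one entry per build-time object */
extern int obj_hash(const void *addr);    /* perfect hash: array index, or -1 if unknown */

static struct k_object *_k_object_find(void *obj)
{
    int idx = obj_hash(obj);

    /* NULL means "this address is not a kernel object"; lookup is O(1) */
    return (idx < 0) ? NULL : &obj_metadata[idx];
}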

The permissions field will be initially zero. This is a bitfield; when threads are created they will be assigned an index in this bitfield. Threads running in supervisor mode can access any kernel object they want. Threads running in user mode will be restricted to objects for which their permission bit has been set.

The API call to grant permission would be something like:

k_object_grant_access(void *object, struct k_thread *thread);

Validation will be done to ensure that the 'object' parameter actually points to a valid instance of a kernel object, and, if the caller is a user thread, that the caller already had permission to manipulate it.
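
Internally, the grant call might look roughly like this, reusing the hypothetical _k_object_find() lookup sketched above; the per-thread permission index and the failure handling are also assumptions:

void k_object_grant_access(void *object, struct k_thread *thread)
{
    struct k_object *meta = _k_object_find(object);

    if (meta == NULL) {
        /* not a kernel object: fail (e.g. exception killing the caller) */
        return;
    }

    /* A user-mode caller must itself already hold permission on 'object';
     * supervisor-mode callers may grant freely (check omitted in this sketch).
     */

    int idx = thread->perm_index;                    /* hypothetical field, assigned at
                                                      * thread creation */

    meta->perms[idx / 8] |= (char)(1 << (idx % 8));  /* light up that thread's bit */
}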

System call code flow

Note that in the flow below, "fail" will typically mean "throw an exception which kills the calling thread", although there might be some scenarios where we set errno and return, on a case-by-case basis.

It's also worth noting that not all kernel APIs will be exposed to userspace in the system call layer. For example, APIs which register callbacks that run in interrupt or supervisor thread context will be supervisor-only.

The code flow, for a userspace thread making an API call, would look roughly as follows:

user code: ... k_sem_give(&my_sem); ...

What happens next is performed by the system call stubs, which are going to be auto-generated via preprocessor/linker/post-build black magic:

1) The &my_sem parameter is marshaled through the system call interface and a privilege elevation is done via arch-specific mechanism to get execution context in kernel mode at a designated entry point, typically via software interrupt or special instruction like SVC, SYSENTER, etc.

2) The &my_sem is looked up in the hash table. A negative result indicates that this isn't a kernel object and we fail.

3) We get a pointer to the object's entry in the k_object metadata array. We check that the object type matches the type expected by the system call being made; if &my_sem actually pointed to a struct k_work (for example) and not a struct k_sem, we fail.

4) We check the initialization bit in the metadata. If the bit is unset and this was not an init call, we fail.

5) We get the ID of the calling thread and check that the corresponding bit in the perms field is lit up.

6) If the caller provides additional buffers for data exchange (not the case for k_sem, but definitely for other objects and driver APIs), walk the page tables or MPU configuration to ensure that these buffers live in RAM that the calling thread has access to.

For kernel threads making API calls, much or all of this can be skipped, certainly steps 1, 5, and 6. We are never going to try to prevent malfeasance by a thread running in supervisor mode. However, it can be useful to catch bugs, so we can optionally allow some validation for APIs called from supervisor context and perform steps 2, 3, and 4.
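
Putting steps 2 through 5 together, the kernel-side handler for something like k_sem_give() might look roughly as follows; _k_object_find() is the lookup sketched earlier, and every other helper and flag name here is a placeholder rather than the real system call machinery:

int _handler_k_sem_give(void *arg)       /* arg is &my_sem, marshaled from user mode */
{
    struct k_object *meta = _k_object_find(arg);

    if (meta == NULL) {                              /* step 2: not a kernel object */
        return syscall_oops();
    }
    if (meta->type != K_OBJ_SEM) {                   /* step 3: wrong object type */
        return syscall_oops();
    }
    if (!object_is_initialized(meta)) {              /* step 4: init bit not set */
        return syscall_oops();
    }
    if (!thread_has_perm(meta, _current)) {          /* step 5: permission bit unset */
        return syscall_oops();
    }

    k_sem_give((struct k_sem *)arg);                 /* validated: call the real API */
    return 0;
}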

Constraints

Constraints imposed by any protection implementation

Regardless of whether this or some other method is used, it is a hard requirement that user threads may not read/write kernel objects directly; all they can do is pass their addresses to system calls. Kernel objects contain data that is private to the kernel, and if corrupted could potentially take the entire system down. They must only be manipulated in a very controlled manner.

This means that no matter what, putting kernel objects onto a user thread's stack is forbidden, as that is outside the kernel's memory space. If kernel objects are declared toplevel, and the CONFIG_APPLICATION_MEMORY option is enabled (which grants read/write access to all toplevel globals defined in application object code), either the __kernel decorator must be used, or macros like K_SEM_DEFINE(), which ensure the object ends up in the right place in RAM.
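
For example, under these rules the intended placements would be roughly the following (assuming __kernel and K_SEM_DEFINE() behave as described above):

K_SEM_DEFINE(my_sem, 0, 1);           /* OK: DEFINE macro places it in kernel RAM */
__kernel struct k_sem my_other_sem;   /* OK: toplevel global forced into kernel RAM */

void user_thread_fn(void *p1, void *p2, void *p3)
{
    struct k_sem stack_sem;           /* NOT OK: lives on the user thread's stack */

    k_sem_init(&stack_sem, 0, 1);     /* validation would reject &stack_sem */
}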

User-allocated data structures that don't live in kernel memory will not be able to contain kernel objects directly embedded within them; they will need to store pointers to such objects instead. Complex data structures that live in kernel memory and are only manipulated by supervisor contexts (such as drivers) are allowed to have embedded kernel objects; the build-time DWARF parsing will be able to find them.

User threads will not be able to allocate kernel objects at runtime on generic kernel memory pools. free() carries no context about the particular type of memory being freed, and it's possible to defeat the validation mechanism by doing a system call on a kernel object pointer that was initialized at some point but later freed. If a supervisor thread allocates an object onto a kernel heap and then grants permission to a user thread to access it, that should be considered a security bug.

Additional constraints imposed by this proposal

With this proposal, and without making some potential exceptions (described below), the set of all kernel objects must be known at build time. They can't be allocated on a stack or heap even if that memory is within the kernel's RAM.

To compensate for this, the plan is to augment the existing k_mem_slab APIs to make it very simple for users to reserve pools of kernel objects for dynamic use-cases. Object pools are much easier to deal with: the memory is reserved ahead of time, the DWARF parsing can locate all the kernel objects within them at build time, and this memory can never get overwritten with random data by a user thread.
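
As a rough illustration of the pool idea using today's k_mem_slab API (the eventual augmented, object-aware APIs may look different, so treat this only as a sketch):

/* Reserve room for 8 semaphores at build time; the pool memory is declared
 * statically, never overlaps user memory, and is handed out at runtime.
 */
K_MEM_SLAB_DEFINE(sem_pool, sizeof(struct k_sem), 8, 4);

struct k_sem *alloc_sem(void)
{
    void *block;

    if (k_mem_slab_alloc(&sem_pool, &block, K_NO_WAIT) != 0) {
        return NULL;                  /* pool exhausted */
    }

    k_sem_init(block, 0, 1);          /* runtime init of the pooled object */
    return block;
}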

If, for whatever reason, an application just has to be able to put objects on a kernel heap, there are some options: shut off the validation mechanism for supervisor threads entirely, or maintain a supplemental runtime hash table for dynamically created objects (as noted above, with lookup performance that may no longer be O(1)).

Both of these are perilous and should only be enabled as a last resort.

Lifecycle management:

For the initial implementation, very little lifecycle management is planned, but more advanced handling could be a later enhancement.

Other Comments

For the initial implementation, we are trying to change kernel APIs as little as possible, and even trivial access to kernel objects will require privilege elevation.

In the future, we may try to see if some objects could be implemented without system calls. It would be desirable, for instance, to see if we can implement k_mutex in userspace. Once the initial implementation of thread protection is out, there will be a great deal of interest in optimization and reducing the need for privilege elevation wherever possible.

nashif commented 7 years ago

by Andrew Boie:

I have the DWARF parsing working, and can generate the hashtable with gperf.

Now for the fun part: ensuring that including the text/data generated by gperf does not shift the memory addresses that we are hashing.

nashif commented 7 years ago

by Andrew Boie:

This is now available for code review:

https://github.com/zephyrproject-rtos/zephyr/pull/1276