Orycterope commented 5 years ago

We really need a UserspacePtr that checks address can be deref'd.

102

Here's my first attempt at an implementation, please give me your comments:

Jobs

The UserspacePtr is expected to fulfill the following goals:

Check that a given address is ok to derefence.
Check that it won't be unmapped while we have a reference to it.

We also choose to check user permissions. We define the rules as "we do only what the process could have done himself":

For UserspacePtr we read from: USER_ACCESSIBLE | READABLE
For UserspacePtr we write to: USER_ACCESSIBLE | WRITABLE

This means we'll never output in a read only mapping.

We might want to use a QueryMemory to get the mapping.

Limitations

You cannot make a UserspacePtr pointing to the address space of another process. Our UserspacePtr just derefs address, this will only work if address is in our cr3.

To address a value in another address space, you need to create a CrossProcessMapping, and might consider add some additional checks for user accessible and readable-writable

Pseudo code

/// Pointer to a userspace value.
/// 
/// Can be created mindlessly, check happens at deref time. 
///
/// Can only represent pointers in the **current** address space.
pub struct UserspacePtr<T: Sized> {
    address: VirtualAddress,

    // marker for T
    phantom: PhantomData<T>
}

impl<T> UsersapacePtr<T> {

    /// Gets a reference to the pointed value.
    /// 
    /// Checks that `self.address` is valid and is mapped in the current process.
    ///
    /// Returns a reference whose lifetime is tied to the lifetime of the lock, this way
    /// we can guarantee that the mapping in which `address` falls will not be unmapped 
    /// while we're holding a reference to it. 
    /// A side effect is that we can't modify the ProcessMemory while we hold a deref'ed UserspacePtr,
    /// but I don't see it causing any problem.
     pub fn<'lock> deref(&self, lock: &'lock MutexGuard<ProcessMemory>) -> Result<&'lock T, KernelError> {

        // Check address + size_of<T> is in UserLand
        // Check that if falls in a mapping at least USER_ACCESSIBLE | READABLE
        // Check size and alignment of * T
        // Check that `lock` corresponds to the current address space, if it isn't we can't simply make `address` a pointer, the kernel would need to do a CrossProcessMapping.

        // As we're going to interpret T, maybe it should be POD ?
    }

    /// Gets a mutable reference to the pointed value.
    ///
    /// Same as `deref`, but mut.
    /// 
    /// We choose to only write to mappings that are user writable, i.e. that the user could
    /// have written to himself.
    pub fn<'lock> deref_mut(&self, lock: &'lock MutexGuard<ProcessMemory>) -> Result<&'lock T, KernelError> {

        // Same as deref
        // Checks that if falls in a mapping at least USER_ACCESSIBLE | WRITABLE 

        // lift the restriction on POD ? But we're still constructing a ref to it ...
    }
}

roblabla commented 5 years ago

bikeshed time:

The functions should be get and get_mut.
T should definitely be a POD.
Instead of a MutexGuard, we should just take a &ProcessMemory. Makes it easier to setup testing and change lock kind. That remark holds true throughout the codebase: I think we should avoid taking MutexGuard unless we need to precisely control the dropping behavior.

roblabla commented 5 years ago

Related; #102

Orycterope commented 5 years ago

I'm worried about having a &/&mut in kernel on a memory region that can be modified by another user thread. Should we use UnsafeCell somehow, to prevent compiler optimisations ? (Same reasoning as Mmio #202 )

Orycterope commented 5 years ago

Note that deref_mut() (or get_mut() as roblabla suggests) takes only a const reference to the ProcessMemory.

This is ok since we only check the bookkeeping, and we choose it to be allowed since the ProcessMemory semantically does not track the content of the memory, only the layout of its mappings.

This means that we can have several UserspacePtr mutably derefed at the same time, or a const source and a mutable destination. This is great for most use cases.

However no effort is made to check that they do not overlap, and this might become a problem if we hold several references to the same memory zone.

roblabla commented 5 years ago

Actually, get should probably return T and not &T ? This would allow two things:

We could use read_volatile(), avoiding optimization problems
We could have a special version for ARM using ldtr instead of manually querying the page tables.

UserSpacePtrMut would obviously have a set.

roblabla commented 5 years ago

So right now, we use UserSpacePtr almost everywhere HOS does. There's one place where we use UserSpacePtr even though we really shouldn't though: The address of the IPC command buffer. The kernel is supposed to remap it in kernelspace in order to access it.

IMO we should probably just reserve two pages of kernel VMEM that IPC command buffers will be remapped to when doing the copying. Then everything else can use UserSpacePtr.

UserSpacePtr should have a get (and set for UserSpacePtrMut) which does a ptr::read() (and a ptr::write()) after ensuring the address is within the process' Address Space. It should not have a Deref, we should never be dereferencing userspace memory. This is especially important since we'll want to enable SMEP and SMAP (prevents execution and access of userspace memory from the kernel), which requires we centralize all access to userspace memory in a single function (which will take care of temporarily disabling SMAP).

roblabla commented 5 years ago

So here's the thing: We want a copy_from_user function whose goal is to safely copy bytes from a user-provided address (pointing to the user address-space) into the kernel address-space. If the user-provided address points to unmapped memory, non-readable memory, kernel-memory, or anything else that's potentially invalid, it should return an error.

On ARM, we have a function that basically does just that, which HOS uses. It drops its privileges by using ldtr, which will execute an ldr as if it was ran from EL0. The HOS exception handler checks if the instruction that faulted is within one of the "user to kernel/kernel to user" copy routines, and simply erets to the calling function (which is stored in X30, thankfully) with X0=0.

On x86, the situation is a lot more complicated. We have no ldtr instruction. "Temporarily" dropping privileges is not doable. Furthermore, we don't have access to the parent function easily. So the problem is, how do you prevent faults if the user passes an address that's just unmapped for instance?

Now, we could do something very simple like walking the TLB to figure out if the access is OK before doing it (Asking for permission instead of asking for forgiveness). In the simple, single-processor no preemption world that SunriseOS currently live in, that would work fine, but it'd be a very brittle mechanism. But it's slow, and would probably be a bit more complicated in an SMP-enabled world.

Here's what linux does: It has a special section, called __ex_table, that contains ranges of kernel addresses and an exception handler associated with that range. When the pagefault handler executes, it first checks if the faulting IP is contained within these ranges, and if it does, it will iret to the exception handler specified. here's a more in-depth look.

So copy_from_user has basically two asm stubs, the "normal" paths just does the copy, and a separate "failure handling" subroutine that just reports the error. And the pagefault will iret to the failure handling subroutine. This should be performant, especially in the happy path.

I dislike the linux solution, so I went on a quest to see if I could do something better, something that didn't go through the exception handler. I've found something that could work, but relies on extensions that aren't present in most processors, the Intel Transactional Synchronization Extensions (TSX), and in particular the Restricted Transactional Memory (RTM) extension. Those extensions are absolutely not meant for what I'm going to show here, but it should, in theory, work if they are present.

RTM provides two new instructions of interest: XBEGIN and XEND. The idea is to make arbitrary memory accesses "atomic" - XBEGIN has a "failure subroutine" opcode executed when the transaction fails. A transaction can fail for a wide variety of reasons, like if another processor accesses the memory we were writing in our transaction. However, there is one reason that's very interesting to us: A transaction will fail if we attempt accessing unmapped or otherwise unavailable memory. We can effectively bypass the Page Fault handler and handle it entirely in userspace!

A copy_from_user would look like this:

mov esi, $0 // Move source
mov edi, $1 // Move dest
mov ecx, $2 // Move byte count

xbegin failed
rep movsb
xend
failed: ret

Upon returning from this function, if eax == 0, the copy from user succeeded.

This is a super elegant solution, but the TSX instructions aren't widely available, so relying on it for such a critical aspect of the OS isn't a great idea. In particular, neither my desktop nor my server have the RTM feature enabled.

So we're back to square one. Still no good solution available. I'm currently swinging towards doing something similar to what linux is doing. In theory, we only need two such "special" functions that can be seen by the fault handler: a copy_from_user and a copy_to_user. Having special asm stub for them, that the fault handler detects, and returns to a special handler somehow should be doable.

sunriseos / SunriseOS

RFC UserspacePtr #231

102

Jobs

Limitations

Pseudo code