robert-w-gries / rxinu

Rust implementation of Xinu educational operating system
Apache License 2.0

Random deadlocks caused by PIT interrupt during allocation #48

Closed robert-w-gries closed 6 years ago

robert-w-gries commented 6 years ago

Problem

Now that the timer IRQ handler is calling resched(), any code that is not wrapped in a call to the interrupts::disable_then_restore() function can be interrupted to schedule a new process.

Since we do not support preemption in our kernel yet, we essentially wrap all scheduling code in disable_then_restore(). This prevents our kernel from being interrupted while holding important locks, such as the PIC locks or the process table locks.

However, one deadlock source remains. When a process uses the allocator API to create structures like Vec or String, the timer interrupt can fire while linked-list-allocator is holding the Heap lock. If the process that gets scheduled next also allocates or frees memory, it spins forever on a lock that the interrupted process can never release.
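The failure mode can be sketched on a hosted target without real interrupts. This is only an illustration under stated assumptions: `SpinFlag`, `HEAP_LOCK`, `simulated_timer_irq`, and `interrupted_while_allocating` are hypothetical names, not rxinu or spin APIs, and the "interrupt" is an ordinary function call made while the lock is held. It uses a single `try_lock` attempt so the example terminates; a real spinning `lock()` in the same position would loop forever.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Minimal spinlock flag, similar in spirit to the AtomicBool a spinlock polls.
/// Hypothetical type for illustration only.
struct SpinFlag(AtomicBool);

impl SpinFlag {
    const fn new() -> Self {
        SpinFlag(AtomicBool::new(false))
    }

    /// One acquisition attempt; returns true if the lock was taken.
    fn try_lock(&self) -> bool {
        self.0
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    fn unlock(&self) {
        self.0.store(false, Ordering::Release);
    }
}

/// Stand-in for the allocator's heap lock.
static HEAP_LOCK: SpinFlag = SpinFlag::new();

/// Simulated timer IRQ that reschedules to code which wants the heap lock.
/// Returns true if the lock could be acquired.
fn simulated_timer_irq() -> bool {
    let acquired = HEAP_LOCK.try_lock();
    if acquired {
        HEAP_LOCK.unlock();
    }
    acquired
}

/// Simulates an allocation that is interrupted while the heap lock is held.
/// Returns whether the interrupting code managed to take the lock.
fn interrupted_while_allocating() -> bool {
    assert!(HEAP_LOCK.try_lock()); // the allocator takes the lock
    let handler_acquired = simulated_timer_irq(); // the "interrupt" fires here
    HEAP_LOCK.unlock(); // only reached because the handler did not spin
    handler_acquired
}
```

With interrupts masked for the duration of the critical section, the call to `simulated_timer_irq()` could not happen between `try_lock()` and `unlock()`, which is the property the solutions below try to guarantee.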

We need to find a way to disable interrupts before using the allocator API, preferably without vendoring the linked-list-allocator code in this repo.

Possible Solutions

Naive solution

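We can wrap every allocator call in a process with disable_then_restore(). This is impractical because allocations happen implicitly in many places, and it is poor design because every caller has to remember to do it.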

Use syscall for all memory allocation

I don't know if this will be feasible, since the allocator API is invoked automatically when a Vec or String is created. It seems possible but undesirable, because Vec and String creation would need to be wrapped to use the syscall.

Wrap the linked-list-allocator

We might be able to wrap the linked-list-allocator in our own allocator that disables interrupts and then calls the allocate/deallocate methods of linked-list-allocator. This would not require vendoring the crate or changing any calling code, so I think this is our best path forward.
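A minimal sketch of the wrapping idea, with several assumptions that should be stated loudly: it uses the stable `GlobalAlloc` trait rather than the nightly `Alloc` trait that appears in the backtrace below, it wraps `std::alloc::System` instead of linked-list-allocator so that it runs on a hosted target, and the interrupt flag is simulated with an `AtomicBool`. `InterruptSafeAlloc` and `INTERRUPTS_ENABLED` are hypothetical names, and the `disable_then_restore` shown here is a stand-in, not rxinu's actual function.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for the CPU interrupt flag so the sketch runs without a kernel.
static INTERRUPTS_ENABLED: AtomicBool = AtomicBool::new(true);

/// Hypothetical analogue of interrupts::disable_then_restore():
/// disable interrupts, run `f`, then restore the previous state.
fn disable_then_restore<R>(f: impl FnOnce() -> R) -> R {
    let was_enabled = INTERRUPTS_ENABLED.swap(false, Ordering::SeqCst);
    let result = f();
    if was_enabled {
        INTERRUPTS_ENABLED.store(true, Ordering::SeqCst);
    }
    result
}

/// Wrapper that forwards to an inner allocator with interrupts disabled,
/// so the inner allocator's lock is never held while an interrupt can fire.
pub struct InterruptSafeAlloc<A>(pub A);

unsafe impl<A: GlobalAlloc> GlobalAlloc for InterruptSafeAlloc<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        disable_then_restore(|| unsafe { self.0.alloc(layout) })
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        disable_then_restore(|| unsafe { self.0.dealloc(ptr, layout) })
    }
}
```

In a kernel, the wrapper would hold the real heap allocator and be registered as the global allocator, so Vec and String would pick it up without any change to calling code.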

Debugging

Frame #4 in the call stack below is where we trigger the deadlock in linked-list-allocator. You can tell it's a deadlock because interrupts stop firing and the same lock check (the atomic load in frames #0 through #3) keeps getting executed in a loop.

```
#4  0x0000000000131ac4 in linked_list_allocator::{{impl}}::dealloc (self=0x40004090,
    ptr=0x400022d8 "\000", layout=...)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/linked_list_allocator-0.4.3/src/lib.rs:176
```

```rust
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
        self.0.lock().deallocate(ptr, layout)
    }
```
Debug info

```
(gdb) list
1486
1487    #[inline]
1488    unsafe fn atomic_load<T>(dst: *const T, order: Ordering) -> T {
1489        match order {
1490            Acquire => intrinsics::atomic_load_acq(dst),
1491            Relaxed => intrinsics::atomic_load_relaxed(dst),
1492            SeqCst => intrinsics::atomic_load(dst),
1493            Release => panic!("there is no such thing as a release load"),
1494            AcqRel => panic!("there is no such thing as an acquire/release load"),
1495            __Nonexhaustive => panic!("invalid memory ordering"),
(gdb) info stack
#0  core::sync::atomic::atomic_load (dst=0x147018 "\001\000", order=core::sync::atomic::Ordering::Relaxed)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/sync/atomic.rs:1491
#1  0x000000000014581c in core::sync::atomic::AtomicBool::load (self=0x147018, order=core::sync::atomic::Ordering::Relaxed)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/sync/atomic.rs:316
#2  0x0000000000132d5f in spin::mutex::Mutex::obtain_lock (self=0x147018)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/spin-0.4.6/src/mutex.rs:167
#3  0x0000000000132d95 in spin::mutex::Mutex::lock (self=0x147018)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/spin-0.4.6/src/mutex.rs:191
#4  0x0000000000131ac4 in linked_list_allocator::{{impl}}::dealloc (self=0x40004090, ptr=0x400022d8 "\000", layout=...)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/linked_list_allocator-0.4.3/src/lib.rs:176
#5  0x0000000000129964 in rxinu::__rg_allocator_abi::__rg_dealloc (arg0=0x400022d8 "\000", arg1=8192, arg2=8)
    at src/lib.rs:123
#6  0x000000000011456e in alloc::heap::{{impl}}::dealloc (self=0x40001b70, ptr=0x400022d8 "\000", layout=...)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/heap.rs:104
#7  0x0000000000111e70 in alloc::raw_vec::RawVec::dealloc_buffer (self=0x40001b70)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/raw_vec.rs:687
#8  0x0000000000112f05 in alloc::raw_vec::{{impl}}::drop (self=0x40001b70)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/raw_vec.rs:696
#9  0x0000000000141595 in core::ptr::drop_in_place> ()
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr.rs:59
#10 0x00000000001411cf in core::ptr::drop_in_place> ()
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr.rs:59
#11 0x0000000000141b44 in core::ptr::drop_in_place>> ()
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr.rs:59
#12 0x0000000000121095 in rxinu::scheduling::cooperative_scheduler::{{impl}}::kill (self=0x148bc8 <::deref::__stability::LAZY+8>, id=...)
    at src/scheduling/cooperative_scheduler.rs:84
#13 0x0000000000119703 in rxinu::scheduling::process::process_ret ()
    at src/scheduling/process.rs:108
#14 0x0000000000148bc8 in ::deref::__stability::LAZY ()
#15 0x00000000001489f0 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x0000000000000000 in ?? ()
```