sunriseos / SunriseOS

Horizon/NX kernel reimplementation
Apache License 2.0
232 stars 13 forks source link

SpinLockIRQ is totally broken #190

Open roblabla opened 5 years ago

roblabla commented 5 years ago

SpinLockIRQ doesn't fulfill its intended purpose. In the interrupt handler, when we drop the handler, we end up reactivating interruptions before returning. This leads to a kernel stackoverflow. We should revert to the previous "save and restore" design. However, something needs to be done to prevent ordering problems.

The reordering problem

The traditional way to handle SpinLockIRQ is to have lock() backup the EFLAGS (in particular, the IF flag, which prevents interruptions), and then disable interrupts. Unlock restores IF. This works fine, so long as the locks are unlocked in reverse order as they were locked. Ordering them wrong will lead to terrible consequences. In particular, the first lock should be the last unlock. Failure to do this will result in the EFLAGS being left in an invalid state.

Potential solutions

In our case, lock() returns a Guard, and unlock() occurs when the Guard is dropped. The reordering problem occurs if we move the guard to a subfunction, or manually drop it. In both cases, this would require a move.

Rust has a new API, called Pin, that is made to prevent moves: https://doc.rust-lang.org/std/pin/struct.Pin.html . Perhaps we can use this to prevent the guards from being moved around, and as such ensure the correct drop order is respected?

Orycterope commented 5 years ago

In the process_switch() function I'm doing a bit of magic with the locks, passing them around from the arc-generic scheduler to process_switch(). We should look at this function and see if it can still work when we use Pin.

roblabla commented 5 years ago

Talking to people who know better about this, Pin won't work. Pin prevents moving the value to a new memory location, but the variable can still be passed around, dropped early, etc...

Another solution is to use callbacks.

a.with_lock(||
  b.with_lock(||
    do_stuff()
  )
)

This would prevent the problem entirely.