robert-w-gries / rxinu

Rust implementation of Xinu educational operating system
Apache License 2.0
34 stars 4 forks source link

Creating Box wrapper for trait object points to itself instead of trait object #50

Closed robert-w-gries closed 5 years ago

robert-w-gries commented 6 years ago

Problem

We see an exception when trying to call scheduler.kill() in process_ret on i686.

In test process!
[try to kill process]

ExceptionStack {
    error_code: 0x0,
    instruction_pointer: 0x14647c,
    code_segment: 0x8,
    cpu_flags: 0x206,
    stack_pointer: 0x40000d3c,
    stack_segment: 0x3
}
InterruptDescription {
    vector: 14,
    mnemonic: "#PF",
    description: "Page Fault",
    irqtype: "Fault",
    source: "Any memory reference."
}

Page fault while accessing 0x61

I tracked down the issue to an incorrect trait object value. The following gdb session shows that we get the proper pointer to SCHEDULER (with some name mangling):

(gdb) list
102     let scheduler_ptr: *mut &DoesScheduling;
103     asm!("pop $0" : "=r"(scheduler_ptr) : : "memory" : "intel", "volatile");
104 
105     let scheduler = Box::from_raw(scheduler_ptr);
106 
107     let curr_id: ProcessId = scheduler.getid();
108     scheduler.kill(curr_id);
109 }
(gdb) n
105     let scheduler = Box::from_raw(scheduler_ptr);
(gdb) p/x scheduler_ptr
$1 = 0x400010f4
(gdb) x 0x400010f4
0x400010f4: 0x0014b360
(gdb) x 0x14b360
0x14b360 <_ZN72_$LT$rxinu..scheduling..SCHEDULER$u20$as$u20$core..ops..deref..Deref$GT$5deref11__stability4LAZY17h402ae1403f065449E+4>: 0x00000001

scheduler_ptr represents a raw pointer to a &DoesScheduling trait object stored during process creation. We want to create a Box<&DoesScheduling> from this raw pointer called scheduler.

In x86_64, scheduler is a Box<&DoesScheduling> that points to our global SCHEDULER as expected. However, in i686, scheduler is a Box<&DoesScheduling> that points to itself.

i686 behavior

(gdb) p/x scheduler_ptr
$1 = 0x400010f4
(gdb) p *scheduler
$3 = rxinu::scheduling::&DoesScheduling {pointer: 0x400010f4 "\364\020\000", vtable: 0x14af60}

x86_64 behavior

(gdb) p *scheduler
$1 = rxinu::scheduling::&DoesScheduling {pointer: 0x147be8 <<rxinu::scheduling::SCHEDULER as core::ops::deref::Deref>::deref::__stability::LAZY+8> "\001\000", vtable: 0x147a10}

Workaround

The only workaround I've been able to find is removing the let scheduler = Box::from_raw(scheduler_ptr); line for i686 and replacing it with a hardcoded reference to SCHEDULER:

diff --git a/src/scheduling/process.rs b/src/scheduling/process.rs
index cf946da..e0df2fa 100644
--- a/src/scheduling/process.rs
+++ b/src/scheduling/process.rs
@@ -102,8 +102,14 @@ pub unsafe extern "C" fn process_ret() {
     let scheduler_ptr: *mut &DoesScheduling;
     asm!("pop $0" : "=r"(scheduler_ptr) : : "memory" : "intel", "volatile");

+    #[cfg(target_arch = "x86_64")]
     let scheduler = Box::from_raw(scheduler_ptr);

+    // TODO: there seems to be an issue with i686 and trait objects
+    // We need to investigate more
+    #[cfg(target_arch = "x86")]
+    let scheduler = &::scheduling::SCHEDULER;
+
     let curr_id: ProcessId = scheduler.getid();
     scheduler.kill(curr_id);
 }
robert-w-gries commented 6 years ago

i686 support has been dropped for now. Will re-open this later if it's still an issue.

robert-w-gries commented 6 years ago

Seeing this issue now but only when host system is Ubuntu 18.04.

Builds from Ubuntu 16 work as intended.

This issue applies to x86_64 as well.

phil-opp commented 6 years ago

Maybe this is a problem with fat pointers? See https://gitter.im/phil-opp/blog_os?at=5af41d69b84be71db9f72f46

robert-w-gries commented 6 years ago

Thanks for the hint Phil! Since my scheduler trait object experiment has some bugs, I reverted to using the global scheduler object in process return.

I added in the old behavior in a new branch, feature/sched_trait_object, but I cannot reproduce this issue and test your suggested fix. I'm certain I'll run into this bug again so I'll leave the issue open for now.

robert-w-gries commented 5 years ago

Closing since it's not a pertinent issue. Will re-open if I run into it again.