rcore-os / virtio-drivers

VirtIO guest drivers in Rust.
MIT License
209 stars 63 forks source link

The examples/riscv has a stuck bug #125

Closed elliott10 closed 6 months ago

elliott10 commented 6 months ago

Running examples/riscv will get stuck.

Like below:

屏幕截图 2024-04-12 161449

fslongjin commented 6 months ago

Me too. my qemu version is

QEMU emulator version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.17)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
fslongjin commented 6 months ago

This issue is a bit magical; after adding #[inline(never)] to the can_pop function in VirtQueue under src/queue.rs, the test case can running normally. Or here:

while !self.can_pop() {
            info!("add_notify_wait_pop: can_pop=false");
            spin_loop();
        }

By adding an output log statement, it runs normally. I suspect it might be due to a lack of volatile read?

I modified codes in VirtQueue and other structs, make these variables volatile and the bug solved. I'll PR later.

qwandor commented 6 months ago

I don't think volatile access should be necessary for the used ring (e.g. in VirtQueue::can_pop), because it is shared memory not MMIO. My guess would be that we're missing a read barrier somewhere, which is allowing the compiler (or possibly the CPU?) to reorder memory access in a way that gives incorrect values.

fslongjin commented 6 months ago

I don't think volatile access should be necessary for the used ring (e.g. in VirtQueue::can_pop), because it is shared memory not MMIO. My guess would be that we're missing a read barrier somewhere, which is allowing the compiler (or possibly the CPU?) to reorder memory access in a way that gives incorrect values.

Maybe you are right.

Despite the test case of riscv in virtio-drivers can run properly, I have another test case which can not run properly yet.

My test case is in s-mode, sv39, rv64. In my test case the virtio-blk-mmio(legacy) hungs, and virtio-blk-mmio(modern) reports 'not ready' when it read blocks.

But the virtio net driver runs well.(both legacy and modern)

I'll try to investigate what happens.

qwandor commented 6 months ago

After talking to some other people about this, I think the issue is that we should be using atomics for some of these shared memory accesses. I'll send a PR shortly for you to test.

qwandor commented 6 months ago

@fslongjin Can you test if #128 fixes your issues reliably?

fslongjin commented 6 months ago

@fslongjin Can you test if #128 fixes your issues reliably?

sure, I'll test it tomorrow morning~