oll3 / nbd-async

Block device driver in userspace!
https://crates.io/crates/nbd-async
MIT License
8 stars 7 forks source link

Graceful shutdown of block device #12

Open muhamadazmy opened 11 months ago

muhamadazmy commented 11 months ago

Dear @oll3

I really like your nbd-async crate. The only feature i really missed is the option to do a graceful shutdown of the device so i can do any clean up tasks that i need to do before the app is terminated.

I also needed to schedule some periodic clean up jobs, the only way possible for me to use locks with mutex to be able to do clean up jobs internally, which I didn't like using locks because it means that normal read/write operations will always have to acquire the lock to operate.

So what I did was to introduce a control stream that i can send control messages over (this can be a tokio receiver), and it's up to the device to handle, those messages as needed.

I tried to be backward compatible as possible, you can see there is a minimal change to the example.

The deviec can then handle the control message (notify for janitor jobs) and shutdown (so it can gracefully shutdown)

Thanks

oll3 commented 11 months ago

Hi @muhamadazmy and thank you for your kind words.

Since I haven't really used this crate for quite some time I think I will need some time to go through your PR. It sound reasonable, but not sure I understand what it does fully. Would it be possible to add an additional example of how to utilize this shutdown?

muhamadazmy commented 11 months ago

Hi @oll3 I will update the PR with an example code on how to use it.

But for now if you wanna take a quick look at how i use it you can check it here

https://github.com/threefoldtech/qbd/blob/3b60e8806652fcb5f9dc1c6ee6e6294f02234b04/src/main.rs#L133

I basically create an mpsc channel and pass the receiver side as a stream to the serve_local_nbd function.

It means in another routine i can wait on signals (say hangup) and then push a Shutdown signal over the channel which will terminate the loop, and the function will return.

On return i can then do any clean up i need to do before exit.

I also can send other types of signals (Notify) that i can handle differently by the device implementation

oll3 commented 11 months ago

Hi @oll3 I will update the PR with an example code on how to use it.

But for now if you wanna take a quick look at how i use it you can check it here

https://github.com/threefoldtech/qbd/blob/3b60e8806652fcb5f9dc1c6ee6e6294f02234b04/src/main.rs#L133 ...

Ok, I think I understand it now, but not sure how I feel about it... Could something similar be achieved by braking serve_nbd it to smaller parts where the block device is only borrowed while processing a stream event instead? So that you could select on it externally and check if whatever shutdown condition you have is fulfilled.

muhamadazmy commented 11 months ago

It's not only about the shutdown, the PR title might be misleading. I can also send other control messages to the device that can then be received and handled by the BlockDevice implementation.

I am sorry but I didn't understand what you suggest to solve this. Breaking down serve_nbd to borrow the device instead of owning it will also mean that I can't access the device (since it's exclusively borrowed) to be able to call or modify it by any mean externally without introducing a lock (Mutex) which what i was trying to avoid from the start.

Also it might mean that the user of the crate need to know the internals of how it works so he/she can pull all the pieces together, no ? (I might have misunderstood you)

Hence I thought the best way to do this is to have the signal incorporated in the server_nbd loop, when the control message is received (either shutdown or notify) the device control method will be called first, then if it was a shutdown the loop is terminated which will make the serve_nbd return.

Then whatever is needed to be done during the life time of the device, or on shutdown can be implemented by the device. Also support to send "control" jobs to the device over the same channel to do other stuff that is totally custom to the implementation.

oll3 commented 11 months ago

It's not only about the shutdown, the PR title might be misleading. I can also send other control messages to the device that can then be received and handled by the BlockDevice implementation.

Yes, think I understand. Though I feel like that it's adding complexity that may be better to keep outside of the crate, if possible.

I was thinking something like below how the current example (mem) might look if it used this imaginary api. Note that I think that the serve_nbd should be left as it is, so that in the normal case it's still easy to get up and running with the current api.

This would make the RequestStream public and add some methods to it:

async fn main() {
    let nbd_path = std::env::args().nth(1).expect("NDB device path");
    let mut dev = MemDev::new(512, 128);

    let (sock, kernel_sock) = UnixStream::pair().unwrap();
    let _server = attach_device(
        nbd_path,
        kernel_sock,
        dev.block_size,
        dev.num_blocks as u64,
        false,
    )
    .await
    .unwrap();

    let (shutdown_tx, mut shutdown_rx) = tokio::sync::mpsc::channel::<()>(1);
    let mut stream = RequestStream::new(sock);
    loop {
        let recv = shutdown_rx.recv();
        let serve = async {
            while let Some(result) = stream.next().await {
                stream
                    .process_request(&mut dev, result.unwrap())
                    .await
                    .unwrap();
            }
        };

        select! {
            _ = serve => {}
            _shutdown = recv => {
                // dev can be (mutally) accessed here since only borrowed in the 'serve' async block.
            }
        }
    }
}

Is this doable or do you think the control message approach is better?