jkozlowski / starfish

Rust futures on spdk
33 stars 3 forks source link

Cargo build failure and questions about handling callback in SPDK #5

Closed xxks-kkk closed 5 years ago

xxks-kkk commented 6 years ago

Hello,

I'm trying to study your source code on how you build a rust-friendly wrapper on top of the rust bindings. I still cannot figure out how I can wrap around the callback function argument, which is heavily used in SPDK. I'm looking at your implementation of pub async fn write<'a> inside blob.rs, which is a wrapper around spdk_blob_io_write.

I don't understand the Line 286. What's cb_arg inside cb_arg::<()>(sender)? what does cb_arg::<()>(sender) mean? I'm trying to wrap around spdk_bdev_write and the callback function type in my case is pub type spdk_bdev_io_completion_cb = ::std::option::Option<unsafe extern "C" fn(bdev_io: *mut spdk_bdev_io, success: bool, cb_arg: *mut ::std::os::raw::c_void)>;. Can you share some tips on how you do the wrapper?

I try to clone your code and build it to see if cb_arg is from the generated bindings but when I run cargo build, I hit the following error

   Compiling clap v2.32.0
error[E0658]: scoped lint `clippy::cast_ptr_alignment` is experimental (see issue #44690)
   --> /home/zeyuanhu/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-preview-0.3.0-alpha.8/src/future/future_obj.rs:232:21
    |
232 |             #[allow(clippy::cast_ptr_alignment)]
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: add #![feature(tool_lints)] to the crate attributes to enable

However, I notice you use #![feature(tool_lints)] inside lib.rs of spdk-sys. I'm not sure why I still hit the error.

Thanks!

jkozlowski commented 6 years ago

There are a few questions here, so let me try to break it down:

what does cb_arg::<()>(sender) mean?

Have a look at the definition: it simply boxes (heap allocates) the Sender arg and then converts it into a raw pointer (*mut c_void) that can be passed to spdk.

Can you share some tips on how you do the wrapper?

Certainly. The way spdk functions work is that you give them a pointer to a function (needs to be a extern "C", so I have either complete_callback_0 or complete_callback_1, depending on whether I get some value back from spdk or just an error code) and then also a pointer to a "context" (which is a heap allocated piece of state that your callback needs - to spdk it's the *mut c_void bit).

Spdk takes those args and when it wants to call your callback, it simply calls the extern "C" function you provided, giving it the *mut c_void pointer you gave it previously (plus some extra gobbins, depending on what the particular function is doing).

The clever bit in here is that I made complete_callback_1 generic over the return type, which means I only define it once, but rust will monomorphise it for me for any type I need to use it with (so e.g. when I used it as an arg to spdk_bs_init and spdk_bs_create_blob, rust will actually resolve those to 2 different extern "C" functions that I can then pass to spdk).

I try to clone your code and build it to see if cb_arg is from the generated bindings but when I run cargo build, I hit the following error

Not quite sure what is happening there, but the build setup is setup on CI, so you can try to have a look what you missed here.

One thing that stands out is that you're using alpha.8 of futures, which I didn't have time to update to yet. Also the error is coming from compiling futures, not my code.

jkozlowski commented 6 years ago

Can I also ask what's your interest here? Are you thinking of building something on top this code, or your interest is purely in the mechanics of wrapping callback based C APIs?

xxks-kkk commented 6 years ago

Thanks for the detailed response. I'm working with SPDK using Rust as well and I try to wrap the bdev.h, which your repo doesn't cover. I'm amazed how you provide the rust-friendly wrapper and I try to learn more about it :-) I find wrapping around the callbacks are tricky to do.

Regarding the pointer to "context", I haven't seen you wrapper requires passing in any context struct for the callback. Maybe I'm wrong but can you point a place that you actually need to pass in some struct to provide the context for the callback? Doing Some(complete_callback_0) and then cb_arg::<()>(sender) can take up the only spot we can pass in the context struct (i.e., in SPDK, we can pass in the context through void* cb_arg but now this place is occupied by cb_arg::<()>(sender)). How can you pass in the context struct for the callback?

In some places, you use cb_arg(sender) while in some places you use cb_arg::<()>(sender)? Are they equivalent?

You use the pattern of let (sender, receiver) = oneshot::channel(); and invoke the bindings and then do await! macro. I have hard time understanding the benefit of writing the code like this. What's the benefit of this pattern over say directly calling the bindings and return the result?

Thank you in advance!

jkozlowski commented 6 years ago

sender is the context: basically this is written using rust futures: spdk is async, it will give you a callback when the operation completes. So when the callback happens, I simply complete the sender which schedules the future to run and continue its operation.

await! is part of the new async/await syntax that is making it easier to write async code. I suggest reading up a bit more on that.

So to summarise: I use the cb_arg to associate a oneshot (basically a promise if you prefer JavaScript terminology) with the call I am making. I pass in the sender as cb_arg so that when spdk calls me back in either complete_callback, it passes the sender which I can complete. Completing the oneshot schedules the underlying task (future) that holds the Receiver part of the oneshot. Once that runs I can basically collect the result and propagate it up.

This stuff is tricky and I’m not sure I am making a great job of explaining it, but keep plugging away at it.

I think you are getting confused here: ‘What's the benefit of this pattern over say directly calling the bindings and return the result?’

What you’re saying is not possible: after you call spdk_blob_read or any of the other async calls, it takes the arguments you pass to it, saves them somewhere, and returns immediately. The operation will not actually be completed until some later time: you must figure out a way to suspend what you were doing and move on to doing something else. That’s what futures allow you to do: suspend your computation and restore it once the spdk callback happens.

What is it you’re trying to build?

Sent from my iPhone

On 19 Oct 2018, at 22:39, Zeyuan Hu notifications@github.com wrote:

Thanks for the detailed response. I'm working with SPDK using Rust as well and I try to wrap the bdev.h, which your repo doesn't cover. I'm amazed how you provide the rust-friendly wrapper and I try to learn more about it :-) I find wrapping around the callbacks are tricky to do.

Regarding the pointer to "context", I haven't seen you wrapper requires passing in any context struct for the callback. Maybe I'm wrong but can you point a place that you actually need to pass in some struct to provide the context for the callback? Doing Some(complete_callback_0) and then cb_arg::<()>(sender) can take up the only spot we can pass in the context struct (i.e., in SPDK, we can pass in the context through void* cb_arg but now this place is occupied by cb_arg::<()>(sender)). How can you pass in the context struct for the callback?

In some places, you use cb_arg(sender) while in some places you use cb_arg::<()>(sender)? Are they equivalent?

You use the pattern of let (sender, receiver) = oneshot::channel(); and invoke the bindings and then do await! macro. I have hard time understanding the benefit of writing the code like this. What's the benefit of this pattern over say directly calling the bindings and return the result?

Thank you so much for the help!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

xxks-kkk commented 6 years ago

Thanks for the explanation. I still not sure I fully understand about the sender as context part. Here is my situation. I'm rewriting hello_bdev.c using Rust. As you can see, I try to pass in context as the additional argument to the callback. However, by how I'm wrapping right now, I have to pass my context struct as the surrounding capture using closure. However, from your code (e.g. the wrapper for spdk_blob_io_write), I think you probably fact the same situation as mine? I'm not sure how you're able to do this (passing context struct) without using closure?

If possible, do you have an example showing how you use your wrapper?

As you can see, I'm trying to build a file system on top of SPDK using Rust. I'll definitely give credits to your kindness help once my wrapper comes into a good shape :-)

jkozlowski commented 6 years ago

Cool, your interface is different: you are preserving the callback based API-style that SPDK also uses, and that’s fine. So in your case it makes sense to take a closure that you will call once spdk call completes, box that and provide that as the context, so you can later unwrap it and call it with the result.

I wanted to raise the abstraction level and use futures: it’s a different approach, so they will not align. In my case I pass a sender as my context as that’s what allows me to complete the call later on.

Sure examples are in starfish-example-app, but also at the bottom of the blob.rs file I have the tests.

Sent from my iPhone

On 21 Oct 2018, at 06:42, Zeyuan Hu notifications@github.com wrote:

Thanks for the explanation. I still not sure I fully understand about the sender as context part. Here is my situation. I'm rewriting hello_bdev.c using Rust. As you can see, I try to pass in context as the additional argument to the callback. However, by how I'm wrapping right now, I have to pass my context struct as the surrounding capture using closure. However, from your code (e.g. the wrapper for spdk_blob_io_write), I think you probably fact the same situation as mine? I'm not sure how you're able to do this (passing context struct) without using closure?

If possible, do you have an example showing how you use your wrapper?

As you can see, I'm trying to build a file system on top of SPDK using Rust. I'll definitely give credits to your kindness help once my wrapper comes into a good shape :-)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

jkozlowski commented 6 years ago

Had a quick look at your code: if you want it to work, you need to change the FnMut you provide to your spdk_bdev_write to instead have almost the same signature as spdk_bdev_io_completion_cb, except for the cb_arg bit, then in spdk_bdev_io_completion_cb you unwrap the cb_arg, cast it to your closure type and call the closure with the bdev_io and success values that spdk passes to you.

Then in your main.rs you will get those results and act on them and do further work in that callback. As you can imagine you will end up with a cascade of calls, which is what futures are trying to help with.

I am not sure I can help you more tbh, as your approach is different.

Sent from my iPhone

On 21 Oct 2018, at 08:11, Jakub Kozlowski mail@jakub-kozlowski.com wrote:

Cool, your interface is different: you are preserving the callback based API-style that SPDK also uses, and that’s fine. So in your case it makes sense to take a closure that you will call once spdk call completes, box that and provide that as the context, so you can later unwrap it and call it with the result.

I wanted to raise the abstraction level and use futures: it’s a different approach, so they will not align. In my case I pass a sender as my context as that’s what allows me to complete the call later on.

Sure examples are in starfish-example-app, but also at the bottom of the blob.rs file I have the tests.

Sent from my iPhone

On 21 Oct 2018, at 06:42, Zeyuan Hu notifications@github.com wrote:

Thanks for the explanation. I still not sure I fully understand about the sender as context part. Here is my situation. I'm rewriting hello_bdev.c using Rust. As you can see, I try to pass in context as the additional argument to the callback. However, by how I'm wrapping right now, I have to pass my context struct as the surrounding capture using closure. However, from your code (e.g. the wrapper for spdk_blob_io_write), I think you probably fact the same situation as mine? I'm not sure how you're able to do this (passing context struct) without using closure?

If possible, do you have an example showing how you use your wrapper?

As you can see, I'm trying to build a file system on top of SPDK using Rust. I'll definitely give credits to your kindness help once my wrapper comes into a good shape :-)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

xxks-kkk commented 6 years ago

Thanks for the comment. I haven't examples in main.rs in starfish-example-app and I haven't seen any tests under blob.rs.

You're right there will be a cascade of calls and I think my code will have the bug if there are multiple requests to the SPDK, how SPDK decides to finish the requests really depends on its internal scheduler as you pointed out earlier. I'm not a huge fan of SPDK chain of callback style and I think they're doing so due to the limitation of C itself. I think Rust can do a better job when comes to the provide the API and your approach is the solution to this. My current approach is inspired from your code here. However, I'm kind of wondering how my use will be different in main.rs if I use your approach instead (i.e., how write_complete and spdk_bdev_write will be glued together if I switch spdk_bdev_write to your approach).

Thank you for your time.

jkozlowski commented 5 years ago

I'm going to assume that you managed to get yourself unblocked. Feel free to reopen if not.

xxks-kkk commented 5 years ago

@jkozlowski I have made some small progress and now I hit the error when I try to call await! within some async function. I think the error I got is specific to the SPDK and not Rust. Here is the code. The error I got is:

118 |         tokio::run_async(run());
    |         ^^^^^^^^^^^^^^^^ `*mut libspdk_sys::spdk_bdev` cannot be sent between threads safely
    |
    = help: within `impl std::future::Future`, the trait `std::marker::Send` is not implemented for `*mut libspdk_sys::spdk_bdev`
    = note: required because it appears within the type `spdk_rs::SpdkBdev`
    = note: required because it appears within the type `std::option::Option<spdk_rs::SpdkBdev>`
    = note: required because it appears within the type `{std::option::Option<spdk_rs::SpdkBdev>, impl std::future::Future, ()}`
    = note: required because it appears within the type `[static generator@src/main.rs:63:43: 103:2 {std::option::Option<spdk_rs::SpdkBdev>, impl std::future::Future, ()}]`
    = note: required because it appears within the type `std::future::GenFuture<[static generator@src/main.rs:63:43: 103:2 {std::option::Option<spdk_rs::SpdkBdev>, impl std::future::Future, ()}]>`
    = note: required because it appears within the type `impl std::future::Future`
    = note: required because it appears within the type `impl std::future::Future`
    = note: required because it appears within the type `{impl std::future::Future, ()}`
    = note: required because it appears within the type `[static generator@src/main.rs:53:16: 61:2 {impl std::future::Future, ()}]`
    = note: required because it appears within the type `std::future::GenFuture<[static generator@src/main.rs:53:16: 61:2 {impl std::future::Future, ()}]>`
    = note: required because it appears within the type `impl std::future::Future`
    = note: required because it appears within the type `impl std::future::Future`
    = note: required by `tokio::run_async`

I'm new to Rust and I'm wondering if I need some special treatment when wrapping around spdk_bdev struct here? I'm wondering if you run into this issue before when you try to wrap around blob of spdk? What might be the cause for the error? I notice await! call works fine if I call it right at the beginning of run_inner() before any call relating to SPDK.

Also, I notice you implement your own executor. I'm wondering why you want to write your own instead of using something from Tokio?

Thanks much!

jkozlowski commented 5 years ago

I suggest reading the Rust book chapters in Send types, it probably will do a better job of explaining what’s going on.

But logically, ‘*’ is a pointer type, and I’m sure you appreciate that you should not send raw pointers across threads. So your issue occurs because Tokio requires its futures to be ‘Send’, because Tokio reactors are threadsafe by default and they want to be able to poll futures from any thread they wish. You hit the nail in the head: that is exactly why I implement my own reactor, to remove that restriction.

I believe newer version of Tokio should have a LocalExecutor type, just like I do. They only difference is that last time I checked they were still implemented with the use of Arc and other thread safe primitives which I conjecture to be unnecessary and hypothesise they’re bad for performance. An unstated goal of my work on this is to evaluate those hypotheses.

Hope that helps

Sent from my iPhone

On 14 Dec 2018, at 21:57, Zeyuan Hu notifications@github.com wrote:

@jkozlowski I have made some small progress and now I hit the error when I try to call await! within some async function. I think the error I got is specific to the SPDK and not Rust. Here is the code. The error I got is:

118 tokio::run_async(run()); ^^^^^^^^^^^^^^^^ *mut libspdk_sys::spdk_bdev cannot be sent between threads safely
= help: within `impl std::future::Future`, the trait `std::marker::Send` is not implemented for `*mut libspdk_sys::spdk_bdev`
= note: required because it appears within the type `spdk_rs::SpdkBdev`
= note: required because it appears within the type `std::option::Option<spdk_rs::SpdkBdev>`
= note: required because it appears within the type `{std::option::Option<spdk_rs::SpdkBdev>, impl std::future::Future, ()}`
= note: required because it appears within the type `[static generator@src/main.rs:63:43: 103:2 {std::option::Option<spdk_rs::SpdkBdev>, impl std::future::Future, ()}]`
= note: required because it appears within the type `std::future::GenFuture<[static generator@src/main.rs:63:43: 103:2 {std::option::Option<spdk_rs::SpdkBdev>, impl std::future::Future, ()}]>`
= note: required because it appears within the type `impl std::future::Future`
= note: required because it appears within the type `impl std::future::Future`
= note: required because it appears within the type `{impl std::future::Future, ()}`
= note: required because it appears within the type `[static generator@src/main.rs:53:16: 61:2 {impl std::future::Future, ()}]`
= note: required because it appears within the type `std::future::GenFuture<[static generator@src/main.rs:53:16: 61:2 {impl std::future::Future, ()}]>`
= note: required because it appears within the type `impl std::future::Future`
= note: required because it appears within the type `impl std::future::Future`
= note: required by `tokio::run_async`

I'm new to Rust and I'm wondering if I need some special treatment when wrapping around spdk_bdev struct here? What might be the cause for the error? I notice await! call works fine if I call it right at the beginning of run_inner() before any call relating to SPDK.

Also, I notice you implement your own executor. I'm wondering why you want to write your own instead of using something from Tokio?

Thanks much!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

xxks-kkk commented 5 years ago

@jkozlowski Thanks for the insights. I play around with your executor implementation and use it to implement hello_bdev. I run into the following issue:

  1. drop(poller) leads to seg fault code
ff697a41e in spdk_app_start (opts=0x7fffffffe150, start_fn=0x555555577670 <spdk_rs::event::SpdkAppOpts::start::start_wrapper::h27f22a723f05307c>, arg1=0x7fffffffdea8, arg2=0x0) at app.c:576
#3  0x000055555557772e in spdk_rs::event::SpdkAppOpts::start::h42df88f6a6f8e026 (self=..., f=...) at /home/zeyuanhu/share/rustfs/spdk-rs/src/event.rs:66
#4  0x000055555557aa6e in hello_nvme_bdev_rust_wrapper::main::he8692bb46e87c6f0 () at src/main.rs:107
#5  0x0000555555579f70 in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h6afc4ff1576ed53f () at /rustc/21f26849506c141a6760532ca5bdfd8345247fdb/src/libstd/rt.rs:74
#6  0x00005555555c68f3 in std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h89c6e21182e0edb2 () at src/libstd/rt.rs:59
#7  std::panicking::try::do_call::h37211ff12254dd07 () at src/libstd/panicking.rs:310
#8  0x00005555555d465a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:102
#9  0x00005555555c72c4 in std::panicking::try::hff26ecf2b679d32f () at src/libstd/panicking.rs:289
#10 std::panic::catch_unwind::h51f6113138b46c27 () at src/libstd/panic.rs:398
#11 std::rt::lang_start_internal::haf6df4f58cf174a0 () at src/libstd/rt.rs:58
#12 0x0000555555579f49 in std::rt::lang_start::hb3f6cdb8ed04a4e0 (main=0x55555557a750 <hello_nvme_bdev_rust_wrapper::main::he8692bb46e87c6f0>, argc=1, argv=0x7fffffffe4e8) at /rustc/21f26849506c141a6760532ca5bdfd8345247fdb/src/libstd/rt.rs:74
#13 0x000055555557ab7a in main ()

I check the drop implementation and the call to spdk_poller_unregister makes sense to me but you leave some comment say crash might happen. I'm wondering if you can explain more about your concern there?

  1. I cannot terminate SPDK framework by calling spdk_app_stop (code). If I follow your hello_blob example by calling spdk_app_stop once, reactor still keeps running. I figure that I have to drop(poller), which links to the previous issue. Now, I call spdk_app_stop twice to terminate SPDK framework. I'm wondering if my conjecture is correct (e.g., have to drop poller first before spdk_app_stop to shutdown SPDK)?

Thanks much!

jkozlowski commented 5 years ago

Yeah, I haven’t looked at it in a while, but I think I had some code somewhere that would close everything correctly. The tests exit fine I think, so they must be correct.

Spdk does reference counting of anything it returns (spdk_bdev etc.), so you need to make sure to drop and unregister all resources, otherwise it will not exit.

Jakub

Sent from my iPhone

On 15 Dec 2018, at 08:39, Zeyuan Hu notifications@github.com wrote:

@jkozlowski Thanks for the insights. I play around with your executor implementation and use it to implement hello_bdev. I run into the following issue:

drop(poller) leads to seg fault code ff697a41e in spdk_app_start (opts=0x7fffffffe150, start_fn=0x555555577670 , arg1=0x7fffffffdea8, arg2=0x0) at app.c:576

3 0x000055555557772e in spdk_rs::event::SpdkAppOpts::start::h42df88f6a6f8e026 (self=..., f=...) at /home/zeyuanhu/share/rustfs/spdk-rs/src/event.rs:66

4 0x000055555557aa6e in hello_nvme_bdev_rust_wrapper::main::he8692bb46e87c6f0 () at src/main.rs:107

5 0x0000555555579f70 in std::rt::langstart::$u7b$$u7b$closure$u7d$$u7d$::h6afc4ff1576ed53f () at /rustc/21f26849506c141a6760532ca5bdfd8345247fdb/src/libstd/rt.rs:74

6 0x00005555555c68f3 in std::rt::lang_startinternal::$u7b$$u7b$closure$u7d$$u7d$::h89c6e21182e0edb2 () at src/libstd/rt.rs:59

7 std::panicking::try::do_call::h37211ff12254dd07 () at src/libstd/panicking.rs:310

8 0x00005555555d465a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:102

9 0x00005555555c72c4 in std::panicking::try::hff26ecf2b679d32f () at src/libstd/panicking.rs:289

10 std::panic::catch_unwind::h51f6113138b46c27 () at src/libstd/panic.rs:398

11 std::rt::lang_start_internal::haf6df4f58cf174a0 () at src/libstd/rt.rs:58

12 0x0000555555579f49 in std::rt::lang_start::hb3f6cdb8ed04a4e0 (main=0x55555557a750 , argc=1, argv=0x7fffffffe4e8) at /rustc/21f26849506c141a6760532ca5bdfd8345247fdb/src/libstd/rt.rs:74

13 0x000055555557ab7a in main ()

I check the drop implementation and the call to spdk_poller_unregister makes sense to me but you leave some comment say crash might happen. I'm wondering if you can explain more your concern there?

I cannot terminate SPDK framework by calling spdk_app_stop (code). If I follow your hello_blob example, reactor still keeps running. I figure that I have to drop(poller), which links to the previous issue. Now, I call spdk_app_stop twice to terminate SPDK framework. I'm wondering if my conjecture is correct (e.g., have to drop poller first before spdk_app_stop to shutdown SPDK)? Thanks much!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.