jkozlowski / starfish

Rust futures on spdk
33 stars 3 forks source link

Upgrade rust and SPDK to newer versions #8

Closed jkryl closed 5 years ago

jkryl commented 5 years ago

The primary motivation for doing that was to work around problems encountered with previous versions. However the fix is dirty. I haven't found a good solution for problem of missing rte_mempool_ring shared library so tests need to be run like this:

LD_PRELOAD=/usr/local/lib/librte_mempool_ring.so.1.1 cargo test --all

For more detailed background information see discussion in issue #6 .

NOTE: I don't have idea if circleci tests will pass. It's quite possible that I will have to fix a few more things before we can merge these changes.

jkozlowski commented 5 years ago

@jkryl I forgot to enable the CI checks on branches, I think github changed something since I last used PRs, since my previous PRs had to pass CI.

Anyway, let's see if it builds.

jkozlowski commented 5 years ago

Hmm, still not there...

jkozlowski commented 5 years ago

Ok, should be working now.

jkozlowski commented 5 years ago

Hmm, I am a bit of a dummy at these linking issues, so not sure. Would it be worth to just ping an issue on spdk to get their help? I don't understand what they've changed that just linking to libspdk doesn't work.

jkozlowski commented 5 years ago

I went ahead and created an issue, doesn't hurt to ask: https://github.com/spdk/spdk/issues/518. Feel free to add some more details there, maybe they can improve the way the library is built to make linking easier.

I know I am spoiled coming from Java, but it just feels a bit weird when we need to know so much internal detail about this library in order to get it linked to another piece of code.

jkryl commented 5 years ago

thanks for creating the ticket for spdk! a guidance from spdk folks would be definitely appreciated. I fixed a few things but the circleci build is still failing and I can't reproduce the failure locally so it is a bit difficult for me to find out what could be wrong. I will try to fix the remaining issue(s) but cannot promise that it will happen any time soon :-)

jkozlowski commented 5 years ago

No worries, your efforts are certainly appreciated. To bait you a bit: you should actually be able to get a CircleCI build with SSH enabled, meaning you can debut it in that machine :D its a pretty brilliant way to debug such failures, when all else fails.

But yeah, let’s hope they have some insights!

Sent from my iPhone

On 26 Nov 2018, at 21:19, Jan Kryl notifications@github.com wrote:

thanks for creating the ticket for spdk! a guidance from spdk folks would be definitely appreciated. I fixed a few things but the circleci build is still failing and I can't reproduce the failure locally so it is a bit difficult for me to find out what could be wrong. I will try to fix the remaining issue(s) but cannot promise that it will happen any time soon :-)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

jkryl commented 5 years ago

@jkozlowski I found the way how to build the program with static spdk/dpdk libraries and without LD_PRELOAD hack. It is not ideal fix but I think it's acceptable until rustc is made more flexible in regard to --whole-archive linker option. The solution is to reference symbol from librte_mempool_ring library which forces the linker to include it in the binary. I don't have time to prepare the PR now, but just a heads-up that eventually I will update this PR with a new fix..

jkozlowski commented 5 years ago

Ha, I thought of the same thing, but then didn't have time to pursue. Can you at least explain what you did? Did you generate the bindings with bindgen and then call some innocent looking method?

I might have some time this weekend to hack a bit, so if you won't get to it by now, at least I'd have some skeleton of an approach.

jkryl commented 5 years ago

Sure, so the solution consists of following pieces of code.

All dpdk/spdk static libraries need to be named explicitly In build.rs. Following function gets a list of them:

fn static_lib_list() -> Vec<&'static str> {
    let mut dpdk_static_libs = vec![
        "rte_bus_pci",
        "rte_bus_vdev",
        "rte_eal",
        "rte_ethdev",
        "rte_kvargs",
        "rte_mbuf",
        "rte_mempool",
        "rte_mempool_bucket",
        "rte_mempool_ring",
        "rte_net",
        "rte_pci",
        "rte_ring",
    ];

    let mut spdk_static_libs = vec![
        "spdk_app_rpc",
        "spdk_event",
        "spdk_event_bdev",
        "spdk_event_copy",
        "spdk_event_iscsi",
        "spdk_event_nbd",
        "spdk_event_net",
        "spdk_event_nvmf",
        "spdk_bdev",
        "spdk_bdev_aio",
        "spdk_bdev_malloc",
        "spdk_bdev_null",
        "spdk_bdev_nvme",
        "spdk_bdev_rpc",
        "spdk_bdev_virtio",
        "spdk_blob",
        "spdk_blob_bdev",
        "spdk_blobfs",
        "spdk_conf",
        "spdk_copy",
        "spdk_copy_ioat",
        "spdk_env_dpdk",
        "spdk_event_scsi",
        "spdk_event_vhost",
        "spdk_ioat",
        "spdk_iscsi",
        "spdk_json",
        "spdk_jsonrpc",
        "spdk_log",
        "spdk_log_rpc",
        "spdk_lvol",
        "spdk_nbd",
        "spdk_net",
        "spdk_nvme",
        "spdk_nvmf",
        "spdk_rpc",
        "spdk_rte_vhost",
        "spdk_scsi",
        "spdk_sock",
        "spdk_sock_posix",
        "spdk_spdk_mock",
        "spdk_thread",
        "spdk_trace",
        "spdk_util",
        "spdk_vbdev_raid",
        "spdk_vbdev_error",
        "spdk_vbdev_gpt",
        "spdk_vbdev_lvol",
        "spdk_vbdev_split",
        "spdk_vbdev_passthru",
        "spdk_vhost",
        "spdk_virtio",
        "spdk_env_dpdk",
    ];

    dpdk_static_libs.append(&mut spdk_static_libs);
    dpdk_static_libs
}

... somewhere in main():
    for lib in static_lib_list() {
        println!("cargo:rustc-link-lib=static={}", lib);
    }

the code which references a symbol from the library which needs to be loaded is:

// XXX This is a hack to require librte_mempool_ring library which is otherwise
// excluded by rustc because no symbols are used from it. However it contains
// important constructor without which all mempool allocations fail.
extern "C" {
    fn mp_hdlr_init_ops_mp_mc();
}
pub fn force_librte_mempool_ring_load() {
    unsafe {
        // this call does not make any harm because if it is run second time
        // it returns EEXIST
        mp_hdlr_init_ops_mp_mc();
    }
}

The best place to put this is probably in lib.rs in spdk-sys where the generated bindings are included so that it does not have to be repeated for each binary that is linked with spdk-sys. And that should be it, looks simple now :-)

btw when using static libraries, the spdk/dpdk does not need to be installed to /usr/local anymore. It could be included in the repo in form of a git submodule and static libraries and headers can be used from that location. It's just an option to consider, not necessarily better than what we do now.

I dared to open a rust issue so that we don't need to use this workaround in long term: https://github.com/rust-lang/rust/issues/56306 .

jkozlowski commented 5 years ago

Lovely, thanks for that list of steps, let’s see how gets there first. I’ll echo here if I start working on it.

The static list of libs is annoying, maybe I’ll find a way to read that list somehow.

Good stuff, really happy with the collaboration here and appreciate your efforts!

Sent from my iPhone

On 28 Nov 2018, at 08:17, Jan Kryl notifications@github.com wrote:

Sure, so the solution consists of following pieces of code.

All dpdk/spdk static libraries need to be named explicitly In build.rs. Following function gets a list of them:

fn static_lib_list() -> Vec<&'static str> { let mut dpdk_static_libs = vec![ "rte_bus_pci", "rte_bus_vdev", "rte_eal", "rte_ethdev", "rte_kvargs", "rte_mbuf", "rte_mempool", "rte_mempool_bucket", "rte_mempool_ring", "rte_net", "rte_pci", "rte_ring", ];

let mut spdk_static_libs = vec![
    "spdk_app_rpc",
    "spdk_event",
    "spdk_event_bdev",
    "spdk_event_copy",
    "spdk_event_iscsi",
    "spdk_event_nbd",
    "spdk_event_net",
    "spdk_event_nvmf",
    "spdk_bdev",
    "spdk_bdev_aio",
    "spdk_bdev_malloc",
    "spdk_bdev_null",
    "spdk_bdev_nvme",
    "spdk_bdev_rpc",
    "spdk_bdev_virtio",
    "spdk_blob",
    "spdk_blob_bdev",
    "spdk_blobfs",
    "spdk_conf",
    "spdk_copy",
    "spdk_copy_ioat",
    "spdk_env_dpdk",
    "spdk_event_scsi",
    "spdk_event_vhost",
    "spdk_ioat",
    "spdk_iscsi",
    "spdk_json",
    "spdk_jsonrpc",
    "spdk_log",
    "spdk_log_rpc",
    "spdk_lvol",
    "spdk_nbd",
    "spdk_net",
    "spdk_nvme",
    "spdk_nvmf",
    "spdk_rpc",
    "spdk_rte_vhost",
    "spdk_scsi",
    "spdk_sock",
    "spdk_sock_posix",
    "spdk_spdk_mock",
    "spdk_thread",
    "spdk_trace",
    "spdk_util",
    "spdk_vbdev_raid",
    "spdk_vbdev_error",
    "spdk_vbdev_gpt",
    "spdk_vbdev_lvol",
    "spdk_vbdev_split",
    "spdk_vbdev_passthru",
    "spdk_vhost",
    "spdk_virtio",
    "spdk_env_dpdk",
];

dpdk_static_libs.append(&mut spdk_static_libs);
dpdk_static_libs

}

... somewhere in main(): for lib in static_lib_list() { println!("cargo:rustc-link-lib=static={}", lib); } the code which references a symbol from the library which needs to be loaded is:

// XXX This is a hack to require librte_mempool_ring library which is otherwise // excluded by rustc because no symbols are used from it. However it contains // important constructor without which all mempool allocations fail. extern "C" { fn mp_hdlr_init_ops_mp_mc(); } pub fn force_librte_mempool_ring_load() { unsafe { // this call does not make any harm because if it is run second time // it returns EEXIST mp_hdlr_init_ops_mp_mc(); } } The best place to put this is probably in lib.rs in spdk-sys where the generated bindings are included so that it does not have to be repeated for each binary that is linked with spdk-sys. And that should be it, looks simple now :-)

btw when using static libraries, the spdk/dpdk does not need to be installed to /usr/local anymore. It could be included in the repo in form of a git submodule and static libraries and headers can be used from that location. It's just an option to consider, not necessarily better than what we do now.

I dared to open a rust issue so that we don't need to use this workaround in long term: rust-lang/rust#56306 .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

jkryl commented 5 years ago

I just discovered that rough idea of the fix I posted above is not complete. All SPDK subsystem libs suffer from the same problem. Namely these libs:

libspdk_event_bdev.a  libspdk_event_iscsi.a  libspdk_event_net.a   libspdk_event_scsi.a
libspdk_event_copy.a  libspdk_event_nbd.a    libspdk_event_nvmf.a  libspdk_event_vhost.a

Purpose of these libs is to load and initialize a subsystem in spdk. The subsystem implementation itself is delivered on behalf of a different lib. There are very few symbols that these libraries define and none of them is global (public). So the solution we used for pulling in mempool_ring lib won't work for these. After some time thinking about the problem I came to conclusion that the only solution is to patch SPDK source code. But before I dive into details a quick summary of possible ways follows:

1) Don't upgrade and continue to use spdk v18.06. However that means getting stuck at that version for a very long time. Either SPDK would have to change (unlikely as this is not spdk bug) or rustc linker would have to be fixed (could take a long time). 2) Use shared libs. In general less preferable than static libs because of the nature of shared libraries and it requires LD_PRELOAD hack to load all .so libs which can't be loaded automatically (in our case 8 libs so far). 3) Use static libs, but spdk needs to be patched in that case. More information on this method follows.

When patching spdk we could make use of the macro for registering spdk subsystem SPDK_SUBSYSTEM_REGISTER(_name). Simple sed script (or diff patch) can change the first line of the macro from:

#define SPDK_SUBSYSTEM_REGISTER(_name) \

to:

#define SPDK_SUBSYSTEM_REGISTER(_name) char touch_of_the_rust_ ## _name; \

and then we leverage these newly introduced global variables in bindings lib to create dependency on all spdk subsystems. This file could be autogenerated in build.rs and included in lib.rs from spdk-sys:

extern "C" {
  #[no_mangle]
  static touch_of_the_rust_g_spdk_subsystem_net_framework: i8;
  #[no_mangle]
  static touch_of_the_rust_g_spdk_subsystem_copy: i8;
  #[no_mangle]
  static touch_of_the_rust_g_spdk_subsystem_scsi: i8;
  #[no_mangle]
  static touch_of_the_rust_g_spdk_subsystem_vhost: i8;
  #[no_mangle]
  static touch_of_the_rust_g_spdk_subsystem_nbd: i8;
  #[no_mangle]
  static touch_of_the_rust_g_spdk_subsystem_bdev: i8;
  #[no_mangle]
  static touch_of_the_rust_g_spdk_subsystem_iscsi: i8;
  #[no_mangle]
  static touch_of_the_rust_g_spdk_subsystem_nvmf: i8;
}
pub unsafe fn touch_subsystems() -> i8 {
  let mut sum: i8 = 0;
  sum += touch_of_the_rust_g_spdk_subsystem_net_framework;
  sum += touch_of_the_rust_g_spdk_subsystem_copy;
  sum += touch_of_the_rust_g_spdk_subsystem_scsi;
  sum += touch_of_the_rust_g_spdk_subsystem_vhost;
  sum += touch_of_the_rust_g_spdk_subsystem_nbd;
  sum += touch_of_the_rust_g_spdk_subsystem_bdev;
  sum += touch_of_the_rust_g_spdk_subsystem_iscsi;
  sum += touch_of_the_rust_g_spdk_subsystem_nvmf;
  sum
}

I verified on another project using spdk that this solution works but A) it is turning into quite an ungly hack. B) patching spdk is not aligned with starfish way of doing things. starfish relies on spdk libs being installed in /usr/local. While we could patch spdk in CI script for building SPDK, it could be a nasty surprise for anyone trying to build starfish with his own SPDK.

jkozlowski commented 5 years ago

Hey,

Thanks for the summary. I get this was coming :( what would happen if we just changed some linker defaults?

Anyway, I should have some time tomorrow morning to hack on this a bit. All this does indeed feel to me like a massive pain, maybe we can convince spdk to continue to publish the same sort of library they did in the version we’re upgrading from?

How would this all change with the effort they’re engaging in to package the lib with system package managers? I know Rust has some support for things like pkgconfig, but how good is it?

Is really the only issue that the rust compiler can’t honour flags like -Whole-archive (I am on my phone so don’t remember the names) and —no-unused?

Sent from my iPhone

On 30 Nov 2018, at 08:26, Jan Kryl notifications@github.com wrote:

I just discovered that rough idea of the fix I posted above is not complete. All SPDK subsystem libs suffer from the same problem. Namely these libs:

libspdk_event_bdev.a libspdk_event_iscsi.a libspdk_event_net.a libspdk_event_scsi.a libspdk_event_copy.a libspdk_event_nbd.a libspdk_event_nvmf.a libspdk_event_vhost.a Purpose of these libs is to load and initialize a subsystem in spdk. The subsystem implementation itself is delivered on behalf of a different lib. There are very few symbols that these libraries define and none of them is global (public). So the solution we used for pulling in mempool_ring lib won't work for these. After some time thinking about the problem I came to conclusion that the only solution is to patch SPDK source code. But before I dive into details a quick summary of possible ways follows:

Don't upgrade and continue to use spdk v18.06. However that means getting stuck at that version for a very long time. Either SPDK would have to change (unlikely as this is not spdk bug) or rustc linker would have to be fixed (could take a long time). Use shared libs. In general less preferable than static libs because of the nature of shared libraries and it requires LD_PRELOAD hack to load all .so libs which can't be loaded automatically (in our case 8 libs so far). Use static libs, but spdk needs to be patched in that case. More information on this method follows. When patching spdk we could make use of the macro for registering spdk subsystem SPDK_SUBSYSTEM_REGISTER(_name). Simple sed script (or diff patch) can change the first line of the macro from:

define SPDK_SUBSYSTEM_REGISTER(_name) \

to:

define SPDK_SUBSYSTEM_REGISTER(_name) char touch_of_therust ## _name; \

and then we leverage these newly introduced global variables in bindings lib to create dependency on all spdk subsystems. This file could be autogenerated in build.rs and included in lib.rs from spdk-sys:

extern "C" {

[no_mangle]

static touch_of_the_rust_g_spdk_subsystem_net_framework: i8;

[no_mangle]

static touch_of_the_rust_g_spdk_subsystem_copy: i8;

[no_mangle]

static touch_of_the_rust_g_spdk_subsystem_scsi: i8;

[no_mangle]

static touch_of_the_rust_g_spdk_subsystem_vhost: i8;

[no_mangle]

static touch_of_the_rust_g_spdk_subsystem_nbd: i8;

[no_mangle]

static touch_of_the_rust_g_spdk_subsystem_bdev: i8;

[no_mangle]

static touch_of_the_rust_g_spdk_subsystem_iscsi: i8;

[no_mangle]

static touch_of_the_rust_g_spdk_subsystem_nvmf: i8; } pub unsafe fn touch_subsystems() -> i8 { let mut sum: i8 = 0; sum += touch_of_the_rust_g_spdk_subsystem_net_framework; sum += touch_of_the_rust_g_spdk_subsystem_copy; sum += touch_of_the_rust_g_spdk_subsystem_scsi; sum += touch_of_the_rust_g_spdk_subsystem_vhost; sum += touch_of_the_rust_g_spdk_subsystem_nbd; sum += touch_of_the_rust_g_spdk_subsystem_bdev; sum += touch_of_the_rust_g_spdk_subsystem_iscsi; sum += touch_of_the_rust_g_spdk_subsystem_nvmf; sum } I verified on another project using spdk that this solution works but A) it is turning into quite an ungly hack. B) patching spdk is not aligned with starfish way of doing things. starfish relies on spdk libs being installed in /usr/local. While we could patch spdk in CI script for building SPDK, it could be a nasty surprise for anyone trying to build starfish with his own SPDK.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

jkryl commented 5 years ago

Hi @jkozlowski , just a couple of thoughts:

1) I don't think we can convince them to go back to a single libspdk.so library. This is against their philosophy of modularizing dpdk dependency. We could probably convince them to provide additional monolithic libspdk.so (with a different name) in addition to current set of dynamic libs. However that means we are stuck with shared libraries (instead of static) which is not optimal. 2) We could also try to convince them to export some of the symbols globally and continue using the hack with pulling static libraries in using those symbols. For spdk that would be a minor change. We would still need the hack, but no patching of spdk source code. 3) As for what's possible to accomplish with rustc linker I'm completely clueless as I don't understand it much. Though if we can "fix" the linker somehow, that would be obviously the best solution. I tried to change linker defaults using .cargo/config and it did not work for me. As I understand it the whole-archive linker option is really the only problem. 4) I'm not aware of the effort to provide pkg-config tool for rust, but IMHO it is not going to help us, unless it could somehow influence whole-archive flag which I doubt about.

jkozlowski commented 5 years ago

Hey,

Sorry for not replying earlier, this week is a bit crazy. I will try to get some time over the weekend to tinker.

I had it working with .cargo/config before, it’s just a massive pain: you need to put it with every project, the flags don’t get automatically transferred it you add a dependency on spdk-sys :(

And yes I used whole-archive for this. I think we should try to convince spdk to have an option for a simple linking option: it’s good to provide for easy iteration and development, and it kind of lowers barrier to entry.

Sent from my iPhone

On 30 Nov 2018, at 21:59, Jan Kryl notifications@github.com wrote:

Hi @jkozlowski , just a couple of thoughts:

I don't think we can convince them to go back to a single libspdk.so library. This is against their philosophy of modularizing dpdk dependency. We could probably convince them to provide additional monolithic libspdk.so (with a different name) in addition to current set of dynamic libs. However that means we are stuck with shared libraries (instead of static) which is not optimal. We could also try to convince them to export some of the symbols globally and continue using the hack with pulling static libraries in using those symbols. For spdk that would be a minor change. We would still need the hack, but no patching of spdk source code. As for what's possible to accomplish with rustc linker I'm completely clueless as I don't understand it much. Though if we can "fix" the linker somehow, that would be obviously the best solution. I tried to change linker defaults using .cargo/config and it did not work for me. As I understand it the whole-archive linker option is really the only problem. I'm not aware of the effort to provide pkg-config tool for rust, but IMHO it is not going to help us, unless it could somehow influence whole-archive flag which I doubt about. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

jkryl commented 5 years ago

Closing this PR as it is outdated and the new spdk-sys provides a solution for this problem.