open-iscsi / tcmu-runner

A daemon that handles the userspace side of the LIO TCM-User backstore.
Apache License 2.0

move dbus into libtcmu #516

Open baihuahua opened 5 years ago

baihuahua commented 5 years ago

When tcmu-runner runs alongside another tcmu daemon, such as qemu-tcmu, on a host whose kernel has the genl version set to 2, creating a target through qemu-tcmu fails with the following messages in tcmu-runner's log file:

2018-12-12 13:48:27.039 1542 [DEBUG] handle_netlink:195: cmd 1. Got header version 2. Supported 2.
2018-12-12 13:48:27.039 1542 [ERROR] add_device:477: could not find handler for uio0

This is because tcmu-runner and qemu-tcmu both receive the device add event from netlink. tcmu-runner cannot find a handler for the device, so it fails the creation by returning an error code to the kernel.

To solve this problem, tcmu-runner needs a way to give other tcmu daemons a chance to handle the device creation before failing it. Alternatively, we could move dbus down from tcmu-runner into libtcmu so that every daemon can acquire dbus on its own, without depending on tcmu-runner.
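The failure mode described above can be sketched as a toy model. This is not tcmu-runner or kernel code: the `Daemon` class, the `multicast_add_device` function, and the subtype names are all hypothetical, and the model assumes that an error reply from any listener fails the creation, which matches the reported symptom but simplifies the kernel's real reply handling.

```python
import errno
from dataclasses import dataclass


@dataclass
class Daemon:
    """Toy stand-in for a tcmu daemon listening on the netlink multicast group."""
    name: str
    handlers: frozenset  # subtypes this daemon can serve (illustrative names)

    def on_add_device(self, subtype: str) -> int:
        # 0 means "handled"; a negative errno mirrors the
        # "could not find handler" error in the log above.
        return 0 if subtype in self.handlers else -errno.ENOENT


def multicast_add_device(daemons, subtype):
    """Deliver one add-device event to every listener.

    Assumption: an error reply from any listener fails the creation.
    """
    replies = {d.name: d.on_add_device(subtype) for d in daemons}
    created = all(status == 0 for status in replies.values())
    return created, replies


tcmu_runner = Daemon("tcmu-runner", frozenset({"glfs", "rbd"}))
qemu_tcmu = Daemon("qemu-tcmu", frozenset({"qemu"}))

# The device belongs to qemu-tcmu, but tcmu-runner also sees the event
# and replies with an error because it has no matching handler.
created, replies = multicast_add_device([tcmu_runner, qemu_tcmu], "qemu")
print(created, replies)
```

In this model, the creation fails even though qemu-tcmu could serve the device, because the event is delivered to every listener and tcmu-runner answers for a device it does not own.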

mikechristie commented 5 years ago

It's not clear to me how dbus comes into the problem or how moving it helps? If you moved it to libtcmu would you add a new AddDevice dbus operation, like CheckConfig, which is used by apps like targetcli to talk to specific daemons?

I think if we continue to use netlink we want to fix the tools/daemons/kernels so netlink events are sent to specific daemons or at least add a field in the netlink operation that indicates which daemon the command is for.
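A minimal sketch of the second idea, a field that names the intended daemon, might look like the following. It is a toy model only: the `target_daemon` key, the `handle_add_device` function, and the subtype names are hypothetical and do not correspond to any existing netlink attribute or tcmu-runner API.

```python
import errno
from typing import Optional


def handle_add_device(daemon_name: str, handlers: set, event: dict) -> Optional[int]:
    """Decide how one daemon reacts to an add-device event.

    Returns None when the event is addressed to another daemon (stay
    silent, send no reply), 0 on success, or a negative errno on failure.
    """
    target = event.get("target_daemon")  # hypothetical field
    if target is not None and target != daemon_name:
        return None  # not for us: do not fail someone else's device
    if event["subtype"] in handlers:
        return 0
    return -errno.ENOENT  # addressed to us, but no handler exists


event = {"subtype": "qemu", "target_daemon": "qemu-tcmu"}
runner_reply = handle_add_device("tcmu-runner", {"glfs", "rbd"}, event)
qemu_reply = handle_add_device("qemu-tcmu", {"qemu"}, event)
print(runner_reply, qemu_reply)
```

With such a field, a daemon that is not the addressee stays silent instead of returning an error, so only the named daemon decides whether the creation succeeds. Events without the field fall back to the current behavior in this sketch.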

mikechristie commented 5 years ago

Is this a similar problem to the one with containers? We could have tcmu-runner or daemon123 running in multiple containers, and we only want a specific instance to handle the netlink request. Currently, we are just multicasting to all listeners.

baihuahua commented 5 years ago

It's not clear to me how dbus comes into the problem or how moving it helps?

I misunderstood that. Please forget about moving dbus as a fix for this issue.

I think if we continue to use netlink we want to fix the tools/daemons/kernels so netlink events are sent to specific daemons or at least add a field in the netlink operation that indicates which daemon the command is for.

Yeah, I agree with this. Which one do you think is easier to implement: changing the multicasting, or adding a field to the netlink operation?

baihuahua commented 5 years ago

Is this a similar problem as with containers? We could have tcmu-runner or daemon123 running in multiple containers. We only want a specific instance to handle the netlink request. Currently, because we are just multicasting to all listeners.

Yeah, they're similar but not exactly the same. I think the container case needs a way for the kernel to distinguish daemons residing in different containers, at the container level, while this one is more about the system level. This issue can be solved through one of the ways above, while the container one requires the kernel to distinguish between different containers, right?

mikechristie commented 5 years ago

For both questions I'm not completely sure.

I think this is one item you can research and drive the solution for, with freedom to have fun with the fix if you want.

baihuahua commented 5 years ago

OK, I'll look into this further. Thanks, Mike.

liulanzheng commented 3 years ago

Are there any good solutions now?