Open baihuahua opened 5 years ago
It's not clear to me how dbus comes into the problem or how moving it helps? If you moved it to libtcmu would you add a new AddDevice dbus operation, like CheckConfig, which is used by apps like targetcli to talk to specific daemons?
I think if we continue to use netlink we want to fix the tools/daemons/kernels so netlink events are sent to specific daemons or at least add a field in the netlink operation that indicates which daemon the command is for.
Is this a similar problem as with containers? We could have tcmu-runner or daemon123 running in multiple containers. We only want a specific instance to handle the netlink request. Currently we cannot do that, because we are just multicasting to all listeners.
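To make the "add a field in the netlink operation" idea concrete, here is a minimal sketch of the receive-side check. It is not tcmu-runner or libtcmu code: the event is modelled as a plain dict, and the `target_daemon` key and the `should_handle` helper are hypothetical names invented for illustration. Letting events without the field through is also an assumption, chosen so that older senders would keep working.

```python
from typing import Optional


def should_handle(event: dict, my_name: str) -> bool:
    """Decide whether this daemon should act on a multicast event.

    `target_daemon` is a hypothetical field; it does not exist in the
    current netlink operation.
    """
    target: Optional[str] = event.get("target_daemon")
    # Assumption: an event without the field is treated as today,
    # i.e. every listener may try to handle it.
    return target is None or target == my_name


# An add-device event addressed to qemu-tcmu only.
event = {"op": "add_device", "target_daemon": "qemu-tcmu"}
print(should_handle(event, "qemu-tcmu"))    # the addressed daemon acts
print(should_handle(event, "tcmu-runner"))  # other listeners ignore it
```

With a check like this, the event is still multicast to every listener, but only the named daemon replies, so the multicast path itself would not need to change.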
> It's not clear to me how dbus comes into the problem or how moving it helps?
I misunderstood; please disregard the idea of moving dbus to fix this issue.
> I think if we continue to use netlink we want to fix the tools/daemons/kernels so netlink events are sent to specific daemons or at least add a field in the netlink operation that indicates which daemon the command is for.
Yeah, I agree with this. Which one do you think is easier to implement: changing the multicasting or adding a field to the netlink operation?
> Is this a similar problem as with containers? We could have tcmu-runner or daemon123 running in multiple containers. We only want a specific instance to handle the netlink request. Currently we cannot do that, because we are just multicasting to all listeners.
Yeah, they're similar but not exactly the same. I think the container case needs a way for the kernel to distinguish daemons residing in different containers, at the container level, while this one is more at the system level. This issue can be solved through one of the two approaches above, while the container one requires the kernel to distinguish between containers, right?
For both questions I'm not completely sure.
I think this is an item you can research, drive the solution for, and have freedom and fun with if you want.
OK, I'll look into this further. Thanks, Mike.
Are there any good solutions now?
When tcmu-runner runs alongside another tcmu daemon such as qemu-tcmu on a host whose kernel has the genl version set to 2, creating a target through qemu-tcmu fails with this message in tcmu-runner's log file:
This is because tcmu-runner and qemu-tcmu both receive the device-add event from netlink, but tcmu-runner cannot find a handler for the device, so it fails the creation by returning an error code to the kernel.
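The failure mode can be modelled in a few lines. This is a toy simulation, not kernel or tcmu-runner code. It assumes that the first reply to arrive decides the outcome, which the thread does not state explicitly. The handler names (`glfs`, `qemu`), the `-2` status value, and every class and function name are made up for illustration.

```python
from dataclasses import dataclass, field

NO_HANDLER = -2  # stand-in error status; the real value is not given in the thread


@dataclass
class Daemon:
    name: str
    handlers: set = field(default_factory=set)  # device subtypes this daemon serves

    def on_add_device(self, subtype: str) -> int:
        # 0 means the device was set up; a negative value fails the creation.
        return 0 if subtype in self.handlers else NO_HANDLER


def multicast_add_device(listeners, subtype):
    """Deliver the event to every listener and collect all replies.

    Assumption: the first reply wins, so a listener without a handler
    can fail a device that another listener could have served.
    """
    replies = [(d.name, d.on_add_device(subtype)) for d in listeners]
    return replies[0], replies


runner = Daemon("tcmu-runner", {"glfs"})
qemu = Daemon("qemu-tcmu", {"qemu"})

outcome, replies = multicast_add_device([runner, qemu], "qemu")
print(outcome)  # tcmu-runner's error reply decides the result
print(replies)  # qemu-tcmu did reply with success, but too late
```

The point of the model is that both daemons receive the same event, and the one that cannot serve the device is the one that ends up reporting the result.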
To solve this problem, tcmu-runner needs a way to give other tcmu daemons a chance to handle the device creation before failing it. Alternatively, we can move dbus down from tcmu-runner into libtcmu so that each daemon can acquire dbus on its own, without depending on tcmu-runner.
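One possible reading of "give other daemons a chance" is that a daemon with no matching handler abstains instead of replying with an error, and the creation fails only if no daemon claims the device. The sketch below illustrates that policy only. Where the decision would actually live (kernel, libtcmu, or elsewhere) is left open, and `make_daemon`, `resolve_add_device`, the handler names, and the `-2` status are all hypothetical.

```python
from typing import Callable, Iterable, Optional

# None means "not mine, no reply"; 0 means success; a negative value is an error.
Reply = Optional[int]

NOBODY_CLAIMED = -2  # hypothetical status used when every daemon abstains


def make_daemon(handlers: set) -> Callable[[str], Reply]:
    """Build a toy daemon that abstains on subtypes it has no handler for."""
    def on_add_device(subtype: str) -> Reply:
        return 0 if subtype in handlers else None
    return on_add_device


def resolve_add_device(daemons: Iterable[Callable[[str], Reply]], subtype: str) -> int:
    """Fail the creation only if no daemon claimed the device."""
    replies = [daemon(subtype) for daemon in daemons]
    claimed = [r for r in replies if r is not None]
    return claimed[0] if claimed else NOBODY_CLAIMED


runner = make_daemon({"glfs"})
qemu = make_daemon({"qemu"})

print(resolve_add_device([runner, qemu], "qemu"))     # claimed by the qemu daemon
print(resolve_add_device([runner, qemu], "unknown"))  # nobody claims it
```

A real implementation would also need a timeout or similar rule to decide when "nobody claimed it" is final, which this sketch sidesteps by polling every daemon synchronously.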