vishvananda / netlink

Simple netlink library for go.
Apache License 2.0

Wrong sender portid 3034, expected 0 #786

Open DreamerKMP opened 2 years ago

DreamerKMP commented 2 years ago

Hi!

I recently ran into a problem when calling netlink.NeighSubscribeWithOptions() and netlink.LinkSubscribeWithOptions() from multiple goroutines.

The code causing the problem is below.

if from.Pid != nl.PidKernel {
  if cberr != nil {
    cberr(fmt.Errorf("Wrong sender portid %d, expected %d", from.Pid, nl.PidKernel))
  }
  continue
}

I checked the actual messages with strace:

[pid  3039] <... recvfrom resumed>{{len=32, type=RTM_GETNEIGH, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=13, pid=0}, {ifi_family=AF_BRIDGE, ifi_type=ARPHRD_NETROM, ifi_index=0, ifi_flags=0, ifi_change=0}}, 65536, 0, {sa_family=AF_NETLINK, nl_pid=3034, nl_groups=0x000004}, [112->12]) = 32
...
[pid  3039] write(1, "{\"Target Network Interface\":\"tes"..., 184{"Target Network Interface":"testeth0","error":"Wrong sender portid 3034, expected 0","level":"error","msg":"NeighSubscribeWithOptions error found","time":"2022-07-21T16:57:51+09:00"}

In the recvfrom call, the sender address has nl_pid=3034 (the ID of another thread) rather than the value of PidKernel (0), so the check quoted above reports the error.

I think that code is unnecessary. Please review this.

stv0g commented 1 year ago

I can reproduce the issue in my code using netlink.RouteSubscribeWithOptions()

aboch commented 1 year ago

@DreamerKMP @stv0g please feel free to open a pull request with your proposed fix

stv0g commented 1 year ago

The netlink(7) man-page describes the purpose of nl_pid as follows:

nl_pid is the unicast address of netlink socket. It's always 0 if the destination is in the kernel. For a user-space process, nl_pid is usually the PID of the process owning the destination socket. However, nl_pid identifies a netlink socket, not a process. If a process owns several netlink sockets, then nl_pid can be equal to the process ID only for at most one socket. There are two ways to assign nl_pid to a netlink socket. If the application sets nl_pid before calling bind(2), then it is up to the application to make sure that nl_pid is unique. If the application sets it to 0, the kernel takes care of assigning it. The kernel assigns the process ID to the first netlink socket the process opens and assigns a unique nl_pid to every netlink socket that the process subsequently creates.

stv0g commented 1 year ago

The check occurs at several places:

denglunwen commented 3 days ago

Has this issue been solved? Also, can a ProcEvent for which 'fromPid != nl.PidKernel' still be used for process detection?