jbaublitz / neli

Rust type safe netlink library
BSD 3-Clause "New" or "Revised" License

Route multicast events (with `nl_pid > 0`) are ignored #218

Closed: acheronfail closed this issue 1 year ago

acheronfail commented 1 year ago

I'm trying to make something that functions similarly to nl-monitor ipv4-ifaddr with neli, but I'm struggling to get anywhere with it.

This is my current attempt:

let (socket, mut multicast) =
    NlRouter::connect(NlFamily::Route, None, Groups::empty()).unwrap();

socket
    .add_mcast_membership(Groups::new_groups(&[RTNLGRP_IPV4_IFADDR]))
    .unwrap();

match multicast.next() {
    None => unreachable!(),
    Some(response) => {
        dbg!(response).unwrap();
    }
}

As far as I can tell, this should report an event any time an IPv4 address on the machine changes, but I get no output at all with this setup. I've been looking at how nl-monitor works and comparing it with neli, and the two look very similar here, so I'm not sure what's different...

Any chance you might know where I should look next? :pray:

acheronfail commented 1 year ago

I just ran strace and can see that messages are coming through; it's just that multicast.next() never returns.

recvfrom(9, [
    {
        nlmsg_len=80,
        nlmsg_type=RTM_NEWADDR,
        nlmsg_flags=0,
        nlmsg_seq=1688203532,
        nlmsg_pid=8187
    },
    {
        ifa_family=AF_INET,
        ifa_prefixlen=32,
        ifa_flags=IFA_F_PERMANENT,
        ifa_scope=RT_SCOPE_UNIVERSE,
        ifa_index=if_nametoindex("wlan0")
    },
    [
        [
            {
                nla_len=8,
                nla_type=IFA_ADDRESS
            },
            inet_addr("10.0.0.254")
        ],
        [
            {
                nla_len=8,
                nla_type=IFA_LOCAL
            },
            inet_addr("10.0.0.254")
        ],
        [
            {
                nla_len=10,
                nla_type=IFA_LABEL
            },
            "wlan0"
        ],
        [
            {
                nla_len=8,
                nla_type=IFA_FLAGS
            },
            IFA_F_PERMANENT
        ],
        [
            {
                nla_len=20,
                nla_type=IFA_CACHEINFO
            },
            {
                ifa_prefered=4294967295,
                ifa_valid=4294967295,
                cstamp=38685,
                tstamp=38685
            }
        ]
    ]
], 32768, 0, NULL, NULL) = 80

Is this because I need a specific type before neli will parse it and return it? If so, is there a way to always return any message? (Or alternatively, how do I find out the correct neli type?)

acheronfail commented 1 year ago

I'll update this with any further findings...

Debugging Tips

Confusing things...

Findings

My current finding is that the senders collection here is always empty and the message's nl_pid is non-zero, so although neli receives and parses the message, it never forwards it to the multicast receiver in my example.

This is a combination of nl_pid != 0 and also nl_seq = <some very high number, like 1688206355>.

So, for some reason, messages received here don't have nl_pid = 0. As a result, neli doesn't forward those events to the multicast_receiver, because it seems to forward only events with nl_pid = 0 there. Any other message received on the socket is simply dropped.

Is this ignoring of events intended behaviour? :question:
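To make the behaviour described above concrete, here is a minimal sketch of a header-based routing heuristic. It is only my reading of the findings in this comment, not neli's actual code: the names `Destination`, `route_by_header` and `pending_seqs` are hypothetical, and `pending_seqs` stands in for the senders collection.

```rust
use std::collections::HashSet;

/// Where a received netlink message ends up.
/// Hypothetical type for illustration; not part of neli.
#[derive(Debug, PartialEq, Eq)]
pub enum Destination {
    /// Forwarded to the multicast receiver.
    Multicast,
    /// Matched to an in-flight request with this sequence number.
    PendingRequest(u32),
    /// Neither of the above: silently discarded.
    Dropped,
}

/// Classify a message using only fields from its `nlmsghdr`.
/// Assumption: `nlmsg_pid == 0` is treated as "multicast", and anything
/// else must match a pending request's sequence number.
pub fn route_by_header(
    nlmsg_pid: u32,
    nlmsg_seq: u32,
    pending_seqs: &HashSet<u32>,
) -> Destination {
    if nlmsg_pid == 0 {
        Destination::Multicast
    } else if pending_seqs.contains(&nlmsg_seq) {
        Destination::PendingRequest(nlmsg_seq)
    } else {
        Destination::Dropped
    }
}
```

With the values from the strace output (nlmsg_pid=8187, nlmsg_seq=1688203532) and no pending requests, this sketch returns `Destination::Dropped`, which matches what I'm seeing.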

acheronfail commented 1 year ago

I can confirm that nl-monitor ipv4-ifaddr also receives events on its equivalent of a multicast receiver with nl_pid != 0 and nl_seq = <random high number>. No wait, I was confused! strace -ff nl-monitor ipv4-ifaddr shows that these messages are received with nl_pid == 0 - so something mustn't be set right with neli...

I've changed the title - I think this should be updated to a feature perhaps? (No permissions to change label...)

EDIT:

I created https://github.com/jbaublitz/neli/pull/219 in an attempt to fix this.

acheronfail commented 1 year ago

Alright, the whole thing is working (provided my fork that's in https://github.com/jbaublitz/neli/pull/219 is used).

Here's some sample code:

// setup socket for netlink route
let (socket, mut multicast) =
    NlRouter::connect(NlFamily::Route, None, Groups::empty()).unwrap();

// add multicast membership for ipv4-addr updates
socket
    .add_mcast_membership(Groups::new_groups(&[RTNLGRP_IPV4_IFADDR]))
    .unwrap();

// listen for multicast events
// NOTE: currently requires the changes here: https://github.com/jbaublitz/neli/pull/219
type Next = Option<Result<Nlmsghdr<u16, Ifaddrmsg>, RouterError<u16, Ifaddrmsg>>>;
match multicast.next_typed::<u16, Ifaddrmsg>() as Next {
    None => todo!(),
    // we got a multicast message
    Some(response) => {
        // if there are errors on the multicast channel, they'll be here in this result
        let response = response.unwrap();
        // get message payload
        let ifaddr_msg = response.get_payload().unwrap();
        // get a handle to the message's rt attributes
        let rt_attrs_handle = ifaddr_msg.rtattrs().get_attr_handle();
        // get the address attribute
        let addr_attr = rt_attrs_handle.get_attribute(Ifa::Address).unwrap();
        // convert the raw bytes from the attribute into an `Ipv4Addr` struct
        let bytes: &[u8] = addr_attr.rta_payload().as_ref();
        let bytes: &[u8; 4] = bytes.try_into().unwrap();
        let ipv4 = Ipv4Addr::from(*bytes);
        // 🎉 we did it!
        dbg!(ipv4);
    }
}
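For anyone curious what the last few steps of the sample do at the byte level, here is a rough, neli-free sketch that walks a buffer of route attributes and pulls out the first `IFA_ADDRESS` as an `Ipv4Addr`. The function name `find_ipv4_address` is hypothetical. It assumes the usual rtattr layout (a native-endian `u16` length that includes the 4-byte header, a `u16` type, then the payload, padded to a multiple of 4 bytes) and `IFA_ADDRESS = 1` from `linux/if_addr.h`.

```rust
use std::net::Ipv4Addr;

/// `IFA_ADDRESS` attribute type from `linux/if_addr.h`.
const IFA_ADDRESS: u16 = 1;

/// Walk a buffer of rtattr entries and return the first `IFA_ADDRESS`
/// payload as an IPv4 address. Returns `None` if the buffer is malformed,
/// the attribute is missing, or its payload is not exactly 4 bytes.
pub fn find_ipv4_address(mut attrs: &[u8]) -> Option<Ipv4Addr> {
    while attrs.len() >= 4 {
        // Attribute header: length (including this header), then type.
        let len = u16::from_ne_bytes([attrs[0], attrs[1]]) as usize;
        let ty = u16::from_ne_bytes([attrs[2], attrs[3]]);
        if len < 4 || len > attrs.len() {
            return None; // malformed attribute
        }
        let payload = &attrs[4..len];
        if ty == IFA_ADDRESS {
            if payload.len() != 4 {
                return None; // not an IPv4 address
            }
            return Some(Ipv4Addr::new(payload[0], payload[1], payload[2], payload[3]));
        }
        // Each attribute is padded so the next one starts on a 4-byte boundary.
        let next = (len + 3) & !3;
        if next >= attrs.len() {
            break;
        }
        attrs = &attrs[next..];
    }
    None
}
```

The padding step matters for attributes like `IFA_LABEL` in the strace output, where nla_len=10 but the next attribute starts 12 bytes later.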

I'm leaving this issue open as the tracking issue for ignored multicast events.

acheronfail commented 1 year ago

Wait - I'm so sorry for all the spam :sweat_smile: - after looking at this again I seem to have completely glossed over the fact that nl-monitor receives events with nl_pid == 0 but when I try with neli I get events with nl_pid > 0!

So, the PR I created is probably bogus - these should be multicast events... but why don't the events come through with nl_pid == 0 when I subscribe with neli??? This has got me so confused...

Again, this issue is now more or less a diary of my experience learning about netlink :sweat_smile:.

I think I was originally right, actually. The thing that confused me is reading strace's output of the recvmsg calls:

recvmsg(3, {
    msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000010},
    msg_namelen=12,
    msg_iov=[{
        iov_base=[
            {nlmsg_len=80, nlmsg_type=RTM_DELADDR, nlmsg_flags=0, nlmsg_seq=1688293504, nlmsg_pid=19010},
            {ifa_family=AF_INET, ifa_prefixlen=32, ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_UNIVERSE, ifa_index=if_nametoindex("wlan0")},
            [
                [{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("10.0.0.254")],
                [{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("10.0.0.254")],
                [{nla_len=10, nla_type=IFA_LABEL}, "wlan0"],
                [{nla_len=8, nla_type=IFA_FLAGS}, IFA_F_PERMANENT],
                [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295, ifa_valid=4294967295, cstamp=142503, tstamp=142503}]
            ]
        ],
        iov_len=16384
    }],
    msg_iovlen=1,
    msg_controllen=0,
    msg_flags=0
}, 0) = 80

I was confusing the first part of the recvmsg call...

{msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000010}

...thinking that contained the actual netlink message header - but it doesn't! The actual header is this part:

{nlmsg_len=80, nlmsg_type=RTM_DELADDR, nlmsg_flags=0, nlmsg_seq=1688293504, nlmsg_pid=19010}

So, I believe my previous comments about multicast messages with nl_pid > 0 are correct.
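To spell out the distinction: the `nl_pid` in `msg_name` belongs to the sender's socket address, while `nlmsg_pid` lives in the 16-byte message header at the start of the payload. Below is a small sketch that decodes just that header from raw bytes. `NlMsgHdr` and `parse_nlmsghdr` are hypothetical names of mine (neli has its own `Nlmsghdr` type), and the layout assumed is the standard `struct nlmsghdr` in native byte order.

```rust
/// Plain mirror of the kernel's `struct nlmsghdr` (16 bytes).
/// Hypothetical type for illustration; not neli's `Nlmsghdr`.
#[derive(Debug, PartialEq, Eq)]
pub struct NlMsgHdr {
    pub nlmsg_len: u32,
    pub nlmsg_type: u16,
    pub nlmsg_flags: u16,
    pub nlmsg_seq: u32,
    pub nlmsg_pid: u32,
}

/// Decode the netlink message header from the start of a receive buffer.
/// Returns `None` if fewer than 16 bytes are available.
pub fn parse_nlmsghdr(buf: &[u8]) -> Option<NlMsgHdr> {
    if buf.len() < 16 {
        return None;
    }
    // Small helpers to read native-endian integers at a byte offset.
    let u32_at = |i: usize| u32::from_ne_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]]);
    let u16_at = |i: usize| u16::from_ne_bytes([buf[i], buf[i + 1]]);
    Some(NlMsgHdr {
        nlmsg_len: u32_at(0),
        nlmsg_type: u16_at(4),
        nlmsg_flags: u16_at(6),
        nlmsg_seq: u32_at(8),
        nlmsg_pid: u32_at(12),
    })
}
```

Feeding it the header bytes of the message above would give nlmsg_pid=19010, regardless of the nl_pid=0 reported in `msg_name`.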

jbaublitz commented 1 year ago

@acheronfail Can you test #209 and let me know if that resolves the issue? Someone else suggested that I use recvfrom instead of recv to determine whether a message is coming from a netlink multicast group or not. Based on my initial testing, it seems to resolve the problem of heuristics. Can you please confirm that it resolves your issue too?
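As a rough illustration of the recvfrom idea (not the actual code in #209), the sender address returned alongside each message carries an `nl_groups` bitmask, so the multicast check no longer needs to look at the message header at all. `SenderAddr`, `group_mask` and `is_multicast` below are hypothetical names, and the sketch assumes that unicast replies arrive with `nl_groups == 0`, and that `RTNLGRP_IPV4_IFADDR` is group 5, which corresponds to the nl_groups=0x000010 seen in the strace output earlier in this thread.

```rust
/// The two fields of `sockaddr_nl` relevant here, as filled in by recvfrom.
/// Hypothetical type for illustration only.
pub struct SenderAddr {
    pub nl_pid: u32,
    pub nl_groups: u32,
}

/// Legacy bitmask for a multicast group number (valid for groups 1..=32).
pub fn group_mask(group: u32) -> Option<u32> {
    if (1..=32).contains(&group) {
        Some(1u32 << (group - 1))
    } else {
        None
    }
}

/// Assumption: a message was delivered via a multicast group exactly when
/// the sender address has at least one group bit set.
pub fn is_multicast(addr: &SenderAddr) -> bool {
    addr.nl_groups != 0
}
```

Under these assumptions, a message received with nl_groups=0x000010 is classified as multicast even though its header has a non-zero nlmsg_pid.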

acheronfail commented 1 year ago

Ah yes! Thank you so much, I was going around in circles so many times :sweat_smile:

I can confirm that works for me!