jow- / nlbwmon

Simple conntrack based traffic accounting
ISC License
158 stars 37 forks source link

a couple of bugs #57

Closed a7ypically closed 9 months ago

a7ypically commented 1 year ago

I've been using this code base to create something more comprehensive (and currently annoying) as a network monitor for my personal use. Basically sending push notifications when some IOT device sends data to an unexpected country or just sends a lot of data which I don't expect it to.

In the process of hacking this code, I found two major bugs which I think are worth fixing. Unfortunately I don't have a pull request but these are easy to fix:

  1. I think nfnetlink_dump() does not work as intended.
for (err = 1; err > 0; ) {
    ret = nl_recvmsgs(nl, cb);
    if (ret == 0) {
        err = 0;
        break;
    }
    else if (ret < 0) {
        fprintf(stderr, "Netlink receive failure: %s\n", nl_geterror(ret));
        err = (-ret == NLE_NOMEM) ? -ENOBUFS : -EIO;
        break;
    }
}

nl_recvmsgs() returns 0 when successful so this loop only runs once. Thus this dump only processes the first entry. It is necessary to keep calling nl_recvmsgs() until the done / error callback are called which will set the err value and exit the loop (or if ret < 0).

I have also changed NL_SKIP to NL_OK in handle_dump().

I think it's also worth running nfnetlink_dump(allow_insert=true) when the daemon starts, otherwise existing connections started before the daemon will not be added to the DB (which will only try to update the DB and fail).

  1. There are some avl_tree cmp functions returning a direct subtraction of uint32_t which are used for the result of the signed int cmp function. So (uint32_t)3000000000 - (uint32_t)1 will return a negative value when the function returns.

Also note that (uint32_t)1 - (uint32_t)2 does not return a negative value although by casting it back to (signed int) the result will be negative. memcmp(a->u32, b->u32, sizeof(a->u32)) works fine for all cases (at least for host order values which most of the values are).

jow- commented 1 year ago

Thanks for the report!

Regarding point 1:

nl_recvmsgs() returns 0 when successful so this loop only runs once

The documentation here https://www.infradead.org/~tgr/libnl/doc/api/group__send__recv.html#gabb1917e38174d5b00cb313420327c495 states:

Repeatedly calls nl_recv() [...] and parses the received data as netlink messages. Stops reading if one of the callbacks returns NL_STOP or nl_recv returns either 0 or a negative error code.

So nl_recvmsgs() will already loop itself internally. The way I understand its implementation, a return value of zero essentially means "no more data to read from socket". By repeating the receive operation, we might just end up in a blocking read operation.

Edit: looking at it again, it indeed seems to only process one logical (multipart) message which may consist of several partial messages, but it will not read as much as it can. Will revisit the logic.

Regarding point 2:

There are some avl_tree cmp functions returning a direct subtraction of uint32_t which are used for the result of the signed int cmp function

Indeed, I found one occurrence in avl_cmp_neigh(), are there more instances I overlooked?

a7ypically commented 1 year ago

Regarding 1 As you noted it is required to call nl_recvmsgs() multiple times. If all is ok it should end by either the finish or error callbacks being called which both set the err variable and will exit the loop. Moreover nl_recv() sets the socket to non-blocking so I don't think the loop can block. Worst case it will return an error and will exit the loop again. Also note that I think you should change skip to ok at the end of the parse callback. I have made these changes on my branch and confirmed that they work well and dump now processes all entries in the conntrack. You can see it here: https://github.com/a7ypically/nlbwmon

Regarding point 2. Unfortunately my branch has diverted and is using several new avl trees so it's hard for me to verify that the only original issue is in the neigh code. If I do remember correctly it was only this one.

More about the changes I have done in my branch are here: https://forum.openwrt.org/t/network-monitor-track-your-network-devices-iot-devices-send-large-amount-of-data-outside-connect-to-servers-in-unexpected-countries/172431