Open themighty1 opened 4 years ago
Have you seen connections which the first time around are not found via netlink nor via tcp netstat, but if you loop again they will be found via netlink or netstat?
Not that I can remember. Those connections could be forwarded connections traversing the box, but we wouldn't intercept them anyway, because we only intercept NEW and RELATED connections. Also, broadcast/multicast connections are a bit special (they differ a bit as seen by iptables and netlink).
There are at least two situations where this can happen though: when the system comes back from suspend, or when you dis/connect from a wifi network. Usually the opened connections are in an invalid state, and the processes start closing/reestablishing them. Adding the state RELATED helped to identify some of these connections.
Or has it always been the case that if the connection is not found on the first netlink/netstat iteration, it means that's the end of it?
If the PID of the process that created the connection is not found on the first iteration, it could mean that the PID is in reality a TID, i.e. the connection was opened by a thread of the process.
As we don't parse the /proc/<PID>/task/
directory, we don't look there for the inode of the connection, so we find neither the cmdline nor the PID. This is the main reason for many "unknown connections": parsing all TIDs is very costly.
Another case is when the process is a forked child, for example those launched by systemd. As far as I can remember, in this case the reported PID was systemd's.
Regarding the last case: for example fwupdmgr, launched by systemd (the fwupd-refresh systemd service), is not detected (using the proc method):
[2020-11-02 22:09:31] DBG new connection tcp => 55732:192.168.1.101 -> 151.101.122.49:443 uid: %!(EXTRA uint32=1000)
[2020-11-02 22:09:31] DBG [0/1] outgoing connection: 55732:192.168.1.101 -> 151.101.122.49:443 || netlink response: 55732:192.168.1.101 -> 151.101.122.49:443 inode: 50975687 - loopback: false multicast: false unspecified: false linklocalunicast: false ifaceLocalMulticast: false GlobalUni: true
[2020-11-02 22:09:32] DBG new pid lookup took%!(EXTRA int=-1, time.Duration=731.053526ms)
netlink correctly dumps the inode of the connection, but the PID is not found. Using audit it is more likely to succeed.
If you launch it manually (fwupdmgr refresh --force) then yes, it is detected.
I'm wondering if it will have anything to do with the systemd sandboxing options.
Thanks for the insights. That clarified a lot. I added a loop to the opensnitch code to look up netlink/netstat again. Alas, the code looped forever - no inodes were found. This happens to very few connections when I add a new torrent to transmission.
DBG new connection tcp => 45017:
Have you seen this before? What do you think the cause may be? I know that transmission may act as a server, but in my case this was a connection from my IP to dest IP.
Yeah, with transmission it's fairly common to see those messages.
I have no idea really, but it could be failed connection attempts:
20468 1604450963.852980 connect(24, {sa_family=AF_INET, sin_port=htons(53998), sin_addr=inet_addr("87.173.23.163")}, 16) = -1 EINPROGRESS (Operation in progress)
20468 1604450963.854008 setsockopt(24, SOL_IP, IP_TOS, [0], 4) = 0
20468 1604450963.857135 close(24) = 0
Maybe it happens so fast that by the time we query for it, the kernel has already deleted the entry.
I wrote some words regarding these issues here https://github.com/gustavo-iniguez-goya/opensnitch/issues/10#issuecomment-608428026 and here https://github.com/gustavo-iniguez-goya/opensnitch/issues/10#issuecomment-608436200
Some more examples for future reference:
[2020-11-04 10:35:31] DBG new connection tcp => 45327:192.168.1.101 -> 188.64.117.35:51413 uid: %!(EXTRA uint32=1000)
[2020-11-04 10:35:31] DBG [0/1] outgoing connection: 45327:192.168.1.101 -> 188.64.117.35:51413 || netlink response: 45327:192.168.1.101 -> 188.64.117.35:51413 inode: 1606906 - loopback: false multicast: false unspecified: false linklocalunicast: false ifaceLocalMulticast: false GlobalUni: true
pkt.queue: 0
[2020-11-04 10:35:32] DBG new pid lookup took%!(EXTRA int=-1, time.Duration=678.414889ms)
[2020-11-04 10:35:32] IMP Added new rule: allow if dest.ip is '188.64.117.35'
[2020-11-04 10:35:32] DBG ✔ -> 188.64.117.35:51413 (allow-30s-simple-1886411735)
ipv4 2 tcp 6 117 SYN_SENT src=192.168.1.101 dst=188.64.117.35 sport=45327 dport=51413 [UNREPLIED] src=188.64.117.35 dst=192.168.1.101 sport=51413 dport=45327 mark=0
644 14.864217873 192.168.1.101 188.64.117.35 UDP 72 51413 → 51413 Len=30
1382 17.847181084 192.168.1.101 188.64.117.35 UDP 72 51413 → 51413 Len=30
18961 24.594680726 192.168.1.101 188.64.117.35 TCP 74 45327 → 51413 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=350883110 TSecr=0 WS=1024
So in this case, we should be able to find the PID. More than not finding the PID, what intrigues me in this case is why auditd does not detect it; or, well, it's probably detecting it, but for some reason we're not parsing the event correctly. I should probably analyze the auditd logs as well.
Just want to report that I ran an endless loop dumping all TCP connections via netlink with NLM_F_DUMP and all TCP states (mask 0xfff) while adding a new torrent to transmission.
Only for ~1/5 of the unknown connections would I find the source port of the unknown connection in my netlink dump. I did this as a sanity check. At least now I have some confidence that transmission's quick connect/close is reflected in netlink momentarily. By the time opensnitch sends a request to netlink, the entry is no longer there.
Unfortunately netlink doesn't provide a way to subscribe to new events for inet sockets; we can only poll it periodically.
The key here would be to use eBPF if it's available: https://github.com/iovisor/bcc/blob/master/tools/tcplife.py
There's a fork that integrated it into opensnitch; maybe we can reuse it. On the other hand, ideally we should use XDP to block connections, but it's a fairly new feature and it's not available on many systems.
connections which the first time around are not found via netlink nor via tcp netstat
I have no idea really, but it could be failed connection attempts:
Correct, in particular non-blocking failed connection attempts. Sometimes the connection will be discarded because it's not found via netlink, and other times it will pass all the checks until it fails to retrieve the PID of the process.
With this example you can reproduce the issue (port and IP captured by sniffing Transmission traffic):
// gcc cclient.c -o cclient
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define PORT 28979

int main(int argc, char *argv[])
{
    int sockfd;
    struct sockaddr_in their_addr;

    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
        perror("socket()");
        exit(1);
    }
    printf("Client socket() OK...\n");

    /* non-blocking, so connect() returns immediately with EINPROGRESS */
    if (fcntl(sockfd, F_SETFL, O_NONBLOCK) < 0)
        perror("fcntl() failed");

    their_addr.sin_family = AF_INET;
    their_addr.sin_port = htons(PORT);
    inet_aton("5.180.62.91", &their_addr.sin_addr);
    memset(&(their_addr.sin_zero), '\0', 8);

    /* EINPROGRESS is the expected result for a non-blocking connect() */
    if (connect(sockfd, (struct sockaddr *)&their_addr, sizeof(struct sockaddr)) == -1
        && errno != EINPROGRESS) {
        perror("connect() error");
        exit(1);
    }
    printf("Client connect() is OK...\n");

    /* close immediately, like Transmission's short-lived attempts */
    close(sockfd);
    return 0;
}
Nice, thank you.
Regarding this problem, I've modified the ftrace monitor method to hook tcp/tcp_destroy_sock
and sock/inet_sock_set_state
instead of sched/sched_process_exec
and sched/sched_process_fork.
The benefit of doing this is that we only cache and intercept PIDs that have generated network activity, instead of caching every single process execution on the system. If we wanted to monitor whenever a new process is launched, we should do it via netlink (PROC_EVENT_EXEC, PROC_EVENT_FORK, PROC_EVENT_EXIT), so as not to rely on debugfs.
On the other hand, inet_sock_set_state
logs the source/destination ports and IPs of new connections along with the PID of the process, so we can match new outgoing connections against this data:
new outgoing connection:
192.168.1.134:51413 -> 47.188.48.32:57949
inet_sock_set_state:
ADD: pid:22825 inet_sock_set_state -> map[daddr:47.188.48.32 daddrv6:::ffff:47.188.48.32 dport:57949 family:AF_INET oldstate:TCP_CLOSE protocol:IPPROTO_TCP saddr:192.168.1.134 saddrv6:::ffff:192.168.1.134 sport:51413] Key: 192.168.1.134:51413 47.188.48.32:57949
It's not bulletproof. Sometimes the source port is 0 (probably when the connection fails to establish), so the new outgoing connection doesn't match. But still, it seems to work far better than the current method.
@gustavo-iniguez-goya, I remember reading in other github issues that you've done a lot of research into this area of trying to pin down where the unrecognized connections come from.
Have you seen connections which the first time around are not found via netlink nor via tcp netstat, but if you loop again they will be found via netlink or netstat? Sort of like a delay in the kernel to update its netlink tables or something?
Or has it always been the case that if the connection is not found on the first netlink/netstat iteration, it means that's the end of it?
Just throwing some ideas around. Maybe you already know these things, so it's quicker to ask you than to test whether that's the case.