gustavo-iniguez-goya / opensnitch

OpenSnitch is a GNU/Linux application firewall
GNU General Public License v3.0

A question about unrecognized connections #84

Open themighty1 opened 4 years ago

themighty1 commented 4 years ago

@gustavo-iniguez-goya, I remember reading in other GitHub issues that you've done a lot of research into pinning down where the unrecognized connections come from.

Have you seen connections which the first time around are not found via netlink nor via tcp netstat, but if you loop again they will be found via netlink or netstat? Sort of like a delay in the kernel to update its netlink tables or something?

Or has it always been the case that if the connection is not found on the first netlink/netstat iteration, it means that's the end of it?

Just throwing some ideas around. Maybe you already know these things, so it's quicker to ask you than to test whether it is the case.

gustavo-iniguez-goya commented 4 years ago

> Have you seen connections which the first time around are not found via netlink nor via tcp netstat, but if you loop again they will be found via netlink or netstat?

Not that I can remember. Those could be forwarded connections traversing the box, but we wouldn't intercept them anyway, because we only intercept NEW and RELATED connections. Also, broadcast/multicast connections are a bit special (they look slightly different to iptables than to netlink).

There are at least two situations where this can happen, though: when the system resumes from suspend, and when you connect to or disconnect from a Wi-Fi network. Usually the open connections are left in an invalid state, and the processes start closing and re-establishing them. Adding the RELATED state helped identify some of these connections.

> Or has it always been the case that if the connection is not found on the first netlink/netstat iteration, it means that's the end of it?

If the PID of the process that created the connection is not found on the first iteration, it could mean that the PID is in reality a TID, i.e., the connection was opened by a thread of the process.

Because we don't parse the /proc/<PID>/task/ directory, we don't look there for the connection's inode, so we find neither the cmdline nor the PID. This is the main reason for many "unknown connections": parsing every TID is very costly.
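For illustration, a minimal C sketch of the kind of /proc scan being discussed, assuming the standard Linux /proc layout in which every open socket appears as a /proc/<PID>/fd/<N> symlink pointing to "socket:[<inode>]". The helper name find_pid_by_inode() is hypothetical, not opensnitch's code. It walks only the per-process fd/ directories and does not descend into /proc/<PID>/task/; even so, it needs one readlink() per open descriptor on the whole system, which gives an idea of why adding the per-thread walk on top is expensive.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the PID owning the socket with this inode by
 * scanning every /proc/<PID>/fd/ entry, or -1 if no process matches.
 * It does NOT look inside /proc/<PID>/task/<TID>/. */
static pid_t find_pid_by_inode(unsigned long inode)
{
    char want[64];
    snprintf(want, sizeof(want), "socket:[%lu]", inode);

    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    pid_t found = -1;
    struct dirent *p;
    while (found == -1 && (p = readdir(proc)) != NULL) {
        /* Only numeric entries are processes ("self", "net", ... are not). */
        if (!isdigit((unsigned char)p->d_name[0]))
            continue;

        char fddir[300];
        snprintf(fddir, sizeof(fddir), "/proc/%s/fd", p->d_name);
        DIR *fds = opendir(fddir);
        if (fds == NULL)
            continue; /* process exited, or no permission to read its fds */

        struct dirent *f;
        while ((f = readdir(fds)) != NULL) {
            char link[600];
            char target[64];
            snprintf(link, sizeof(link), "%s/%s", fddir, f->d_name);
            /* One readlink() per open fd: this is what makes the scan costly. */
            ssize_t n = readlink(link, target, sizeof(target) - 1);
            if (n < 0)
                continue; /* ".", ".." or an fd closed meanwhile */
            target[n] = '\0';
            if (strcmp(target, want) == 0) {
                found = (pid_t)atoi(p->d_name);
                break;
            }
        }
        closedir(fds);
    }
    closedir(proc);
    return found;
}
```

The inode to look for is the one netlink reports for the connection (the "inode:" field in the debug logs in this thread); a socket's inode can also be read locally with fstat() on its descriptor (st_ino).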

Another case is when the process is a forked child, for example one launched by systemd. As far as I can remember, in this case the reported PID was that of systemd.

gustavo-iniguez-goya commented 4 years ago

Regarding the last case: for example, fwupdmgr launched by systemd (the fwupd-refresh systemd service) is not detected when using the proc method:

[2020-11-02 22:09:31]  DBG  new connection tcp => 55732:192.168.1.101 -> 151.101.122.49:443 uid: %!(EXTRA uint32=1000)
[2020-11-02 22:09:31]  DBG  [0/1] outgoing connection: 55732:192.168.1.101 -> 151.101.122.49:443 || netlink response: 55732:192.168.1.101 -> 151.101.122.49:443 inode: 50975687 - loopback: false multicast: false unspecified: false linklocalunicast: false ifaceLocalMulticast: false GlobalUni: true 
[2020-11-02 22:09:32]  DBG  new pid lookup took%!(EXTRA int=-1, time.Duration=731.053526ms)

netlink correctly dumps the connection's inode, but the PID is not found. With the audit method it is more likely to succeed.

If you launch it manually (fwupdmgr refresh --force), then yes, it is detected.

I'm wondering if it has anything to do with systemd's sandboxing options.

themighty1 commented 4 years ago

Thanks for the insights; that clarified a lot. I added a loop to the opensnitch code to look up netlink/netstat again. Alas, the code looped forever: no inodes were found. This happens to very few connections, when I add a new torrent to Transmission.

DBG new connection tcp => 45017: -> :27359 uid: %!(EXTRA uint32=4294967295)
DBG netlink socket error: Warning, no message nor error from netlink - 45017: -> :27359
DBG Searching for tcp6 netstat entry instead of tcp
DBG <== no inodes found, applying default action.

Have you seen this before? What do you think the cause may be? I know that Transmission may act as a server, but in my case this was a connection from my IP to the destination IP.

gustavo-iniguez-goya commented 4 years ago

Yeah, with Transmission it's fairly common to see those messages.

I have no idea really, but it could be failed connection attempts:

20468 1604450963.852980 connect(24, {sa_family=AF_INET, sin_port=htons(53998), sin_addr=inet_addr("87.173.23.163")}, 16) = -1 EINPROGRESS (Operation in progress)
20468 1604450963.854008 setsockopt(24, SOL_IP, IP_TOS, [0], 4) = 0
20468 1604450963.857135 close(24)       = 0

Maybe it happens so fast that by the time we query for it, the kernel has already deleted the entry.

I wrote a bit more about these issues here: https://github.com/gustavo-iniguez-goya/opensnitch/issues/10#issuecomment-608428026 and here: https://github.com/gustavo-iniguez-goya/opensnitch/issues/10#issuecomment-608436200

gustavo-iniguez-goya commented 4 years ago

Some more examples for future reference:

[2020-11-04 10:35:31]  DBG  new connection tcp => 45327:192.168.1.101 -> 188.64.117.35:51413 uid: %!(EXTRA uint32=1000)
[2020-11-04 10:35:31]  DBG  [0/1] outgoing connection: 45327:192.168.1.101 -> 188.64.117.35:51413 || netlink response: 45327:192.168.1.101 -> 188.64.117.35:51413 inode: 1606906 - loopback: false multicast: false unspecified: false linklocalunicast: false ifaceLocalMulticast: false GlobalUni: true 
pkt.queue:  0
[2020-11-04 10:35:32]  DBG  new pid lookup took%!(EXTRA int=-1, time.Duration=678.414889ms)
[2020-11-04 10:35:32]  IMP  Added new rule: allow if dest.ip is '188.64.117.35'
[2020-11-04 10:35:32]  DBG  ✔  -> 188.64.117.35:51413 (allow-30s-simple-1886411735)

So in this case we should be able to find the PID. Rather than the missing PID, what intrigues me here is why auditd does not detect it. Or rather, it's probably detecting it, but for some reason we're not parsing the event correctly. I should probably analyze the auditd logs as well.

themighty1 commented 4 years ago

Just want to report that I ran an endless loop dumping all TCP connections via netlink with NLM_F_DUMP and all TCP states (mask 0xfff) while adding a new torrent to Transmission.

For only about 1 in 5 of the unknown connections did I find their source port in my netlink dump. I did this as a sanity check. At least now I have some confidence that Transmission's quick connect/close is reflected in netlink, but only momentarily: by the time opensnitch sends a request to netlink, the entry is no longer there.

Unfortunately, netlink doesn't provide a way to subscribe to new events for inet sockets; we can only poll it periodically.
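A minimal C sketch of one iteration of such a polling loop, assuming the Linux sock_diag interface: a NETLINK_SOCK_DIAG socket, a SOCK_DIAG_BY_FAMILY request carrying struct inet_diag_req_v2, NLM_F_DUMP, and idiag_states = 0xfff to match every TCP state. The helper diag_dump_has_inode() is hypothetical; it only reports whether one socket inode shows up in the dump, whereas a real poller would record the ports and inodes of every entry. It covers IPv4 only; tcp6 would need a second request with AF_INET6.

```c
#include <linux/inet_diag.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: one NLM_F_DUMP of all IPv4 TCP sockets in every state.
 * Returns 1 if a socket with this inode is present, 0 if not, -1 on error. */
static int diag_dump_has_inode(unsigned int inode)
{
    int fd = socket(AF_NETLINK, SOCK_DGRAM | SOCK_CLOEXEC, NETLINK_SOCK_DIAG);
    if (fd < 0)
        return -1;

    struct {
        struct nlmsghdr nlh;
        struct inet_diag_req_v2 req;
    } msg;
    memset(&msg, 0, sizeof(msg));
    msg.nlh.nlmsg_len = sizeof(msg);
    msg.nlh.nlmsg_type = SOCK_DIAG_BY_FAMILY;
    msg.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    msg.req.sdiag_family = AF_INET;
    msg.req.sdiag_protocol = IPPROTO_TCP;
    msg.req.idiag_states = 0xfff; /* one bit per TCP state: dump them all */

    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    if (sendto(fd, &msg, sizeof(msg), 0, (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        close(fd);
        return -1;
    }

    unsigned int buf[4096]; /* int array keeps nlmsghdr reads aligned */
    int found = 0;
    for (;;) {
        int len = (int)recv(fd, buf, sizeof(buf), 0);
        if (len <= 0)
            break;
        for (struct nlmsghdr *h = (struct nlmsghdr *)buf; NLMSG_OK(h, len);
             h = NLMSG_NEXT(h, len)) {
            if (h->nlmsg_type == NLMSG_DONE) {
                close(fd);
                return found;
            }
            if (h->nlmsg_type == NLMSG_ERROR)
                goto fail;
            const struct inet_diag_msg *d = NLMSG_DATA(h);
            if (d->idiag_inode == inode)
                found = 1;
        }
    }
fail:
    close(fd);
    return -1;
}
```

Polling this in a loop is the only option with this interface: any socket created and destroyed between two dumps, like the quick connect/close described above, is never seen.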

gustavo-iniguez-goya commented 3 years ago

The key here would be to use eBPF if it's available: https://github.com/iovisor/bcc/blob/master/tools/tcplife.py

There's a fork that integrated it into opensnitch; maybe we can reuse it. On the other hand, ideally we would use XDP to block connections, but it's a fairly new feature and it's not available on many systems.

gustavo-iniguez-goya commented 3 years ago

> connections which the first time around are not found via netlink nor via tcp netstat

> I have no idea really, but it could be failed connection attempts:

Correct: in particular, failed non-blocking connection attempts. Sometimes the connection is discarded because it's not found via netlink; other times it passes all the checks until we fail to retrieve the PID of the process.

You can reproduce the issue with this example (port and IP captured by sniffing Transmission traffic):

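  // gcc cclient.c -o cclient
  //
  // Reproduces a failed non-blocking connection attempt: the socket is
  // closed before the TCP handshake completes.

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <fcntl.h>
  #include <errno.h>
  #include <string.h>
  #include <sys/types.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <arpa/inet.h>

  #define PORT 28979
  #define PEER "5.180.62.91"

  int main(void)
  {
      int sockfd;
      struct sockaddr_in their_addr;

      if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
          perror("socket()");
          exit(1);
      }
      printf("Client socket() OK...\n");

      /* Switch to non-blocking mode, preserving the existing flags. */
      int flags = fcntl(sockfd, F_GETFL, 0);
      if (flags == -1 || fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) == -1)
          perror("fcntl(O_NONBLOCK)");

      memset(&their_addr, 0, sizeof(their_addr));
      their_addr.sin_family = AF_INET;
      their_addr.sin_port = htons(PORT);
      if (inet_aton(PEER, &their_addr.sin_addr) == 0) {
          fprintf(stderr, "invalid address: %s\n", PEER);
          exit(1);
      }

      /* A non-blocking connect() normally returns -1 with EINPROGRESS:
       * the SYN has been sent but the handshake is not finished yet. */
      if (connect(sockfd, (struct sockaddr *)&their_addr, sizeof(their_addr)) == -1) {
          if (errno != EINPROGRESS) {
              perror("connect() error");
              close(sockfd);
              exit(1);
          }
          printf("Client connect() in progress...\n");
      } else
          printf("Client connect() is OK...\n");

      /* Close right away: the kernel may drop the socket entry before
       * a netlink or /proc lookup gets to it. */
      close(sockfd);
      return 0;
  }
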
themighty1 commented 3 years ago

Nice, thank you.

gustavo-iniguez-goya commented 3 years ago

Regarding this problem, I've modified the ftrace monitor method to hook tcp/tcp_destroy_sock and sock/inet_sock_set_state instead of sched/sched_process_exec and sched/sched_process_fork.

The benefit of doing this is that we only cache and intercept PIDs that have generated network activity, instead of caching every single process execution on the system. If we wanted to monitor every new process launch, we should do it via netlink (PROC_EVENT_EXEC, PROC_EVENT_FORK, PROC_EVENT_EXIT), so as not to rely on debugfs.
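For reference, a minimal C sketch of that netlink alternative, assuming the standard Linux process events connector: a NETLINK_CONNECTOR socket bound to the CN_IDX_PROC group and a PROC_CN_MCAST_LISTEN request, as defined in <linux/connector.h> and <linux/cn_proc.h>. The struct and function names are hypothetical, not opensnitch code. Subscribing usually requires root (CAP_NET_ADMIN), so decoding is kept in a separate function that can be exercised without privileges.

```c
#include <linux/cn_proc.h>
#include <linux/connector.h>
#include <linux/netlink.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical layout of the subscription message:
 * netlink header | connector header | multicast op. */
struct __attribute__((aligned(NLMSG_ALIGNTO))) proc_cn_request {
    struct nlmsghdr nlh;
    struct __attribute__((packed)) {
        struct cn_msg cn;
        enum proc_cn_mcast_op op;
    } body;
};

static void build_listen_request(struct proc_cn_request *req)
{
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = sizeof(*req);
    req->nlh.nlmsg_type = NLMSG_DONE;
    req->nlh.nlmsg_pid = getpid();
    req->body.cn.id.idx = CN_IDX_PROC;
    req->body.cn.id.val = CN_VAL_PROC;
    req->body.cn.len = sizeof(enum proc_cn_mcast_op);
    req->body.op = PROC_CN_MCAST_LISTEN;
}

/* Opens the connector socket and subscribes to process events.
 * Returns the socket fd, or -1 (e.g. when not privileged). */
static int subscribe_proc_events(void)
{
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
    if (fd < 0)
        return -1;
    struct sockaddr_nl addr = { .nl_family = AF_NETLINK,
                                .nl_groups = CN_IDX_PROC,
                                .nl_pid = getpid() };
    struct proc_cn_request req;
    build_listen_request(&req);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        send(fd, &req, sizeof(req), 0) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Formats the three event types mentioned above. Returns 1 if handled. */
static int describe_proc_event(const struct proc_event *ev, char *out, size_t n)
{
    switch (ev->what) {
    case PROC_EVENT_FORK:
        snprintf(out, n, "fork parent=%d child=%d",
                 ev->event_data.fork.parent_pid, ev->event_data.fork.child_pid);
        return 1;
    case PROC_EVENT_EXEC:
        snprintf(out, n, "exec pid=%d", ev->event_data.exec.process_pid);
        return 1;
    case PROC_EVENT_EXIT:
        snprintf(out, n, "exit pid=%d code=%u",
                 ev->event_data.exit.process_pid, ev->event_data.exit.exit_code);
        return 1;
    default:
        return 0;
    }
}
```

After subscribe_proc_events() succeeds, each recv() on the returned socket yields a struct nlmsghdr followed by a struct cn_msg whose data field holds a struct proc_event, which can be passed to describe_proc_event(). Unlike the inet socket case, these events are pushed by the kernel, so no polling is needed.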

On the other hand, inet_sock_set_state logs the source/destination ports and IPs of new connections along with the PID of the process, so we can match new outgoing connections against this data:

new outgoing connection: 192.168.1.134:51413 -> 47.188.48.32:57949

inet_sock_set_state: ADD: pid:22825 inet_sock_set_state -> map[daddr:47.188.48.32 daddrv6:::ffff:47.188.48.32 dport:57949 family:AF_INET oldstate:TCP_CLOSE protocol:IPPROTO_TCP saddr:192.168.1.134 saddrv6:::ffff:192.168.1.134 sport:51413] Key: 192.168.1.134:51413 47.188.48.32:57949

It's not bulletproof. Sometimes the source port is 0 (probably when the connection fails to be established), so the new outgoing connection doesn't match. Still, it seems to work much better than the current method.
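A minimal C sketch of the matching step, assuming the PID, saddr, sport, daddr and dport have already been extracted from each inet_sock_set_state event. The fixed-size ring table and the names conn_cache_add() and conn_cache_lookup() are hypothetical, not opensnitch's implementation; the key string mirrors the "Key: saddr:sport daddr:dport" shape of the debug log above. It also shows the limitation just described: an event recorded with sport 0 can never match a real outgoing connection.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define CACHE_SIZE 256 /* hypothetical capacity; oldest slots get overwritten */

/* One inet_sock_set_state observation: the 4-tuple key and the owning PID. */
struct conn_entry {
    char key[128]; /* "saddr:sport daddr:dport" */
    pid_t pid;
};

static struct conn_entry cache[CACHE_SIZE];
static unsigned int next_slot;

static void make_key(char *out, size_t n, const char *saddr, unsigned short sport,
                     const char *daddr, unsigned short dport)
{
    snprintf(out, n, "%s:%hu %s:%hu", saddr, sport, daddr, dport);
}

/* Record a tracepoint event. */
static void conn_cache_add(pid_t pid, const char *saddr, unsigned short sport,
                           const char *daddr, unsigned short dport)
{
    struct conn_entry *e = &cache[next_slot++ % CACHE_SIZE];
    make_key(e->key, sizeof(e->key), saddr, sport, daddr, dport);
    e->pid = pid;
}

/* Return the PID recorded for an intercepted outgoing connection, or -1. */
static pid_t conn_cache_lookup(const char *saddr, unsigned short sport,
                               const char *daddr, unsigned short dport)
{
    char key[128];
    make_key(key, sizeof(key), saddr, sport, daddr, dport);
    for (unsigned int i = 0; i < CACHE_SIZE; i++)
        if (cache[i].key[0] != '\0' && strcmp(cache[i].key, key) == 0)
            return cache[i].pid;
    return -1;
}
```

With the values from the log above, conn_cache_add(22825, "192.168.1.134", 51413, "47.188.48.32", 57949) followed by conn_cache_lookup() on the same 4-tuple returns 22825.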