imsnif / bandwhich

Terminal bandwidth utilization tool
MIT License
9.87k stars 290 forks

Electron processes shown as UNKNOWN #196

Open da2x opened 3 years ago

da2x commented 3 years ago

Every Electron app is shown as “UNKNOWN”. I’m not sure about the details. I don’t know how bandwhich determines the process names, but utilities like top and ps show Electron process names just fine.

To reproduce:

  1. Start bandwhich
  2. Install any Electron app and run it (e.g. Joplin, Spotify, Bitwarden)

Edit: This issue previously misidentified the cause as Flatpak. However, it turns out the root cause seems to be Electron apps (either running in Flatpak or unconstrained).

grahamperrin commented 1 year ago

See also:

cyqsimon commented 11 months ago

Hi all. I recently added the relevant logging to be able to diagnose this issue, and made some curious discoveries.

First, there is this thing known as "IPv4-mapped IPv6 addresses" (Wikipedia), which we have previously failed to ugh... address. For example:

09:48:28 [WARN] "kdeconnectd" owns a similar looking connection, but its local ip doesn't match.
09:48:28 [WARN] Looking for: tcp://w.x.y.z:37922; found: tcp://[::ffff:w.x.y.z]:37922

I have now patched this in 76956cf. I suspect there is a decent probability that this was the original root cause behind this particular bug report.
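For anyone curious, the gist of such a fix is to normalize IPv4-mapped IPv6 addresses before comparing a sniffed connection against the socket table. The snippet below is only an illustrative sketch using the standard library's `Ipv6Addr::to_ipv4_mapped` (available on reasonably recent Rust), not the actual change in 76956cf:

```rust
use std::net::IpAddr;

/// Normalize an address so that an IPv4-mapped IPv6 address (e.g. `::ffff:1.2.3.4`)
/// compares equal to the plain IPv4 address `1.2.3.4`.
fn normalize(addr: IpAddr) -> IpAddr {
    match addr {
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => IpAddr::V4(v4),
            None => IpAddr::V6(v6),
        },
        v4 => v4,
    }
}

fn main() {
    // A connection sniffed off the wire as IPv4...
    let sniffed: IpAddr = "192.0.2.7".parse().unwrap();
    // ...may appear in the OS socket table as an IPv4-mapped IPv6 address.
    let from_socket_table: IpAddr = "::ffff:192.0.2.7".parse().unwrap();

    assert_ne!(sniffed, from_socket_table); // naive comparison fails
    assert_eq!(normalize(sniffed), normalize(from_socket_table)); // normalized comparison matches
}
```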


However, I am still seeing some orphan connections in the logs, which I would like to investigate further, and it would be great if you could help provide some data points.

Can you please:

  1. Build and run main branch locally, or download the latest CI build here (don't worry about the test failures; they are known and benign)
  2. Run bandwhich with the --log-to option
  3. Run another connection monitoring application (ss, lsof, etc.), and try to correlate the warnings in bandwhich's logs to processes

From what I can see, many of these remaining connections are very transient in nature, so trying to eyeball it might be somewhat difficult. You may wish to pipe the output of your alternate connection monitoring application to a file, and compare them afterwards.

Thank you all in advance.

mercurytoxic commented 10 months ago

I think it would be helpful if the source and destination IP addresses and ports were included in the bandwhich log. With only one side of the connection, it is hard to figure out which program a connection belongs to.

bandwhich.log ss.log ip a s

cyqsimon commented 10 months ago

@mercurytoxic's suggestion is now implemented on the unknown-processes branch. Can you please perform this debugging procedure again?

As per usual, you can pull and build locally, or you can download the latest CI build here (once all jobs are finished).

cyqsimon commented 10 months ago

I've done a bit of investigation myself, and now that I'm a bit more familiar with the codebase in general, I feel I can speculate about the root cause of this issue with a bit more certainty.

As I previously said, it seems like the remaining <UNKNOWN> connections are very transient in nature. I think I now understand why. The traffic capture (termed "sniffing" in code) happens continuously in separate threads (one thread for each interface), but the "process resolution" only happens every 1 second, before each render event. This is obviously problematic because it's perfectly likely that a connection has been dropped during this sub-1-second interval.
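In deliberately simplified form (this is a sketch of the structure described above, not bandwhich's actual code), the race looks roughly like this: the sniffer thread accounts for traffic continuously, but process resolution only consults the OS socket table once per render tick, so a connection that opens and closes within that tick has already vanished by the time we try to resolve it.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Stand-in for a per-interface sniffer thread: it records bytes per connection
// continuously (keyed here by local port only, for brevity).
fn sniffer(stats: Arc<Mutex<HashMap<u16, u64>>>) {
    loop {
        // ...capture a packet, then account for it:
        let local_port = 54321; // placeholder for the packet's local port
        *stats.lock().unwrap().entry(local_port).or_insert(0) += 1500;
        thread::sleep(Duration::from_millis(10));
    }
}

// Stand-in for process resolution: it consults the OS socket table / procfs.
// If the connection was opened and closed between two render ticks, it is no
// longer listed there, and the lookup fails.
fn resolve(_local_port: u16) -> Option<String> {
    None
}

fn main() {
    let stats = Arc::new(Mutex::new(HashMap::new()));
    let shared = Arc::clone(&stats);
    thread::spawn(move || sniffer(shared));

    loop {
        // Process resolution only happens once per render tick...
        thread::sleep(Duration::from_secs(1));
        for (port, bytes) in stats.lock().unwrap().drain() {
            let name = resolve(port).unwrap_or_else(|| "<UNKNOWN>".into());
            println!("{name}: {bytes} bytes on local port {port}");
        }
    }
}
```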

The ideal solution is to move the process resolution into the sniffer threads, although that will most certainly have some performance impact due to it being executed much more frequently. This is something I would have to experiment with and benchmark.

But meanwhile, I recognise that this transience problem, while having the same symptoms as the reported issue, is of a completely different cause. Knowing the likely cause of the remaining <UNKNOWN> connections, I am now rather confident that the originally reported issue was indeed caused by bandwhich not knowing about "IPv4-mapped IPv6 addresses", which I have patched. So I would like to close this issue now and work on the aforementioned refactor at a later date.

What I am uncertain about, however, is whether this is indeed the same problem encountered by the original reporter of this problem on FreeBSD. From his screenshot (in FreeBSD's bug tracker), it seems like his situation was a lot worse - no connections were being resolved to processes at all. This is mildly concerning - there is a small chance that something is critically broken for BSD builds. So I would like to ask @grahamperrin to test and see whether the symptoms have improved for him before closing this issue. Thank you all for all the help.

Edit: s/OpenBSD/FreeBSD/. Sorry I have a tendency to confuse similar-looking things that I don't personally use.

mclang commented 5 months ago

Maybe this helps someone.

I was using bandwhich --processes to monitor my network and noticed an ever-present process. It wasn't transient in any way but stayed at the top of the list the whole time. Long story short, after a while I started testing different interfaces and found out that this one uses my wireless interface and connects to a certain remote IP/port... which is my ProtonVPN WireGuard connection.

Stupid me :blush:

cyqsimon commented 5 months ago

> Maybe this helps someone.
>
> I was using bandwhich --processes to monitor my network and noticed an ever-present process. It wasn't transient in any way but stayed at the top of the list the whole time. Long story short, after a while I started testing different interfaces and found out that this one uses my wireless interface and connects to a certain remote IP/port... which is my ProtonVPN WireGuard connection.
>
> Stupid me :blush:

Ah yes, very nice catch. Since WireGuard is implemented in kernel space, it obviously doesn't have a user-space process.

So this is actually an entirely different class of problem that we previously overlooked. I'll do some research and see if there are mechanisms available to identify kernel space connections.

cyqsimon commented 5 months ago

While investigating, I also found that container processes seem to fall under UNKNOWN. Yet another category of issues, I guess.

flxo commented 5 months ago

> While investigating, I also found that container processes seem to fall under UNKNOWN. Yet another category of issues, I guess.

Yes. bandwhich can't resolve across network namespaces (on Linux).

The root cause of the problem is that the global connection table is used to build the lookup table, and it doesn't contain the connections from other namespaces. Instead of going that route, my guess is that the connection info must be obtained via the inodes of the processes' fds. Just a guess.
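To make the idea a bit more concrete, here is a rough, Linux-only sketch of that per-process approach (purely illustrative; this is not how bandwhich currently works). Each fd under /proc/&lt;pid&gt;/fd that is a socket is a symlink of the form socket:[INODE], and /proc/&lt;pid&gt;/net/tcp, which reflects that process's network namespace, lists the owning inode for every connection, so the two can be joined:

```rust
use std::fs;

/// Socket inodes owned by a process, read from its /proc/<pid>/fd entries.
/// Each fd that is a socket is a symlink whose target looks like "socket:[INODE]".
fn socket_inodes(pid: u32) -> Vec<u64> {
    let mut inodes = Vec::new();
    if let Ok(entries) = fs::read_dir(format!("/proc/{pid}/fd")) {
        for entry in entries.flatten() {
            if let Ok(target) = fs::read_link(entry.path()) {
                let target = target.to_string_lossy();
                if let Some(inode) = target
                    .strip_prefix("socket:[")
                    .and_then(|s| s.strip_suffix(']'))
                    .and_then(|s| s.parse().ok())
                {
                    inodes.push(inode);
                }
            }
        }
    }
    inodes
}

/// (local address in kernel hex notation, socket inode) pairs from the process's
/// own view of the TCP table, /proc/<pid>/net/tcp, which is scoped to that
/// process's network namespace (unlike the global /proc/net/tcp).
fn tcp_table(pid: u32) -> Vec<(String, u64)> {
    fs::read_to_string(format!("/proc/{pid}/net/tcp"))
        .map(|contents| {
            contents
                .lines()
                .skip(1) // header line
                .filter_map(|line| {
                    let cols: Vec<&str> = line.split_whitespace().collect();
                    // column 1 is the local address, column 9 is the inode
                    Some((cols.get(1)?.to_string(), cols.get(9)?.parse().ok()?))
                })
                .collect()
        })
        .unwrap_or_default()
}

fn main() {
    // A real implementation would iterate over every PID in /proc; using our own
    // PID here just keeps the example self-contained.
    let pid = std::process::id();
    let inodes = socket_inodes(pid);
    for (local, inode) in tcp_table(pid) {
        if inodes.contains(&inode) {
            println!("pid {pid} owns a TCP socket bound to {local} (inode {inode})");
        }
    }
}
```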

cyqsimon commented 5 months ago

> Yes. bandwhich can't resolve across network namespaces (on Linux).

Ah okay so that's the root cause. I was investigating this the other day and was honestly very lost.

My understanding of network namespaces is quite basic so I'm just going to ask. Is data from another namespace accessible from procfs (or other kernel interfaces)? If so, what special hoops do we need to jump through? Alternatively, do you know of any other open-source software that does this already? Maybe I can take a look.

Thanks for your help in advance.

flxo commented 5 months ago

Hi. I played with the aggregation code a little bit, and besides /proc/net/[tcp(6), udp(6)] there is also /proc/<PID>/net/[tcp(6), udp(6)].

What gives me pause is that the key to the map of open sockets is

which is not unique across network namespaces. So results can never be accurate unless no namespaces are involved.
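One conceivable way to disambiguate, offered purely as a sketch rather than anything bandwhich does today, is to make the namespace identity part of the key. On Linux, /proc/&lt;pid&gt;/ns/net is a symlink whose target (e.g. net:[4026531840]) uniquely identifies the network namespace a process lives in:

```rust
use std::fs;

/// The network namespace identity of a process: the target of the
/// /proc/<pid>/ns/net symlink, which looks like "net:[4026531840]".
fn netns_id(pid: u32) -> Option<String> {
    fs::read_link(format!("/proc/{pid}/ns/net"))
        .ok()
        .map(|target| target.to_string_lossy().into_owned())
}

fn main() {
    let pid = std::process::id();
    println!("pid {pid} lives in {:?}", netns_id(pid));
    // Two processes in different namespaces can bind the same (ip, port), so a
    // lookup key would need to be something like (netns_id, local_ip, local_port)
    // rather than (local_ip, local_port) alone.
}
```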

CommonLoon102 commented 4 months ago

For me, everything is UNKNOWN. Installed from snap on Ubuntu 23. It would be nice to have nethogs and iftop in one app, like nettop.