Procsiab opened 3 weeks ago
Can you run tailscale debug --mem-profile=tailscale.mem.pprof and attach that file here? You might have to zip or tar.gz it so GitHub accepts it.
After testing some permutations of the software and system versions yesterday, I left the Tailscale client 1.48.1 installed, and with it running I gathered the requested information (since that version triggers the issue I am reporting).
I left one ARM64 system with deployment 39.20240403.0 running overnight, then I captured the debug info with the command @bradfitz provided; this first file is called tailscale_kernel6711.mem.pprof
Afterwards, on the same system, I deployed version 39.20240407.0 (you can find the package diff between the two in my first post) and let it run for four hours, during which the memory slowly filled up; finally, I captured the debug info again with the same command. The second file is called tailscale_kernel684.mem.pprof
I am attaching a ZIP archive containing the two files: tailscale.mem.pprof.zip
Additional info
On the affected system, I started the client in the following way:
tailscale up --accept-dns=true --login-server=https://myheadscale.org
And the contents of /etc/defaults/tailscaled are the following:
PORT="41641"
FLAGS=""
TS_NO_LOGS_NO_SUPPORT=true
I only see ~45 MB of memory in those pprof files, not hundreds.
% go tool pprof tailscale_kernel6711.mem.pprof
File: tailscaled
Type: inuse_space
Time: Apr 27, 2024 at 1:42am (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 46969.28kB, 97.87% of 47993.68kB total
Showing top 10 nodes out of 83
flat flat% sum% cum cum%
32679.99kB 68.09% 68.09% 32679.99kB 68.09% github.com/tailscale/wireguard-go/device.(*Device).PopulatePools.func3
6536kB 13.62% 81.71% 6536kB 13.62% tailscale.com/net/tstun.wrap
2064.04kB 4.30% 86.01% 2064.04kB 4.30% github.com/tailscale/wireguard-go/tun.newTCPGROTable
1401.24kB 2.92% 88.93% 1401.24kB 2.92% github.com/klauspost/compress/zstd.encoderOptions.encoder
1184.27kB 2.47% 91.40% 1184.27kB 2.47% github.com/klauspost/compress/zstd.(*fastBase).ensureHist
1024.05kB 2.13% 93.53% 1024.05kB 2.13% tailscale.com/wgengine/magicsock.(*endpoint).handlePongConnLocked
540.51kB 1.13% 94.66% 540.51kB 1.13% github.com/tailscale/wireguard-go/device.newHandshakeQueue
513.50kB 1.07% 95.73% 513.50kB 1.07% bytes.growSlice
513kB 1.07% 96.80% 513kB 1.07% vendor/golang.org/x/net/http2/hpack.newInternalNode
512.69kB 1.07% 97.87% 512.69kB 1.07% encoding/pem.Decode
(pprof) %
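To see what changed between the two captures, the profiles from the earlier comment can be diffed directly with pprof. A minimal sketch, assuming both .pprof files from the attachment are in the current directory and a Go toolchain is installed:

```shell
# Heap profiles taken before (kernel 6.7.11) and after (kernel 6.8) the
# deployment upgrade, as named in the thread above.
OLD=tailscale_kernel6711.mem.pprof
NEW=tailscale_kernel684.mem.pprof

if [ -f "$OLD" ] && [ -f "$NEW" ]; then
    # -diff_base subtracts the old profile from the new one, so only
    # allocations that grew between the two captures are reported.
    go tool pprof -top -diff_base="$OLD" "$NEW"
else
    echo "profile files not found; download them from the issue attachment first"
fi
```

If tailscaled itself were leaking, the diff would show large positive entries; a near-empty diff is consistent with the ~45 MB total noted above.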
Sure it's the Tailscale process?
I am sure that, from visually inspecting the memory usage with top, nothing stands out, not even the Tailscale client; however, I am also sure that if I stop the Tailscale client's systemd unit the memory stops filling, and if I reboot in this state (with the client disabled and not starting at boot) the memory never fills up again.
Moreover, I have tested this scenario on other virtual and physical machines, all of them running the same deployment of Fedora IoT and the same Tailscale version, and it seems reproducible only on AARCH64 with kernel 6.8
Hello, it's not Tailscale's fault in my case; see https://discussion.fedoraproject.org/t/high-memory-usage-in-f40-on-rpi-4-unable-to-find-which-process-used-them/114598/ You are probably not the only one facing this issue.
Thanks @wolf-yuan-6115 for your reply: I did not notice the discussion you started on the Fedora forum. When I searched the internet a couple of weeks ago, I was convinced it did not apply to me, since the title and first message discuss the Fedora 40 upgrade, while I can reproduce my issue on Fedora 39 (the IoT version) as well.
For the time being, I'll move the discussion over to the Fedora forum and close this issue, while we figure out whether kernel 6.8 itself is to blame on ARM64.
What is the issue?
Description
Leaving the system in an "idle" state results in the exhaustion of 100 MB of RAM every hour, until the system becomes unresponsive and reboots. Monitoring the situation with top does not show any process that stands out for memory usage, while free memory keeps shrinking over time.
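One way to track a leak that top cannot attribute to any process is to sample MemAvailable over time. A minimal sketch (Linux only; the three-sample loop is just for illustration, in practice you would run it for hours):

```shell
# Sample available memory from /proc/meminfo a few times.
# A steady decline here, with no matching growth in any process RSS
# shown by top, points at kernel-side allocations.
for i in 1 2 3; do
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    printf '%s MemAvailable: %s kB\n' "$(date +%s)" "$avail_kb"
    sleep 1
done
```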
System info
I disabled every other service I had added to the "vanilla" deployment of Fedora IoT, and I am able to reproduce this issue only when Tailscale is running, on two different Raspberry Pis. NOTE that I am no longer able to reproduce the issue if I roll back to deployment 39.20240403.0, which bundles kernel 6.7.11; every deployment committed later includes kernel 6.8, hence my assumption that the two can be related. For reference, I post here the diff between the latest working Fedora IoT deployment and the first one that shows the issue:
Steps to reproduce
1. Deploy a Fedora IoT version from 39.20240407.0 up to the latest (40.20240425.0).
2. Run a Tailscale client; tested with versions between 1.44.1 and 1.64.0.
NOTE: I am no longer able to reproduce the issue if I stop the Tailscale systemd service; however, stopping the client does not free the already-allocated RAM, which requires a reboot.
Are there any recent changes that introduced the issue?
Deploying a Fedora IoT version for AARCH64 with kernel 6.8 included triggers the issue; on x86_64 it is not reproducible (tested on both virtual and physical machines) with any of the Fedora IoT deployments on which it happens on aarch64.
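Since the missing memory never shows up as any process's RSS, the kernel's own accounting is worth checking when the leak is active. A sketch (Linux only; field names as defined in /proc/meminfo):

```shell
# Memory the kernel allocates for itself (and leaks, if a driver is at
# fault) is counted in the slab/vmalloc fields rather than any process:
grep -E '^(Slab|SReclaimable|SUnreclaim|KernelStack|VmallocUsed):' /proc/meminfo
```

If SUnreclaim keeps growing over time, slabtop (from procps-ng) can show which slab cache is responsible, which would help narrow the regression to a specific kernel subsystem between 6.7.11 and 6.8.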
OS
Linux
OS version
Fedora IoT 39 and 40
Tailscale version
1.44.1 to 1.64.0
Other software
I am using a Headscale 0.22.3 server to connect the clients.
Bug report
No response