So, the next time you see this happening, could I get the output of ps -eLf | grep ipfs and also the stack dump from pressing Ctrl + \?
ps -eLf | grep ipfs yields:
ipfs 10438 29363 10438 0 1 05:02 pts/0 00:00:00 ps -eLf
ipfs 10439 29363 10439 0 1 05:02 pts/0 00:00:00 grep --color=auto -i ipfs
ipfs 22072 1 22072 0 1 Apr07 ? 00:00:01 tmux
ipfs 22073 22072 22073 0 1 Apr07 pts/1 00:00:00 -bash
ipfs 22167 22073 22167 0 117 Apr07 pts/1 00:00:00 ipfs daemon
[...]
ipfs 22167 22073 22317 0 117 Apr07 pts/1 00:00:00 ipfs daemon
I can get a core dump of the running process with gcore -o <file> <pid>. Does this help? Or do you need the dump of the terminated process? The latter is a bit more tricky: the environment options did not allow core dumps when the process was started. The best idea I have is to somehow halt the process (with a signal?) and then gcore.
A 1-2 minute strace summary:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
72.33 9.250829 96 96790 31186 futex
22.64 2.895659 119 24235 epoll_wait
3.81 0.487598 14 35409 select
0.63 0.080998 3 30416 2570 read
0.43 0.054642 3 16720 write
0.11 0.013771 6886 2 1 restart_syscall
0.05 0.006987 2 3154 sched_yield
------ ----------- ----------- --------- --------- ----------------
100.00 12.790484 206726 33757 total
A short system call trace of one of the threads:
22168 0.000047 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
22168 0.000041 <... restart_syscall resumed> ) = 0
22168 0.000041 select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
22168 0.000026 <... select resumed> ) = 0 (Timeout)
22168 0.000026 select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
22168 0.000106 <... select resumed> ) = 0 (Timeout)
22168 0.000042 select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
22168 0.000051 <... select resumed> ) = 0 (Timeout)
22168 0.000041 futex(0x1083a40, FUTEX_WAIT, 0, {60, 0} <unfinished ...>
22168 0.000047 <... futex resumed> ) = 0
22168 0.000047 select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
22168 0.000057 <... select resumed> ) = 0 (Timeout)
[...]
If you send the process SIGQUIT, it will dump all the goroutine stacks.
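For reference, the Go runtime handles an uncaught SIGQUIT by printing every goroutine's stack to stderr and exiting. If terminal scrollback is a concern, here is a minimal sketch of catching the signal and writing the same dump to a file instead; dumpStacksOnQuit is a hypothetical helper, not something ipfs ships, and note that installing a handler suppresses the default exit:

package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// dumpStacksOnQuit writes all goroutine stacks to path whenever
// SIGQUIT arrives, instead of relying on terminal scrollback.
func dumpStacksOnQuit(path string) {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGQUIT)
	go func() {
		for range c {
			f, err := os.Create(path)
			if err != nil {
				continue
			}
			// debug=2 prints stacks in the same format as an
			// uncaught SIGQUIT traceback.
			pprof.Lookup("goroutine").WriteTo(f, 2)
			f.Close()
		}
	}()
}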
But that's a lot of threads, and quite a large number of calls to futex, epoll_wait and select... Looks to me like a goroutine leak.
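A goroutine leak is easy to confirm from inside the process; here is a minimal sketch, assuming a hypothetical sampler started alongside the daemon:

package main

import (
	"log"
	"runtime"
	"time"
)

// logGoroutineCount prints the live goroutine count once a minute;
// a count that only ever grows would confirm the leak hypothesis.
func logGoroutineCount() {
	for range time.Tick(time.Minute) {
		log.Printf("goroutines: %d", runtime.NumGoroutine())
	}
}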
I'll push a changeset that adds an option to disable reuseport. Would be good to see the numbers then.
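The toggle can be as simple as an environment check. A sketch, assuming the IPFS_REUSEPORT variable used later in this thread; reuseportEnabled is a hypothetical name and the actual changeset may differ:

package main

import "os"

// reuseportEnabled reports whether SO_REUSEPORT should be used;
// setting IPFS_REUSEPORT=false turns it off.
func reuseportEnabled() bool {
	return os.Getenv("IPFS_REUSEPORT") != "false"
}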
Sounds great. I'll gladly try that changeset and report.
I also managed to get the last 2k lines of the kill -SIGQUIT output out of the tmux session. The IPFS hash for the file is QmYGvyL3ZUQaQqwk9Zc4uUrpxWG42gh7X86UhRPeeU92U4
Here is an updated graph that shows the daemon restart:
@okket changeset available in the branch for #1038 -- it will be merged soon too.
Thank you for merging. I'm running the master branch now with IPFS_REUSEPORT=false ipfs daemon, but I am not sure the new code gets called. The small typo (log.Infof, see https://github.com/ipfs/go-ipfs/pull/1042) did not error. Consequently I do not see the confirmation message in the logs.
Ok, my reasoning was, hmm, just wrong. Sorry. Since I am not able to catch the log message for some reason, I just changed the log.Infof into a print in my test setup. The code is called, and hopefully the daemon now does not eat up resources over time. I'll report in a day or two.
Still, I find the number of system calls the daemon emits a bit alarming:
$ timeout 60 strace -c -f -p $(pidof ipfs)
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
72.08 3.705090 103 36124 3984 futex
19.80 1.017752 38 26701 epoll_wait
6.25 0.321034 18 17715 select
1.03 0.053204 2 25965 2644 read
0.75 0.038429 3 15155 write
0.08 0.004328 3 1562 sched_yield
0.01 0.000512 73 7 madvise
0.00 0.000167 84 2 1 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00 5.140516 123231 6629 total
@okket that is pretty concerning. The futex is goroutine scheduling going on, and it implies we probably have too many goroutines. epoll_wait is typically in the network stack; I know that the REUSEPORT code makes a large number of calls to it, and go/net probably does as well.
At any rate, I think some profiling needs to be done, and eventlogs need to be turned off.
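For anyone reproducing this, the standard library is enough to produce the profiles. A minimal sketch using runtime/pprof; the file names match the ones that show up later in this thread, but the profile helper itself is hypothetical:

package main

import (
	"os"
	"runtime/pprof"
)

// profile starts a CPU profile and returns a stop function that also
// snapshots the heap; both files can be fed to `go tool pprof` along
// with the exact binary that produced them.
func profile() (stop func(), err error) {
	cpu, err := os.Create("ipfs.cpuprof")
	if err != nil {
		return nil, err
	}
	if err := pprof.StartCPUProfile(cpu); err != nil {
		cpu.Close()
		return nil, err
	}
	return func() {
		pprof.StopCPUProfile()
		cpu.Close()
		if mem, err := os.Create("ipfs.memprof"); err == nil {
			pprof.WriteHeapProfile(mem)
			mem.Close()
		}
	}, nil
}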
Preliminary results indicate that current master with IPFS_REUSEPORT=false ipfs daemon is still generating more context switches over time:
I forgot to mention: because Fedora does not include P-224 (for patent reasons, I guess) I had to remove support from ipfs:
diff --git a/p2p/crypto/key.go b/p2p/crypto/key.go
index 7f0a13a..31dfeb3 100644
--- a/p2p/crypto/key.go
+++ b/p2p/crypto/key.go
@@ -102,8 +102,8 @@ func GenerateEKeyPair(curveName string) ([]byte, GenSharedKey, error) {
var curve elliptic.Curve
switch curveName {
- case "P-224":
- curve = elliptic.P224()
+// case "P-224":
+// curve = elliptic.P224()
case "P-256":
curve = elliptic.P256()
case "P-384":
diff --git a/p2p/crypto/secio/al.go b/p2p/crypto/secio/al.go
index 9a0cadf..9e0ffce 100644
--- a/p2p/crypto/secio/al.go
+++ b/p2p/crypto/secio/al.go
@@ -18,7 +18,7 @@ import (
)
// List of supported ECDH curves
-var SupportedExchanges = "P-256,P-224,P-384,P-521"
+var SupportedExchanges = "P-256,P-384,P-521"
// List of supported Ciphers
var SupportedCiphers = "AES-256,AES-128,Blowfish"
I hope this change does not, in some crazy way I cannot possibly imagine, create a feedback loop or trigger something else that is responsible for the growing context switch problem.
I don't see why Fedora not having it means you have to remove it from ipfs; P-224 is part of the Go standard library.
Also, I have two Fedora boxes running ipfs without modification just fine. You may be running an older version of Go?
Nope:
$ yum info golang
Loaded plugins: langpacks
Installed Packages
Name : golang
Arch : x86_64
Version : 1.4.2
Release : 2.fc21
Size : 10 M
Repo : installed
From repo : updates
Summary : The Go Programming Language
URL : http://golang.org/
License : BSD
Description : The Go Programming Language.
The build fails with P-224 support:
$ make build
cd cmd/ipfs && go build -i
# github.com/ipfs/go-ipfs/p2p/crypto
../../../gocode/src/github.com/ipfs/go-ipfs/p2p/crypto/key.go:106: undefined: elliptic.P224
Makefile:27: recipe for target 'build' failed
make: *** [build] Error 2
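A quick way to check whether a given toolchain still ships the curve is a throwaway program against crypto/elliptic; on Fedora's patched golang package it fails at compile time with the same undefined: elliptic.P224 error:

package main

import (
	"crypto/elliptic"
	"fmt"
)

func main() {
	// Prints "P-224" on a stock Go toolchain; fails to compile on a
	// toolchain whose P-224 support was patched out.
	fmt.Println(elliptic.P224().Params().Name)
}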
Can you also run go version for me? I get the following:
[jeromy@localhost ~]$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Fedora
Description: Fedora release 20 (Heisenbug)
Release: 20
Codename: Heisenbug
[jeromy@localhost ~]$ uname -a
Linux localhost.localdomain 3.17.8-200.fc20.x86_64 #1 SMP Thu Jan 8 23:26:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[jeromy@localhost ~]$ go version
go version go1.4.2 linux/amd64
[jeromy@localhost go-ipfs]$ make build
cd cmd/ipfs && go build -i
[jeromy@localhost go-ipfs]$
$ go version
go version go1.4.2 linux/amd64
Support for P-224 was removed in 2013 (1). Fedora 20 does not ship with it; see the golang-1.2-remove-ECC-p224.patch here:
http://koji.fedoraproject.org/koji/rpminfo?rpmID=6091693
Are you sure you use the system installed go like me? No $GOROOT? Also:
$ which go
/bin/go
(1) https://lists.fedoraproject.org/pipermail/golang/2013-December/000596.html
I'll install the official binary go distribution locally to get this side issue out of the way.
Ah, I always install from source. That is really weird that Fedora ships a patched Go.
It appears that the Fedora maintainers are paranoid. Bruce Schneier quote incoming:
"On the crypto bits in your guardian piece, I found especially interesting that you suggest classic discrete log crypto over ecc. I want to ask if you could elaborate more on that."
"I no longer trust the constants. I believe the NSA has manipulated them through their relationships with industry."
https://www.schneier.com/blog/archives/2013/09/the_nsa_is_brea.html#c1675929
@whyrusleeping agreed. This should be a change in Go.
Unchanged master, official go binaries, started with IPFS_REUSEPORT=false ipfs daemon, still growing:
Maybe a hint: if you look at these graphs and numbers, you see that the timer interrupts (green) are growing linearly (on a logarithmic scale!) up to the maximum of 716, which corresponds to 10k context switches. Nothing else on my server uses this many timers, but the really scary fact (and why I keep pressing this) is the linear growth. This is not natural, will most likely not stop until the server crashes, and can (IMHO) only be explained by some kind of 'leak'.
I will look into this with debug ideas from http://elinux.org/Kernel_Timer_Systems after my system has finished zfs scrub (hello ZoL 0.6.4).
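The growth is also easy to track without interrupt graphs: on Linux, /proc/<pid>/status exposes per-process context switch counters. A minimal sketch of a hypothetical watcher (not part of ipfs):

package main

import (
	"fmt"
	"io/ioutil"
	"strings"
	"time"
)

// watchCtxSwitches polls /proc/<pid>/status once a minute and prints
// the voluntary/nonvoluntary context switch counters, to confirm the
// linear growth independently of the interrupt graphs.
func watchCtxSwitches(pid int) {
	path := fmt.Sprintf("/proc/%d/status", pid)
	for {
		data, err := ioutil.ReadFile(path)
		if err != nil {
			return // process is gone
		}
		for _, line := range strings.Split(string(data), "\n") {
			if strings.Contains(line, "ctxt_switches") {
				fmt.Println(line)
			}
		}
		time.Sleep(time.Minute)
	}
}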
okket, could you run ipfs with IPFS_PROF set to true, then ctrl+c the daemon when the context switches get really high? I want both the cpuprofile and memprofile it outputs.
Also, thanks a ton for helping us debug this issue!!
Please ignore my last posting above. The interrupt count on my system is not reliable:
This system BIOS has enabled interrupt remapping on a chipset that contains an erratum
making that feature unstable. To maintain system stability interrupt remapping is being
disabled. Please contact your BIOS vendor for an update
(Sadly there is no BIOS update available)
I doubt that this causes the high context switch problem. The system is very reliable otherwise. I'll insert a complete SIGQUIT dump when it reaches 10k or so.
(also, thanks for the heads up on ZoL 0.6.4)
~12k context switches per second again, growing linearly as usual:
[image: interrupts-week 3] https://cloud.githubusercontent.com/assets/208337/7112734/20ae5140-e1cd-11e4-956e-49a7d9d12df2.png
60 second strace:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
79.78 18.854636 67 282022 96950 futex
16.43 3.884003 82 47561 1 epoll_wait
2.86 0.675084 7 91441 40 select
0.43 0.101768 2 50400 4397 read
0.33 0.077317 3 27993 write
0.13 0.029989 1 20670 sched_yield
0.04 0.009583 266 36 24 restart_syscall
0.00 0.000285 1 563 70 rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00 23.632665 520686 101482 total
Now killed with SIGQUIT, here are the debug files:
ipfs.out QmPh7Srs3W1ChrB9Bwfaz9cfxKDfbk7r2enGHwZ5etKChJ
ipfs.cpuprof QmePD76ngy2cLJYm4FhzDqY5HXo4WY7Voxzb8nTq8xsn7t
ipfs.memprof QmS6Qb94DEEwH3USaJdUn1BoSzYvpf74envHGyEpnyNWt1
Is this with master?
It was the master from two days ago. Dang, I should have noted the commit id.
Idea: IPFS should hardcode current git commit id into the binary and output it during startup, along with the official version number.
Agreed. See #691 and #693.
The build process could do this, but it wouldn't be go-gettable, as modifying the binaries doesn't happen on go get.
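For illustration, the usual approach is to let the Makefile inject the commit through the linker, e.g. go build -ldflags "-X main.CurrentCommit=$(git rev-parse --short HEAD)". A sketch with hypothetical names (CurrentCommit, printVersion); #691/#693 track the real implementation:

package main

import "fmt"

// CurrentCommit is stamped by the linker at build time; it stays
// empty on a plain `go get`, which is the trade-off noted above.
var CurrentCommit string

func printVersion(release string) {
	if CurrentCommit != "" {
		fmt.Printf("ipfs version %s-%s\n", release, CurrentCommit)
	} else {
		fmt.Printf("ipfs version %s\n", release)
	}
}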
The stack trace looks fairly sane: 60 open socket connections, a few streams. Nothing huge by any means... this is bizarre.
@okket do you still have the binary that you ran for those profiles? pprof really wants to have the same binary the profiles were run with.
No, the binary was deleted. I'll keep it for the next round(s).
Here is the link to the data files from the server restart today:
http://gateway.ipfs.io/ipfs/QmUz7ij4KaM7Z5h9A4o7w1oahQUDwT2CiYpXA43HTGQYy7
This time the daemon crashed with "runtime: out of memory"
http://gateway.ipfs.io/ipfs/QmSSvfWum6QDFxX38m8YT5vHTpT9pxgWywZURHPcVf44yY
@okket that's what I get on mars every night, lol. Thank you for the logs, I'll let you know if I find anything incriminating in them.
@okket So, I've seen this exact trace hundreds of times, but for some reason I decided to look at yours a little more closely. And I found the out of memory issue:
This line in the panic goroutine:
runtime.mallocgc(0xb1b99a33, 0xa8a320, 0xc200000001, 0xc208fa9c20)
Looks like it's pretty normal, a function taking some pointers to some things... except that first argument isn't a pointer. It's the allocation size. Looking deeper, it appears that we ask msgio what the next message size is, it tells us this giant number, we say "okay, let's make some memory" and blindly allocate it. I believe we need to set an upper limit on these messages, and/or find out who is sending them.
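A minimal sketch of the proposed guard, assuming a 4-byte length prefix like msgio's; the constant and names here are illustrative, not the actual fix:

package main

import (
	"encoding/binary"
	"errors"
	"io"
)

// maxMessageSize is an illustrative upper bound; any prefix larger
// than this is treated as corrupt or malicious instead of being
// blindly allocated (0xb1b99a33 above is roughly a 2.8 GB request).
const maxMessageSize = 8 << 20 // 8 MiB

var ErrMsgTooLarge = errors.New("message size exceeds maximum")

// readMsg reads one length-prefixed message, checking the claimed
// size before any memory is allocated.
func readMsg(r io.Reader) ([]byte, error) {
	var size uint32
	if err := binary.Read(r, binary.BigEndian, &size); err != nil {
		return nil, err
	}
	if size > maxMessageSize {
		return nil, ErrMsgTooLarge
	}
	buf := make([]byte, size)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}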
@jbenet o/ PING
wow, good find!
finally :)
:clap:
thanks @okket @whyrusleeping -- this is great -- :clap: :clap:
When I let ipfs daemon run for about 48 hours, my system starts to show latency issues due to a huge amount of context switches. They slowly grow over time and are directly attributable to ipfs (the problem does not exist without running the daemon). Here is a nice graph to illustrate:
Every other metric, like packets per second (about 300), connections (about 50), memory usage, etc., does not grow out of bounds.
Guesswork: with vmstat 1 I see a huge spike in context switches nearly exactly every 10 seconds, currently about 20k. I am somewhat sure that this is the daemon. I'll update this issue tomorrow with fresh numbers and hopefully a dump of the daemon.