Closed ligi closed 5 years ago
Do you have the few lines before the start of that goroutine dump?
Unfortunately not. I copied the whole tmux buffer where this was running. Or does IPFS also dump it somewhere apart from the console?
No. I'm not sure if that's even possible with Go (without redirecting all of stderr). I usually handle this by running go-ipfs with systemd so everything is logged to the journal.
Unfortunately, there isn't really enough here to debug this. The only thing I'm noticing is a bunch of lock contention closing connections. My guess is that this is an OOM, likely caused by #6237 leading to a mass buildup of connections.
If you see this again on go-ipfs 0.4.21, we can reopen and dig in more.
(although it would be helpful to know how many cores and how much memory you have)
16 GB of RAM, and this is the cpuinfo:
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm kaiser tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 4999.70
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm kaiser tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5000.00
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm kaiser tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 4999.70
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 1
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm kaiser tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5000.00
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm kaiser tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 4999.70
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 2
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm kaiser tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5000.00
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm kaiser tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 4999.70
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm kaiser tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 5000.00
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
Got it. So a nice, beefy machine. Do you have any idea how many peers you had when it crashed?
The machine is vegetarian but powerful, SCNR ;-) Unfortunately my time machine is broken, so I cannot go back in time to see how many peers it had while crashing. Or do you have an idea how to find this out after the crash?
We don't record it anywhere. I was just hoping that you might have checked by some slim chance.
I also encountered this situation. My ipfs version is 0.5.0, and this happened when I had about 250 peers.
@JerryLX please create a new issue with the full backtrace. All the sync.runtime_Semacquire lines are unrelated to the actual crash (those are just goroutines waiting on locks).
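To make a long dump easier to read, the goroutines that are only waiting on locks can be filtered out before looking for the one that crashed. The following is a rough sketch, not a tool that ships with go-ipfs: the function name dropLockWaiters is hypothetical, it assumes every goroutine block starts with a header line such as "goroutine 1273 [semacquire, 95 minutes]:", and the sample in main is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// dropLockWaiters (hypothetical helper) removes every goroutine block whose
// header state starts with "semacquire". Lines before the first goroutine
// header, such as the panic or fatal error message, are always kept.
func dropLockWaiters(dump string) string {
	var out strings.Builder
	keep := true
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		line := sc.Text()
		trimmed := strings.TrimSpace(line)
		// A header line starts a new block and decides whether it is kept.
		if strings.HasPrefix(trimmed, "goroutine ") && strings.HasSuffix(trimmed, "]:") {
			keep = !strings.Contains(trimmed, "[semacquire")
		}
		if keep {
			out.WriteString(line)
			out.WriteByte('\n')
		}
	}
	return out.String()
}

func main() {
	// Made-up sample: one running goroutine and one lock waiter.
	sample := "goroutine 1 [running]:\n" +
		"main.main()\n" +
		"goroutine 1273 [semacquire, 95 minutes]:\n" +
		"sync.runtime_SemacquireMutex(0x56509b?, 0x0?, 0x0?)\n"
	fmt.Print(dropLockWaiters(sample))
}
```

What remains after filtering is usually much shorter, but the full, unfiltered backtrace is still what should be attached to a new issue.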
goroutine 1273 [semacquire, 95 minutes]:
sync.runtime_SemacquireMutex(0x56509b?, 0x0?, 0x0?)
runtime/sema.go:71 +0x25
sync.(*Mutex).lockSlow(0xc000092320)
sync/mutex.go:162 +0x165
sync.(*Mutex).Lock(...)
sync/mutex.go:81
math/rand.(*lockedSource).seedPos(0xc000092320, 0xed9caba6c?, 0xc000094178)
math/rand/rand.go:409 +0x45
math/rand.(*Rand).Seed(0x4d6?, 0xc004e5c950?)
math/rand/rand.go:75 +0x33
math/rand.Seed(...)
math/rand/rand.go:303
My problem occurred in the rand function. I don't know why. I had not seen it once in 3 months, so I can't reproduce it.
That's not the crash (that goroutine is stuck), but it looks like something crashed while holding the rand lock. If you can't reproduce it, there's likely nothing we can do. It's either a bug in the runtime or some kind of memory corruption.
This happened with 0.4.20, so feel free to close if it is fixed in 0.4.21. I just did not find any matching report and did not want this crash log to get lost. Perhaps it can be helpful; if not, please just close it.