moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

constantly crashing node #28126

Closed djalal closed 4 years ago

djalal commented 7 years ago

Description

For the last 3 weeks, one staging VM has been crashing constantly, at least once a day.

The big difference from the other nodes is that they use XFS, while this one uses extfs as the backing filesystem (all on AUFS, not overlay).

The other difference is that we began launching containers with multiple processes inside, contrary to best practices.

In your view, what would be a durable solution?

Here is a screenshot of the kernel panic:

[image: kernelpanic]

Steps to reproduce the issue:

  1. Start the crashed node.
  2. Run docker start $(docker ps -q -f status=exited)

Describe the results you received:

crashing node

Describe the results you expected:

stable running node

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

docker version
Client:
 Version:      1.12.2
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   bb80604
 Built:        Tue Oct 11 17:43:41 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.2
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   bb80604
 Built:        Tue Oct 11 17:43:41 2016
 OS/Arch:      linux/amd64

Output of docker info:

$ docker info
Containers: 27
 Running: 26
 Paused: 0
 Stopped: 1
Images: 226
Server Version: 1.12.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 771
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge overlay null
Swarm: active
 NodeID: 9mtwfi2wnb81ie4e6j2fyukhv
 Is Manager: false
 Node Address: 192.168.200.144
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.818 GiB
Name: qa
ID: UHVE:FOMU:EHFV:5B7C:K4OV:3FLQ:IU3A:AAED:JXII:EGYN:WOLH:SKOJ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

output from journalctl -u dockerd

Nov  7 10:10:41 qanode kernel: [75791.626324] aufs au_opts_verify:1570:dockerd[771]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:41 qanode kernel: [75792.130930] aufs au_opts_verify:1570:dockerd[1978]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:41 qanode kernel: [75792.486537] aufs au_opts_verify:1570:dockerd[1557]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:42 qanode kernel: [75793.091875] aufs au_opts_verify:1570:dockerd[1539]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:42 qanode kernel: [75793.472475] aufs au_opts_verify:1570:dockerd[762]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:43 qanode kernel: [75793.871651] aufs au_opts_verify:1570:dockerd[3091]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:44 qanode kernel: [75794.677060] aufs au_opts_verify:1570:dockerd[3151]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:44 qanode kernel: [75795.484553] aufs au_opts_verify:1570:dockerd[3199]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:45 qanode kernel: [75796.236633] aufs au_opts_verify:1570:dockerd[3389]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:46 qanode kernel: [75797.414239] aufs au_opts_verify:1570:dockerd[3378]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:47 qanode kernel: [75798.516585] aufs au_opts_verify:1570:dockerd[3497]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:48 qanode kernel: [75799.442327] aufs au_opts_verify:1570:dockerd[3556]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:49 qanode kernel: [75799.878950] aufs au_opts_verify:1570:dockerd[3270]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:51 qanode kernel: [75802.043059] aufs au_opts_verify:1570:dockerd[3542]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:57 qanode dockerd[568]: time="2016-11-07T10:10:57+01:00" level=info msg="Firewalld running: false"
Nov  7 10:10:59 qanode kernel: [75810.243248] aufs au_opts_verify:1570:dockerd[3606]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:11:00 qanode dockerd[568]: time="2016-11-07T10:11:00+01:00" level=info msg="Firewalld running: false"
Nov  7 10:11:01 qanode kernel: [75812.260621] aufs au_opts_verify:1570:dockerd[3437]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:11:02 qanode dockerd[568]: time="2016-11-07T10:11:02+01:00" level=info msg="Firewalld running: false"
Nov  7 10:11:02 qanode kernel: [75813.361075] aufs au_opts_verify:1570:dockerd[3497]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:11:03 qanode dockerd[568]: time="2016-11-07T10:11:03.354652904+01:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/ff88f0717c09ec21289e2e72b08ac5505ad4fc1109127e316440a5866f44562b/shm: invalid argument"
Nov  7 10:11:03 qanode kernel: [75813.957069] aufs au_opts_verify:1570:dockerd[3999]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:11:03 qanode kernel: [75814.500726] aufs au_opts_verify:1570:dockerd[3497]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:11:04 qanode kernel: [75815.043920] aufs au_opts_verify:1570:dockerd[3938]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:11:27 qanode kernel: [75838.116789] aufs au_opts_verify:1570:dockerd[3780]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:11:28 qanode kernel: [75838.549526] aufs au_opts_verify:1570:dockerd[3606]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:11:31 qanode kernel: [75842.327419] aufs au_opts_verify:1570:dockerd[3780]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:11:37 qanode kernel: [75847.623032] aufs au_opts_verify:1570:dockerd[1734]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:25:50 qanode dockerd[568]: time="2016-11-07T10:25:50.469599164+01:00" level=warning msg="containerd: unable to save 92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211:50ada5f8bd937a19f1b436800fb9c01d57a5790fa639f36be776cfe4bca1cb5c starttime: open /proc/7391/stat: no such file or directory"
Nov  7 10:25:50 qanode dockerd[568]: time="2016-11-07T10:25:50.470893504+01:00" level=info msg="containerd: 92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211:50ada5f8bd937a19f1b436800fb9c01d57a5790fa639f36be776cfe4bca1cb5c (pid 7391) has become an orphan, killing it"
Nov  7 10:41:57 qanode dockerd[568]: time="2016-11-07T10:41:57.965474706+01:00" level=warning msg="containerd: unable to save 92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211:080c6b4ae35fd8540a5a6ea153cdd7472b3be821516e5741e47a8d428a938e2d starttime: open /proc/10060/stat: no such file or directory"
Nov  7 10:41:57 qanode dockerd[568]: time="2016-11-07T10:41:57.965978608+01:00" level=info msg="containerd: 92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211:080c6b4ae35fd8540a5a6ea153cdd7472b3be821516e5741e47a8d428a938e2d (pid 10060) has become an orphan, killing it"
Nov  7 11:46:48 qanode dockerd[568]: time="2016-11-07T11:46:48.325625413+01:00" level=warning msg="containerd: unable to save 92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211:ae1e74544103715cffd03f30d2d8711bedf1cd567417ce0c273870fe65f94865 starttime: open /proc/21222/stat: no such file or directory"
Nov  7 11:46:48 qanode dockerd[568]: time="2016-11-07T11:46:48.327918375+01:00" level=info msg="containerd: 92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211:ae1e74544103715cffd03f30d2d8711bedf1cd567417ce0c273870fe65f94865 (pid 21222) has become an orphan, killing it"
Nov  7 11:48:49 qanode dockerd[568]: time="2016-11-07T11:48:49.877388528+01:00" level=warning msg="containerd: unable to save 92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211:f3c603c9cc892c261796d91e476c1f4a5722f8eb1300597d5dac7248fd94cb4f starttime: open /proc/21526/stat: no such file or directory"
Nov  7 11:48:49 qanode dockerd[568]: time="2016-11-07T11:48:49.877923809+01:00" level=info msg="containerd: 92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211:f3c603c9cc892c261796d91e476c1f4a5722f8eb1300597d5dac7248fd94cb4f (pid 21526) has become an orphan, killing it"

output from the container:

docker inspect 92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211
[
    {
        "Id": "92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211",
        "Created": "2016-10-28T11:48:03.505218862Z",
        "Path": "/nx-order-sg",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 2983,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2016-11-07T12:19:50.347945604Z",
            "FinishedAt": "2016-11-07T11:54:42.070290474Z",
            "Health": {
                "Status": "unhealthy",
                "FailingStreak": 30,
                "Log": [
                    {
                        "Start": "2016-11-07T13:45:54.866699033+01:00",
                        "End": "2016-11-07T13:45:55.232457099+01:00",
                        "ExitCode": 1,
                        "Output": "wget: can't open '_health': File exists\n"
                    },
                    {
                        "Start": "2016-11-07T13:46:55.232625441+01:00",
                        "End": "2016-11-07T13:46:55.724260746+01:00",
                        "ExitCode": 1,
                        "Output": "wget: can't open '_health': File exists\n"
                    },
                    {
                        "Start": "2016-11-07T13:47:55.725319789+01:00",
                        "End": "2016-11-07T13:47:56.375902349+01:00",
                        "ExitCode": 1,
                        "Output": "wget: can't open '_health': File exists\n"
                    },
                    {
                        "Start": "2016-11-07T13:48:56.378547757+01:00",
                        "End": "2016-11-07T13:48:57.031854603+01:00",
                        "ExitCode": 1,
                        "Output": "wget: can't open '_health': File exists\n"
                    },
                    {
                        "Start": "2016-11-07T13:49:57.032003857+01:00",
                        "End": "2016-11-07T13:49:57.600138324+01:00",
                        "ExitCode": 1,
                        "Output": "wget: can't open '_health': File exists\n"
                    }
                ]
            }
        },
        "Image": "sha256:c302071123bd8a4d0a2ac056551daf03a2622da937973c9cb905d4b0c4585116",
        "ResolvConfPath": "/var/lib/docker/containers/92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211/hostname",
        "HostsPath": "/var/lib/docker/containers/92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211/hosts",
        "LogPath": "/var/lib/docker/containers/92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211/92e1c8bce55bf407c3bcede14a47d64464e72836d96747654733dfd97c036211-json.log",
        "Name": "/nx-order-sg-initial-commit",
        "RestartCount": 0,
        "Driver": "aufs",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Name": "aufs",
            "Data": null
        },
        "Mounts": [],
        "Config": {
            "Hostname": "92e1c8bce55b",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "8080/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "DB_HOST=192.168.200.33",
                "DB_NAME=sgv3",
                "DB_USERNAME=***",
                "DB_PASSWORD=*******",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/nx-order-sg"
            ],
            "Healthcheck": {
                "Test": [
                    "CMD-SHELL",
                    "wget -q http://localhost:8080/_health || exit 1"
                ],
                "Interval": 60000000000,
                "Timeout": 3000000000
            },
            "Image": "registry.qanode.local/nx-order-sg:initial-commit",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {}
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "06c9dc8e1dfac3ad9167665427b9b7d26fd3d89d3c11f144a5abeff450934e59",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "8080/tcp": null
            },
            "SandboxKey": "/var/run/docker/netns/06c9dc8e1dfa",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "c168f68b9933a898b98f9de1599b6154c91b0a96bd084125f5f938eba994ca8e",
            "Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.17",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:11",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "d3ea3edf3999d3d8fb86a300164f4667f34fba1321a2c95bf40d1552403f944e",
                    "EndpointID": "c168f68b9933a898b98f9de1599b6154c91b0a96bd084125f5f938eba994ca8e",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.17",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:11"
                }
            }
        }
    }
]
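
A side observation on the inspect output above: every logged health probe fails with the same wget message rather than with an HTTP error. One possible reading (an assumption, not something confirmed in this thread) is that wget is trying to save the response to a file named `_health` in the working directory and refuses to overwrite it, in which case the "unhealthy" status says more about the probe than about the service. The Python sketch below only tallies the health log entries; `INSPECT_JSON` is a trimmed copy of the `State.Health` section, and `summarize_health` is a hypothetical helper name.

```python
import json
from collections import Counter

# Trimmed copy of the "State.Health" section from the docker inspect output above
# (two of the five identical log entries are kept for brevity).
INSPECT_JSON = """
[{"State": {"Health": {"Status": "unhealthy", "FailingStreak": 30, "Log": [
  {"ExitCode": 1, "Output": "wget: can't open '_health': File exists\\n"},
  {"ExitCode": 1, "Output": "wget: can't open '_health': File exists\\n"}
]}}}]
"""


def summarize_health(inspect_text):
    """Return the health status and a count of distinct (exit code, output) pairs."""
    container = json.loads(inspect_text)[0]
    health = container["State"]["Health"]
    counts = Counter(
        (entry["ExitCode"], entry["Output"].strip()) for entry in health["Log"]
    )
    return health["Status"], counts


status, counts = summarize_health(INSPECT_JSON)
for (exit_code, output), n in counts.items():
    print(f"{status}: {n}x exit={exit_code} output={output!r}")
```

If that reading is right, having the probe discard the response body (for example by pointing wget's `-O` option at `/dev/null`) would be one way to check it. Either way, this is separate from the kernel panic.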

cpuguy83 commented 7 years ago

The dirperm1 message is a red herring. It's a warning from aufs kernel module that is not related to the kernel panic.

Can you paste the full kernel panic?
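
Since the dirperm1 warnings make up most of the log above, it can help to filter them out before reading the rest. The following is a minimal Python sketch under that assumption; `SAMPLE_LOG` holds three lines copied from the log in this issue, and the helper names `drop_dirperm1_noise` and `warnings_only` are hypothetical.

```python
# Three lines copied verbatim from the log output in this issue.
SAMPLE_LOG = r"""Nov  7 10:10:41 qanode kernel: [75791.626324] aufs au_opts_verify:1570:dockerd[771]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov  7 10:10:57 qanode dockerd[568]: time="2016-11-07T10:10:57+01:00" level=info msg="Firewalld running: false"
Nov  7 10:11:03 qanode dockerd[568]: time="2016-11-07T10:11:03.354652904+01:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/ff88f0717c09ec21289e2e72b08ac5505ad4fc1109127e316440a5866f44562b/shm: invalid argument"
"""


def drop_dirperm1_noise(lines):
    """Return the log lines that are not the aufs dirperm1 warning."""
    return [ln for ln in lines if "dirperm1 breaks the protection" not in ln]


def warnings_only(lines):
    """Return only the dockerd lines logged at warning level."""
    return [ln for ln in lines if "level=warning" in ln]


remaining = drop_dirperm1_noise(SAMPLE_LOG.splitlines())
for line in warnings_only(remaining):
    print(line)
```

This only trims the noise. It does not recover the panic itself, which still has to come from the kernel log or console.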

thaJeztah commented 7 years ago

> Can you paste the full kernel panic?

ping @djalal could you provide some more info ^^ ?

djalal commented 7 years ago

We couldn't capture the full kernel panic, but we isolated our crash to a specific container.

When run on a 4.x kernel with XFS, this specific container no longer crashes the node. And, as you guessed, the fragile node has been stable since we moved the suspect container to a new node.

As stated before, we would welcome any feedback/advice on the most stable combination of host distro + kernel version + backing filesystem

thaJeztah commented 7 years ago

> As stated before, we would welcome any feedback/advice on the most stable combination of host distro + kernel version + backing filesystem

A kernel panic is a bug in the kernel and should never happen. All of the combinations mentioned should work, so if you get kernel panics, I'd recommend reporting this to Debian (and also making sure your kernel is fully up to date with the kernel they provide).

> the other difference is we began launching containers with multiple processes inside, despite the best practices.

What is used to run these processes? Are you using systemd, and is the container run "privileged"?
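
For the one container whose `docker inspect` output was posted earlier in this issue, part of the answer can be read from the JSON. The Python sketch below pulls the relevant fields; `INSPECT_JSON` is a trimmed copy of that output and `process_setup` is a hypothetical helper name. Whether this container is one of those running multiple processes is not stated in the thread, so treat the result as applying to that container only.

```python
import json

# Trimmed copy of the relevant fields from the docker inspect output
# posted earlier in this issue.
INSPECT_JSON = """
[{"Path": "/nx-order-sg", "Args": [],
  "HostConfig": {"Privileged": false, "PidMode": "", "CapAdd": null},
  "Config": {"Cmd": ["/nx-order-sg"], "Entrypoint": null}}]
"""


def process_setup(inspect_text):
    """Summarize how the first inspected container is started and whether it is privileged."""
    container = json.loads(inspect_text)[0]
    host_config = container["HostConfig"]
    config = container["Config"]
    # Entrypoint and Cmd may be null in the JSON, so fall back to empty lists.
    command = (config["Entrypoint"] or []) + (config["Cmd"] or [])
    return {
        "privileged": host_config["Privileged"],
        "command": command,
        "cap_add": host_config["CapAdd"] or [],
    }


print(process_setup(INSPECT_JSON))
```

For that container, the posted output shows a single binary as the command and `Privileged` set to false.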

amyangfei commented 7 years ago

We are seeing a similar kernel panic and system crash. We are running Docker on Debian 8 with the latest kernel, Debian 3.16.36-1+deb8u1.

Output of docker version:

Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:39:14 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:39:14 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 115
 Running: 21
 Paused: 0
 Stopped: 94
Images: 2005
Server Version: 1.12.3
Storage Driver: aufs
 Root Dir: /home/projects/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1879
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host null bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.93 GiB
Name: psigor-node226-igor.i.nease.net
ID: PEBC:D4DO:LEZ7:E5OI:AFQ5:NE4N:M5DZ:IBFG:IDJU:N7Z4:2HBH:CMAU
Docker Root Dir: /home/projects/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support

kernel panic log 1:

November 17th 2016, 17:12:11.384    kernel: [521520.480587] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
November 17th 2016, 17:12:11.384    kernel: [521520.480800] IP: [<ffffffff810a10d0>] check_preempt_wakeup+0xd0/0x1d0
November 17th 2016, 17:12:11.384    kernel: [521520.480943] PGD f5e8ed067 PUD f5e8f4067 PMD 0 
November 17th 2016, 17:12:11.384    kernel: [521520.481192] Oops: 0000 [#1] SMP 
November 17th 2016, 17:12:11.384    kernel: [521520.481383] Modules linked in: xt_nat veth ipt_MASQUERADE iptable_nat nf_nat_ipv4 xt_addrtype nf_nat bridge stp llc aufs(C) msr cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_stats ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables x86_pkg_temp_thermal intel_powerclamp intel_rapl coretemp kvm_intel kvm crc32_pclmul aesni_intel ttm aes_x86_64 lrw evdev drm_kms_helper gf128mul ipmi_watchdog iTCO_wdt iTCO_vendor_support hpwdt glue_helper lpc_ich hpilo drm i2c_algo_bit ablk_helper cryptd pcspkr i2c_i801 serio_raw i2c_core mfd_core processor ioatdma shpchp dca thermal_sys wmi tpm_tis tpm acpi_power_meter button ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sg crct10dif_pclmul crct10dif_common psmouse uhci_hcd ehci_pci xhci_hcd ehci_hcd hpsa bnx2x usbcore mdio libcrc32c usb_common crc32c_generic scsi_mod crc32c_intel
November 17th 2016, 17:12:11.384    kernel: [521520.487205] CPU: 11 PID: 62050 Comm: uwsgi Tainted: G         C    3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2
November 17th 2016, 17:12:11.384    kernel: [521520.487300] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 05/06/2015
November 17th 2016, 17:12:11.384    kernel: [521520.487378] task: ffff880f5e8ef530 ti: ffff880f5e8f0000 task.ti: ffff880f5e8f0000
November 17th 2016, 17:12:11.384    kernel: [521520.487467] RIP: 0010:[<ffffffff810a10d0>]  [<ffffffff810a10d0>] check_preempt_wakeup+0xd0/0x1d0
November 17th 2016, 17:12:11.384    kernel: [521520.487619] RSP: 0018:ffff880f5e8f3a48  EFLAGS: 00010006
November 17th 2016, 17:12:11.384    kernel: [521520.487696] RAX: 0000000000000000 RBX: ffff880832cc6040 RCX: 0000000000000008
November 17th 2016, 17:12:11.384    kernel: [521520.487785] RDX: 0000000000000000 RSI: ffff880fb97c2050 RDI: ffff88087fdd2fb8
November 17th 2016, 17:12:11.384    kernel: [521520.487875] RBP: 0000000000000000 R08: ffffffff816108c0 R09: 0000000000000001
November 17th 2016, 17:12:11.384    kernel: [521520.487965] R10: 0000000000001260 R11: 0000000000001261 R12: ffff880f5e8ef530
November 17th 2016, 17:12:11.384    kernel: [521520.488054] R13: ffff88087fdd2f40 R14: 0000000000000000 R15: 0000000000000000
November 17th 2016, 17:12:11.384    kernel: [521520.488144] FS:  00007f016e506700(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
November 17th 2016, 17:12:11.384    kernel: [521520.488234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
November 17th 2016, 17:12:11.384    kernel: [521520.488312] CR2: 0000000000000078 CR3: 0000000f5e8ec000 CR4: 00000000001407e0
November 17th 2016, 17:12:11.384    kernel: [521520.488403] Stack:
November 17th 2016, 17:12:11.384    kernel: [521520.488475]  ffff880fb97c2050 ffff88087fdd2f40 ffff88107fcd3858 ffff880f5e8f3af0
November 17th 2016, 17:12:11.384    kernel: [521520.488796]  ffff880fb97c2050 ffff88107fcd2f40 0000000000000059 ffffffff81095b75
November 17th 2016, 17:12:11.384    kernel: [521520.489116]  ffff880fb97c20e0 ffffffff810a3b15 ffff88107fcd2f40 0000000000000087
November 17th 2016, 17:12:11.384    kernel: [521520.489437] Call Trace:
November 17th 2016, 17:12:11.384    kernel: [521520.489512]  [<ffffffff81095b75>] ? check_preempt_curr+0x85/0xa0
November 17th 2016, 17:12:11.384    kernel: [521520.489591]  [<ffffffff810a3b15>] ? load_balance+0x455/0x850
November 17th 2016, 17:12:11.384    kernel: [521520.489669]  [<ffffffff810a42bf>] ? pick_next_task_fair+0x3af/0x820
November 17th 2016, 17:12:11.384    kernel: [521520.489749]  [<ffffffff815147a6>] ? __schedule+0x106/0x6f0
November 17th 2016, 17:12:11.384    kernel: [521520.489827]  [<ffffffff81514192>] ? schedule_timeout+0x162/0x2d0
November 17th 2016, 17:12:11.384    kernel: [521520.489908]  [<ffffffff81073e20>] ? ftrace_raw_event_tick_stop+0xb0/0xb0
November 17th 2016, 17:12:11.384    kernel: [521520.489988]  [<ffffffff8140e749>] ? sk_wait_data+0xc9/0xd0
November 17th 2016, 17:12:11.384    kernel: [521520.490065]  [<ffffffff810a9590>] ? prepare_to_wait_event+0xf0/0xf0
November 17th 2016, 17:12:11.384    kernel: [521520.490147]  [<ffffffff81469f0d>] ? tcp_recvmsg+0x7dd/0xc30
November 17th 2016, 17:12:11.384    kernel: [521520.490224]  [<ffffffff810a959e>] ? autoremove_wake_function+0xe/0x30
November 17th 2016, 17:12:11.384    kernel: [521520.490304]  [<ffffffff8149215a>] ? inet_recvmsg+0x6a/0x80
November 17th 2016, 17:12:11.384    kernel: [521520.490383]  [<ffffffff81408a2e>] ? sock_aio_read.part.7+0xfe/0x120
November 17th 2016, 17:12:11.384    kernel: [521520.490463]  [<ffffffff811a9edc>] ? do_sync_read+0x5c/0x90
November 17th 2016, 17:12:11.384    kernel: [521520.490540]  [<ffffffff811aa785>] ? vfs_read+0x135/0x170
November 17th 2016, 17:12:11.384    kernel: [521520.490616]  [<ffffffff811ab312>] ? SyS_read+0x42/0xa0
November 17th 2016, 17:12:11.384    kernel: [521520.490694]  [<ffffffff8151858d>] ? system_call_fast_compare_end+0x10/0x15
November 17th 2016, 17:12:11.384    kernel: [521520.490772] Code: 39 c2 7d 27 0f 1f 80 00 00 00 00 83 e8 01 48 8b 5b 70 39 d0 75 f5 48 8b 7d 78 48 3b 7b 78 74 15 0f 1f 00 48 8b 6d 70 48 8b 5b 70 <48> 8b 7d 78 48 3b 7b 78 75 ee 48 85 ff 74 e9 e8 8c cb ff ff 48 
November 17th 2016, 17:12:11.384    kernel: [521520.495007] RIP  [<ffffffff810a10d0>] check_preempt_wakeup+0xd0/0x1d0
November 17th 2016, 17:12:11.384    kernel: [521520.495154]  RSP <ffff880f5e8f3a48>
November 17th 2016, 17:12:11.384    kernel: [521520.495234] CR2: 0000000000000078
November 17th 2016, 17:12:11.384    kernel: [521520.495314] ---[ end trace 1a0dba1acf43f729 ]---
November 17th 2016, 17:12:11.384    kernel: [521529.277038] TCP: TCP: Possible SYN flooding on port 6000. Sending cookies.  Check SNMP counters.
November 17th 2016, 17:12:11.506    kernel: [521529.399155] ------------[ cut here ]------------

kernel panic log 2:

November 28th 2016, 10:15:04.968    kernel: [1099012.835356] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
November 28th 2016, 10:15:04.968    kernel: [1099012.835592] IP: [<ffffffff810a45c8>] pick_next_task_fair+0x6b8/0x820
November 28th 2016, 10:15:04.968    kernel: [1099012.835753] PGD 0 
November 28th 2016, 10:15:04.968    kernel: [1099012.835897] Oops: 0000 [#1] SMP 
November 28th 2016, 10:15:04.968    kernel: [1099012.836103] Modules linked in: xfrm_user xfrm_algo tcp_diag inet_diag xt_nat veth msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 xt_addrtype nf_nat bridge stp llc aufs(C) cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative ipt_REJECT xt_tcpudp xt_conntrack iptable_filter ip_tables x_tables x86_pkg_temp_thermal intel_powerclamp intel_rapl coretemp kvm_intel joydev kvm nf_conntrack_ipv4 hid_generic usbhid nf_defrag_ipv4 crc32_pclmul hid nf_conntrack ttm drm_kms_helper drm i2c_algo_bit hpilo hpwdt aesni_intel evdev iTCO_wdt iTCO_vendor_support aes_x86_64 lrw gf128mul glue_helper ipmi_watchdog ablk_helper processor cryptd thermal_sys i2c_i801 i2c_core serio_raw pcspkr acpi_power_meter shpchp tpm_tis tpm wmi lpc_ich mfd_core button ioatdma dca ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sg crct10dif_pclmul crct10dif_common psmouse uhci_hcd ehci_pci xhci_hcd ehci_hcd bnx2x hpsa mdio libcrc32c usbcore crc32c_generic usb_common scsi_mod crc32c_intel
November 28th 2016, 10:15:04.968    kernel: [1099012.842910] CPU: 10 PID: 60 Comm: ksoftirqd/10 Tainted: G         C    3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u1
November 28th 2016, 10:15:04.968    kernel: [1099012.843021] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
November 28th 2016, 10:15:04.968    kernel: [1099012.843110] task: ffff880857f56050 ti: ffff880857f68000 task.ti: ffff880857f68000
November 28th 2016, 10:15:04.968    kernel: [1099012.843214] RIP: 0010:[<ffffffff810a45c8>]  [<ffffffff810a45c8>] pick_next_task_fair+0x6b8/0x820
November 28th 2016, 10:15:04.968    kernel: [1099012.843386] RSP: 0018:ffff880857f6bde0  EFLAGS: 00010046
November 28th 2016, 10:15:04.968    kernel: [1099012.843472] RAX: 0000000000000000 RBX: ffff88082cd501c0 RCX: 0000000000000000
November 28th 2016, 10:15:04.968    kernel: [1099012.843575] RDX: 0000000000000001 RSI: ffff8807e23bf828 RDI: ffff880837e44d18
November 28th 2016, 10:15:04.968    kernel: [1099012.843679] RBP: ffff8807e23bf800 R08: 0000000000000000 R09: 000000000000badf
November 28th 2016, 10:15:04.968    kernel: [1099012.843781] R10: 0000000000000004 R11: 0000000000000005 R12: 0000000000000000
November 28th 2016, 10:15:04.968    kernel: [1099012.843884] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88087fd92f40
November 28th 2016, 10:15:04.968    kernel: [1099012.843988] FS:  0000000000000000(0000) GS:ffff88087fd80000(0000) knlGS:0000000000000000
November 28th 2016, 10:15:04.968    kernel: [1099012.844092] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
November 28th 2016, 10:15:04.968    kernel: [1099012.844179] CR2: 0000000000000078 CR3: 0000000001813000 CR4: 00000000001407e0
November 28th 2016, 10:15:04.968    kernel: [1099012.844282] Stack:
November 28th 2016, 10:15:04.968    kernel: [1099012.844362]  ffff880837e44ca0 00000001810a0934 ffff880857f56050 0000000000012f40
November 28th 2016, 10:15:04.968    kernel: [1099012.844712]  ffff88087fd92fb8 ffffffff8101ca45 ffff880857f564a8 ffff880857f56050
November 28th 2016, 10:15:04.968    kernel: [1099012.845063]  ffff88087fd92f40 000000000000000a 0000000000000000 0000000000000000
November 28th 2016, 10:15:04.968    kernel: [1099012.845412] Call Trace:
November 28th 2016, 10:15:04.968    kernel: [1099012.845502]  [<ffffffff8101ca45>] ? sched_clock+0x5/0x10
November 28th 2016, 10:15:04.968    kernel: [1099012.845593]  [<ffffffff81514786>] ? __schedule+0x106/0x6f0
November 28th 2016, 10:15:04.968    kernel: [1099012.845683]  [<ffffffff8108fff6>] ? smpboot_thread_fn+0xc6/0x190
November 28th 2016, 10:15:04.968    kernel: [1099012.845772]  [<ffffffff8108ff30>] ? SyS_setgroups+0x170/0x170
November 28th 2016, 10:15:04.968    kernel: [1099012.845863]  [<ffffffff810894bd>] ? kthread+0xbd/0xe0
November 28th 2016, 10:15:04.968    kernel: [1099012.845951]  [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180
November 28th 2016, 10:15:04.968    kernel: [1099012.846043]  [<ffffffff81518498>] ? ret_from_fork+0x58/0x90
November 28th 2016, 10:15:04.968    kernel: [1099012.846130]  [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180
November 28th 2016, 10:15:04.968    kernel: [1099012.846217] Code: 49 8b 7c 24 78 48 39 fd 74 2f 44 8b 73 68 45 8b 6c 24 68 45 39 ee 0f 8e c7 00 00 00 48 89 ef 48 89 de e8 ac 91 ff ff 48 8b 5b 70 <49> 8b 7c 24 78 48 8b 6b 78 48 39 fd 75 d1 48 85 ed 74 cc 4c 89 
November 28th 2016, 10:15:04.968    kernel: [1099012.850364] RIP  [<ffffffff810a45c8>] pick_next_task_fair+0x6b8/0x820
November 28th 2016, 10:15:04.968    kernel: [1099012.850517]  RSP <ffff880857f6bde0>
November 28th 2016, 10:15:04.968    kernel: [1099012.850600] CR2: 0000000000000078
November 28th 2016, 10:15:04.968    kernel: [1099012.850682] ---[ end trace b35189696cf422e1 ]---
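
Both logs above report a NULL pointer dereference at the same address, and the faulting functions (`check_preempt_wakeup` and `pick_next_task_fair`) both belong to the kernel's fair scheduler. Neither call trace lists an aufs frame. The Python sketch below shows one way to pull those two facts out of such logs; `OOPS_LINES` holds the four relevant lines copied from above with the leading timestamps dropped, and `summarize_oopses` is a hypothetical helper name.

```python
import re

# The "BUG:" and "IP:" lines from the two kernel logs above (leading timestamps dropped).
OOPS_LINES = """\
kernel: [521520.480587] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
kernel: [521520.480800] IP: [<ffffffff810a10d0>] check_preempt_wakeup+0xd0/0x1d0
kernel: [1099012.835356] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
kernel: [1099012.835592] IP: [<ffffffff810a45c8>] pick_next_task_fair+0x6b8/0x820
"""

# Matches "IP: [<address>] symbol+0xoffset/0xsize"; the "RIP: 0010:[<...>]" lines do not match.
IP_RE = re.compile(r"IP: \[<[0-9a-f]+>\] (?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
FAULT_RE = re.compile(r"NULL pointer dereference at (?P<fault>[0-9a-f]+)")


def summarize_oopses(text):
    """Pair each faulting address with the function named on the following IP: line."""
    faults = [int(m.group("fault"), 16) for m in FAULT_RE.finditer(text)]
    symbols = [m.group("symbol") for m in IP_RE.finditer(text)]
    return list(zip(faults, symbols))


for fault, symbol in summarize_oopses(OOPS_LINES):
    print(f"fault at {fault:#x} in {symbol}")
```

A matching fault address in two scheduler functions is the kind of detail worth including in a kernel bug report, though it does not by itself identify the root cause.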

thaJeztah commented 4 years ago

Closing as this one went stale, and it looks like a kernel issue.