Closed: arraylabs closed this issue 7 years ago
Starting with 0.45, the HASS Docker image is based on Python 3.6.1.
Please list the components and platforms that you use.
I was having the same issue with the standard Docker container, switched over to the hass.io container (running Python 3.5.2) and haven't had any issues. My component list:
homeassistant
http
notify
joaoapps_join
frontend
discovery
logbook
config
logger
sensor
binary_sensor
camera
remote
sun
conversation
device_tracker
updater
recorder
history
zeroconf
ecobee
influxdb
mqtt
light
switch
media_player (plex, kodi)
input_select
input_boolean
group
automation
zone
zwave
zha
script
alarm_control_panel
nest
tts
twilio
ring
Figured I couldn't be the only one, confirms I'm not losing my mind :)
Components:
homeassistant
http
frontend
updater
ios
logger
history
logbook
sun
automation
script
group
scene
shell_command
recorder
sensor (template, rest, zoneminder, darksky)
switch (zoneminder, template)
remote (harmony x2)
device_tracker (unifi)
input_boolean
cover (myq)
zwave
zha
emulated_hue
input_select
zone
notify (smtp)
zoneminder
The only component added recently (though at 0.44.x) is the ZHA component, removing the Hue hub in the process.
Thanks for any help/information!
So I wonder which of the components is causing this, as I have been running Home Assistant under the Python 3.6 Docker image just fine.
Could you experiment by turning the following components off one by one to see if it stops segfaulting?
shell_command
recorder
remote (harmony x2)
device_tracker (unifi)
zwave
zha
Just chiming in to say I'm also having crashing issues running the Docker container on a Synology NAS. I have Docker set to automatically restart the container, so by the time I notice HASS has ticked over, the logs are gone.
Going to disable the auto-restart for now and upgrade to 0.45.1 just for good measure. Unfortunately the crashes, for me at least, are random; it can take over 24 hours for one to occur.
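In case it helps anyone else in the same boat, something like this should keep the crashed container's logs around (assuming the container is named homeassistant; the Synology Docker UI should have an equivalent auto-restart toggle):
docker update --restart=no homeassistant   # stop Docker from restarting the container after a crash
docker ps -a                               # the exited container sticks around
docker logs --tail 200 homeassistant       # its logs survive until the container is removed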
My component list, if it helps:
As @balloob suggests, I'm going to disable:
I need Z-Wave (all my sensors run off it), so if the crashes happen again we can at least eliminate those as the cause.
Had another crash with those three components disabled.
From some Googling, I think this is a known issue with the 3.6 Docker image. See docker-library/python#190 and docker-library/python#160
If hass.io is still using 3.5, are there any new components/features that require 3.6? If not, perhaps we should consider downgrading back to 3.5 for the time being?
@balloob I will try turning off as many of the components as I can tomorrow, will have to be tomorrow so I can manage my wife's unhappiness with stuff not working. :)
I would be down to go back to Python 3.5 for now. Sad but we can't be running around segfaulting either.
Wish we had a good way to reproduce it.
I have a "spare" Z-Wave stick, so I may try passing it through ESXi to a fresh VM with HA Docker 0.45.1 installed and see if that dies too.
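Roughly what I have in mind for the test container (the device path, config volume, and tag below are just examples; the stick usually shows up as /dev/ttyACM0 or /dev/ttyUSB0 inside the VM):
docker run -d --name ha-zwave-test \
  --device /dev/ttyACM0:/dev/ttyACM0 \
  -v /home/user/ha-test-config:/config \
  -p 8123:8123 \
  homeassistant/home-assistant:0.45.1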
Just to add another data point, I've been using the 3.6 image without issue on an unRaid box. My config does use: shell_command, recorder, zwave.
@balloob So the clean 0.45.1 install in Docker with only the Z-Wave stick configured ran without issue for more than 16 hours (though with no real device traffic running over it), which seems to confirm what @mezz64 commented regarding his use of zwave without issue. I just removed zha from my production install and moved it back to 0.45.1. Will report back late today or tomorrow morning.
Hass.io will switch back to Python 3.6 with the next stable Alpine Linux release. Before, we ran with Python 3.6 from Resin and that had too many issues for us.
The segmentation fault has returned with zha disabled. Nearly the same error appears in the message log, the difference being the extra info from grsec (172.30.1.70 is a different Docker host, not the one currently running HA).
May 28 02:48:50 hass kern.info kernel: [22393.123461] python[2658]: segfault at 8 ip 000076e9bad467b2 sp 000076e98b8be0a0 error 6 in libpython3.6m.so.1.0[76baca2000+29a000]
May 28 02:48:50 hass kern.alert kernel: [22393.123496] grsec: From 172.30.1.70: Segmentation fault occurred at 0000000000000008 in /usr/local/bin/python3.6[thon:2658] uid/euid:0/0 gid/egid:0/0, parent /usr/bin/docker-containerd-shim[docker-containe:2551] uid/euid:0/
May 28 02:48:50 hass kern.alert kernel: [22393.123729] grsec: From 172.30.1.70: denied resource overstep by requesting 4096 for RLIMIT_CORE against limit 0 r /usr/local/bin/python3.6[python:2658] uid/euid:0/0 gid/egid:0/0, parent /usr/bin/docker-containerd-shim[dock
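Side note: the last grsec line means the kernel refused to write a core file (RLIMIT_CORE is 0 in the container), so to actually capture a core dump for debugging, the container would probably need to be started with a raised core limit, something along these lines (name, port, and tag are placeholders, and the grsec policy may still veto it):
docker run -d --name homeassistant \
  --ulimit core=-1 \
  -p 8123:8123 \
  homeassistant/home-assistant:0.45.1
cat /proc/sys/kernel/core_pattern   # check where the host kernel would write core files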
I'm having the same issue without z-wave enabled as well.
For the people experiencing segmentation faults, what OS and architecture is your host?
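Something like the output of these two commands from the host would be enough:
uname -srm
docker version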
System 1: Ubuntu 16.04.2 LTS x64 Docker version 17.03.0-ce, build 3a232c8
System 2: Ubuntu 16.04.2 LTS x64 Docker version 17.03.0-ce, build 60ccb22
System 3: Alpine Linux 3.5.2 x64 Docker version 17.05.0-ce, build v17.05.0-ce
Experienced the seg fault on all 3 systems with 0.45.x version.
I'm not running Docker, but I'm also seeing segmentation faults in a (freshly set up) venv with HA version 0.45.1 on Python 3.6.1 on Arch Linux on a Raspberry Pi 2 Model B (armv7l). HA runtimes before a segmentation fault occurs vary from around 3 hours up to 34.5 hours.
Components:
@AlexMekkering do you run edge, or did you compile Python 3.6 yourself?
I run Arch Linux (for ARM) with the most recent Python (3.6.1) package (https://archlinuxarm.org/packages/arm/python), and looking at its PKGBUILD it was compiled from the upstream https://www.python.org/ftp/python/3.6.1/Python-3.6.1.tar.xz. It only contains a patch for Lib/test/test_socket.py and installs libpython as read-write, but these shouldn't have any impact. It also ensures that libraries (expat, zlib, libffi, and libmpdec) are used from the system instead of being included in the build. The build was configured with:
./configure --prefix=/usr \
--enable-shared \
--with-threads \
--with-computed-gotos \
--enable-optimizations \
--without-lto \
--enable-ipv6 \
--with-system-expat \
--with-dbmliborder=gdbm:ndbm \
--with-system-ffi \
--with-system-libmpdec \
--enable-loadable-sqlite-extensions \
--without-ensurepip
The virtualenv was freshly created (as user homeassistant) with:
python -m venv /srv/homeassistant
source /srv/homeassistant/bin/activate
(homeassistant)$ pip install homeassistant
Do you also have the right CTYPE (locale) set on the running instance? I know there has been a bug like this since Python 3.3: https://github.com/docker-library/python/issues/13
I have LANG=en_US.UTF-8, which should be right, I guess?
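In case it's useful to double-check, something like this shows what Python itself actually picked up (run from the same venv):
locale
python3 -c "import sys, locale; print(sys.getfilesystemencoding(), locale.getpreferredencoding())"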
Which event loop are you using? uvloop or one of the built-in loops?
pip list doesn't list uvloop, so I must be running the default event loop.
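A quick way to confirm from the same environment, in case pip list misses something:
python3 -c "import asyncio; print(type(asyncio.get_event_loop()))"
# should print something like <class 'asyncio.unix_events._UnixSelectorEventLoop'> when uvloop is not in use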
I don't know if it's of any use, but I managed to debug one of the core dumps and it seems to be related to garbage collection (Thread 1, LWP 1904, was the culprit):
Thread 19 (Thread 0x6b901470 (LWP 1933)):
#0 0x76b1aff8 in select () from /usr/lib/libc.so.6
#1 0x6c812984 in OpenZWave::SerialControllerImpl::Read() ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#2 0x6c8129c0 in OpenZWave::SerialControllerImpl::ReadThreadProc(OpenZWave::Event*) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#3 0x6c8122ac in OpenZWave::ThreadImpl::Run() ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#4 0x6c8122c8 in OpenZWave::ThreadImpl::ThreadProc(void*) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#5 0x76b98e9c in start_thread () from /usr/lib/libpthread.so.0
#6 0x76b21fc8 in ?? () from /usr/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 18 (Thread 0x6d070470 (LWP 1930)):
#0 0x76ba2e40 in do_futex_wait () from /usr/lib/libpthread.so.0
#1 0x76ba30c4 in __new_sem_wait_slow () from /usr/lib/libpthread.so.0
#2 0x76dbf154 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc60c0 in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 17 (Thread 0x7023e470 (LWP 1921)):
#0 0x76b9fc14 in pthread_cond_wait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1 0x76d57c1c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 16 (Thread 0x6d870470 (LWP 1929)):
#0 0x76b1aff8 in select () from /usr/lib/libc.so.6
Backtrace stopped: Cannot access memory at address 0x13f00
Thread 15 (Thread 0x6f23e470 (LWP 1923)):
#0 0x76ba00c8 in pthread_cond_timedwait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1 0x76d5871c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 14 (Thread 0x6b101470 (LWP 1934)):
#0 0x76ba00c8 in pthread_cond_timedwait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1 0x6c7ba6ac in OpenZWave::EventImpl::Wait(int) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#2 0x6c7b7fd4 in OpenZWave::Wait::Multiple(OpenZWave::Wait**, unsigned int, int) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#3 0x6c7be950 in OpenZWave::Driver::PollThreadProc(OpenZWave::Event*) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#4 0x6c8122ac in OpenZWave::ThreadImpl::Run() ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#5 0x6c8122c8 in OpenZWave::ThreadImpl::ThreadProc(void*) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#6 0x76b98e9c in start_thread () from /usr/lib/libpthread.so.0
#7 0x76b21fc8 in ?? () from /usr/lib/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 13 (Thread 0x71cff470 (LWP 1918)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 12 (Thread 0x6e9fe470 (LWP 1924)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 11 (Thread 0x712ff470 (LWP 1919)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 10 (Thread 0x6c101470 (LWP 1932)):
#0 0x76b9fc10 in pthread_cond_wait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1 0x6c7ba758 in OpenZWave::EventImpl::Wait(int) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#2 0x6c7b7fd4 in OpenZWave::Wait::Multiple(OpenZWave::Wait**, unsigned int, int) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#3 0x6c7cb8e0 in OpenZWave::Driver::DriverThreadProc(OpenZWave::Event*) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#4 0x6c8122ac in OpenZWave::ThreadImpl::Run() ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#5 0x6c8122c8 in OpenZWave::ThreadImpl::ThreadProc(void*) ()
from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#6 0x76b98e9c in start_thread () from /usr/lib/libpthread.so.0
#7 0x76b21fc8 in ?? () from /usr/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 9 (Thread 0x72eff470 (LWP 1916)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 8 (Thread 0x724ff470 (LWP 1917)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 7 (Thread 0x6fa3e470 (LWP 1922)):
#0 0x76ba00c8 in pthread_cond_timedwait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1 0x76d5871c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 6 (Thread 0x70a3e470 (LWP 1920)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 5 (Thread 0x7527d470 (LWP 1908)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 4 (Thread 0x74a7d470 (LWP 1909)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 3 (Thread 0x740ff470 (LWP 1910)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 2 (Thread 0x736ff470 (LWP 1915)):
#0 0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1 0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2 0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3 0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (Thread 0x76f34010 (LWP 1904)):
#0 0x76dc5298 in PyObject_GC_Del () from /usr/lib/libpython3.6m.so.1.0
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
@AlexMekkering this is great info. Please keep those stack traces coming.
For other people, please run Python under GDB with the Python extensions. https://wiki.python.org/moin/DebuggingWithGdb
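Something along these lines should work (the paths are examples taken from the venv setup above; the py-* commands need the python-gdb extensions from that wiki page):
gdb -ex run --args /srv/homeassistant/bin/python3 -m homeassistant --config /home/homeassistant/.homeassistant
# after the crash, back at the gdb prompt:
#   thread apply all bt    (C-level backtrace for every thread)
#   py-bt                  (Python-level backtrace, provided the python-gdb extensions are loaded)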
In the meantime, I have merged #7799 to run Hass under Python 3.5 again in Docker. Note that our monkey patch for asyncio is only applied for Python < 3.5.3.
@AlexMekkering could you check out the branch from https://github.com/home-assistant/home-assistant/pull/7848 and see if running it with the monkeypatch fixes your issue?
Of course! I'll try that this evening...
I've tested the monkeypatch for three days now and haven't seen any segmentation faults since, so I think the monkeypatch fixes this issue.
Alright, I merged the monkey patch. So if you launch Home Assistant with
HASS_MONKEYPATCH_ASYNCIO=1 hass
it will apply the monkey patch on Python 3.6.
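For the Docker users in this thread, that means passing the variable into the container, roughly like this (container name and tag are just examples):
docker run -d --name homeassistant \
  -e HASS_MONKEYPATCH_ASYNCIO=1 \
  -p 8123:8123 \
  homeassistant/home-assistant:0.46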
Upgraded the Docker image to 0.46 today, but it seems like the issue is still there.
Home Assistant died and dmesg shows this:
[Jun 5 19:59] traps: python[7852] general protection ip:7f82b7ecec46 sp:7f82879d30c0 error:0 in libpython3.6m.so.1.0[7f82b7e0c000+29a000]
Did you add the environment variable to apply the monkey patch?
I didn't; I somehow expected the 0.46 release to have the fix activated by default, my bad... I added the environment variable now and will report back.
Starting with 0.47 we have enabled the monkey patch by default.
Did anyone log a bug against Python 3.6? I'm seeing something similar in a project of ours with 3.6.2.
I've opened a python bug for the 3.6 issue: https://bugs.python.org/issue31061 as I couldn't find anything related. If anyone can help with more information that would be great!
Home Assistant release (hass --version): 0.45.1
Python release (python3 --version): 3.6.1
Component/platform:
Description of problem: Docker install done as explained in the documentation. Getting segmentation faults after a couple of hours of running. There are no messages in the HA error log when it seg faults; the only place I have seen it logged is in the messages log from dmesg. It contains the following:
Switching back to 0.44.2, everything works without issue; 0.45.0/0.45.1 is the first time I have experienced any seg faults. Further info: I installed 0.45.0 (via docker rm, pull, run, etc.) late Saturday or early Sunday on an Ubuntu 16 ESXi VM (my normal environment, which has run HA without issue for the last year or so), and within a few hours it died with a seg fault error. I restarted HA, it ran fine for a few hours, then seg faulted again. At this point I moved back to 0.44.2 via Docker and all ran fine. On Monday I moved my HA Docker install to a physical box running the latest Alpine Linux with Docker and version 0.44.2 (turning off my VM install); it came up without issue and ran fine until I decided to try 0.45.1 again yesterday. So on the physical Alpine Linux box I again used Docker to rm, pull, run, etc. and moved to 0.45.1. Everything came up fine, then a few hours later the seg fault occurred again. I restarted the container, all was again fine for a few hours, and then another seg fault. I moved back to 0.44.2 and it has been running without issue since then.
Seg faults on two different "machines" with two different operating systems concerns me.
Expected: No seg faults
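For reference, the docker rm/pull/run cycle mentioned in the description was roughly the following (the container name, config path, and tag are illustrative, based on the standard install docs):
docker stop home-assistant && docker rm home-assistant
docker pull homeassistant/home-assistant:0.44.2
docker run -d --name home-assistant \
  --net=host \
  -v /home/user/homeassistant:/config \
  -v /etc/localtime:/etc/localtime:ro \
  homeassistant/home-assistant:0.44.2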