Closed turbokongen closed 8 years ago
I've been hitting a random segfault as well starting yesterday on dev.
Some tips I've seen for debugging:
To help debug async, you can add this under line 115 in core.py self.loop.set_debug(True) from gitter
And also using gdb:
if you can run with gdb, the exact cause can be determined, https://wiki.python.org/moin/DebuggingWithGdb you can attach to the running process and wait for it to segfault, or run hass with gdb and wait for it pull info on the threads, py-bt gets a python stack-trace you'll need to run the command from a terminal so that when it stops, you can interact with it (not a system-started script) from gitter
Initial output with gdb running: http://hastebin.com/uvegobuweg.pl bt: http://hastebin.com/zigemufaya.swift info threads: http://hastebin.com/vecacuhira.sql
Recent crash from tonight with faulthandler: http://hastebin.com/qozubivewa.vbs
There are 47 threads running. Highlighted some things from the log that could be it. If we could get a log from another user we could compare.
16 callbacks are waiting for the eventloop (searched for run_callback_threadsafe
After the segfault, the output from ping is printed to the command line, means that a command line sensor was running.
File "/usr/local/lib/python3.4/subprocess.py", line 491 in _eintr_retry_call
File "/usr/local/lib/python3.4/subprocess.py", line 1520 in _try_wait
The recorder is writing a query:
File "/home/hasss/.homeassistant/deps/sqlalchemy/engine/default.py", line 450 in do_execute
Verisure was updating your lock (search for verisure/devices/lock.py
)
RFXtrx is reading from the serial connection (search for serial/serialposix.py
)
If possible, could you turn one of the following off 1 by 1 to see if it stops the segfaulting: command line switch, rfxtrx, recorder, verisure
Here's my config, but it doesn't look like we have much in the way of component overlap:
homeassistant: !include homeassistant.yaml
zone: !include zone.yaml
group: !include groups.yaml
scene: !include scenes.yaml
logbook:
frontend:
history:
discovery:
zeroconf:
sun:
http: !include http.yaml
mqtt: !include mqtt.yaml
device_tracker: !include device_trackers.yaml
sensor onlinedness: !include sensor-onlinedness.yaml
switch harmony hub: !include harmony-switches.yaml
switch template: !include template_switches.yaml
sensor forecast: !include forecast.yaml
sensor speedtest: !include speedtest.yaml
notify slack: !include notify-slack.yaml
light hue: !include hue.yaml
ecobee: !include ecobee.yaml
sleepiq: !include sleepiq.yaml
#zwave: !include zwave.yaml
input_boolean: !include input_booleans.yaml
emulated_hue: !include emulated_hue.yaml
vera: !include vera.yaml
script: !include_dir_named scripts
automation: !include_dir_merge_list automations
Home Assistant release (hass --version): 0.29.0dev
Python release (python3 --version): 3.4.2
Running on a Raspberry Pi 2 with Raspberian Jessie.
I didn't see any segfaults yesterday, but I have hass running under gdb now
Crash from this night. Took longer this time. Now I have disabled all commandlne stuff including systemmonitor. http://hastebin.com/toxerobunu.vbs next disabling template sensors
It's now been running without problems for a day and a half. I think I will reenable the commandline stuff, and see what happens.
If commandline stuff is to blame, we should port it over to use async stuff https://docs.python.org/3/library/asyncio-subprocess.html
Two things were disabled. First Without command line stuff == crash Disabeld template stuff, running fine. Now I have enabled commandline stuff, and will report back ;)
Looks like I'm in the same boat, dmesg is showing me;
[91069.635106] python[6012]: segfault at 10 ip 00007fec2e6f52d0 sp 00007fec2822dd20 error 4 in libpython3.5m.so.1.0[7fec2e556000+47c000]
[94613.108242] python[331]: segfault at 10 ip 00007f05660fb2d0 sp 00007f05603750d0 error 4 in libpython3.5m.so.1.0[7f0565f5c000+47c000]
[179867.340974] python[26568]: segfault at 10 ip 00007f9fa35732d0 sp 00007f9f9dd59e90 error 4 in libpython3.5m.so.1.0[7f9fa33d4000+47c000]
Closing #3484 as it's a dupe.
@lwis : Which components are you using?
@Danielhiversen
mqtt notify logbook frontend updater sun history ifttt influxdb emulated_hue netgear owntracks kodi braviatv sonos samsungtv cast proximity fastdotcom command_line forecast time_date google_travel_time shell_command zones
Think I got all the components + platforms.
@lwis : Could you try to disable the command_line component?
@Danielhiversen sure, what's the thought behind why that would cause a segfault?
No need to disable command line. It sems to be traced to any template stuff.
Running gdb -ex r --args python -m homeassistant --config /config
now.
Got a new somewhat different traceback: http://hastebin.com/ilepedefad.vbs
Finally caught one in the act: https://gist.github.com/technicalpickles/23e097e213fcd4beb2c83c0e8cf7e06b
It's at a gdb prompt. Anything I should grab while I have it? I tried py-bt
but gdb said the command wasn't found.
If anyone with a segfault using the latest dev could pip install uvloop
and restart HASS, let me know if the segfaults go away.
$ pip install uvloop
Collecting uvloop
Using cached uvloop-0.5.3.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-uyo_w28w/uvloop/setup.py", line 11, in <module>
raise RuntimeError('uvloop requires Python 3.5 or greater')
RuntimeError: uvloop requires Python 3.5 or greater
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-uyo_w28w/uvloop/
@bbangert it looks like uvloop requires python 3.5. As far as I can tell, there isn't a Raspberian Jessie package for it. I can build from source to test, but wouldn't requiring it be a pretty significant change?
@technicalpickles awwww, bummer. Yes, its not feasible to require it since it'd up the required python too far.
@technicalpickles we wouldn't make it mandatory but it will help to be able to narrow down the seg faults to the default event loop implementation.
@bbangert @balloob I use uvloop and still experience the segfaults.
Time to get a better overview of the segfault data thus far:
Architecture | OS | Python | Loop | User | templates? | Command Line |
---|---|---|---|---|---|---|
AMD64 | Ubuntu 16.04 | 3.4.4 | Default (Epoll?) | @turbokongen | V | V |
AMD64 | Ubuntu 16.04 | 3.5.2 | uvloop | @turbokongen | V | V |
AMD64 - Docker | Alpine | 3.5.2 | Default (Epoll?) | @lwis | V | V |
AMD64 - Docker | Alpine | 3.5.2 | uvloop | @lwis | V | V |
RPi (?) | Raspbian Jessie | 3.4.2 (?) | Default (Epoll?) | @technicalpickles | V | V |
I can uninstall uvloop if required.
I'm running an Alpine Docker image on Ubuntu 16.04, also using an amd64 machine.
Everyone with a segfault is using a component with the template platform? And disabling the template platform config bits makes the segfaults go away?
I've not tried disabling my template configuration, but I'm happy to branch my config to test.
On Sat, 24 Sep 2016, 7:01 am Ben Bangert, notifications@github.com wrote:
Everyone with a segfault is using a component with the template platform? And disabling the template platform config bits makes the segfaults go away?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/home-assistant/home-assistant/issues/3453#issuecomment-249347640, or mute the thread https://github.com/notifications/unsubscribe-auth/AA1TOwyw-Ei_vkpkbQPA7xz-tscATfMKks5qtLybgaJpZM4KAlqc .
For people experiencing this problem, try the following three things:
import jinja2.debug to the top of
helpers/template.py`ENV.from_string(…)
:import threading
lock = threading.Lock()
def render(…):
…
with lock:
try:
return ENV.from_string(…)
@balloob when you say template sensors, do you mean sensor.template
? As I don't use any, I do however use templates in other places, such a value_template
in sensors.
You're right. I meant to say anything that uses templates.
@balloob starting now, didn't realise how much I used templates!
I didn't experience any crashes running without the templates, but I didn't feel that it was exercising much functionality - most of my setup uses templates so what I ended up with was quite minimal. I've been running for ~3 hours now with import jinja2.debug
, and I'm yet to see any instability, I will however keep this post updated.
Crash with jinja2.debug import : http://hastebin.com/tekuxudeja.vbs
Just had a segfault also, switching to a locked render.
I just got (running with the template render lock);
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
No stack.
Not sure what went wrong.
It segfaults everytime on startup?
@bbangert I've seen every startup when running with gdb. Doing continue
seems to startup as expected after that.
I can make a new template system based on http://www.joonis.de/de/code/templite
Most recent dev, I'm no longer seeing segfaults, but I do get this sigabrt pretty quickly after starting release_the_kraken.py
:
16-09-25 23:00:01 INFO (CP Server Thread-15) [homeassistant.components.http] Serving /api/states/kraken.test to 127.0.0.1 (auth: True)
16-09-25 23:00:01 INFO (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=kraken.test, new_state=<state kraken.test=Run 249 @ 2016-09-25T23:00:01.545632-04:00>, old_state=<state kraken.test=Run 250 @ 2016-09-25T23:00:01.214649-04:00>>
16-09-25 23:00:01 INFO (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=kraken.test, new_state=<state kraken.test=Run 275 @ 2016-09-25T23:00:01.598311-04:00>, old_state=<state kraken.test=Run 249 @ 2016-09-25T23:00:01.545632-04:00>>
python3: ../Modules/gcmodule.c:364: update_refs: Assertion `((gc)->gc.gc_refs >> (1)) != 0' failed.
I've run this with gdb, but it ends up continue
ing each thread, so I haven't been able to grab py-bt
.
Sadly I experienced 2 segfaults overnight with the fix :(
Potentially related: https://bugs.python.org/issue26617
I have still no segfaults after last nights kraken attacks. Running fine here. I swithced back to Python 3.4.4 though, and there I don't have uvloop. and #3515 applied
For anyone running into this, https://gist.github.com/balloob/c364470d17c01615b763dd1772791d43 can be used to stress test it.
python release_the_kraken.py
I still get segfault as well
PID: 1252 (python)
UID: 1001 (per)
GID: 1001 (per)
Signal: 11 (SEGV)
Timestamp: Mon 2016-09-26 00:12:31 UTC (19h ago)
Command Line: python -m homeassistant --daemon
Executable: /usr/bin/python3.5
Control Group: /user.slice/user-1001.slice/session-c3.scope
Unit: session-c3.scope
Slice: user-1001.slice
Session: c3
Owner UID: 1001 (per)
Boot ID: 6447cfbb8a0c4b5d8cf2ac3a643f6d94
Machine ID: 529280e30d1a42a2b88e3074fb0b13f8
Hostname: alarmpi
Coredump: /var/lib/systemd/coredump/core.python.1001.6447cfbb8a0c4b5d8cf2ac3a643f6d94.1252.1474848751000000.lz4
Message: Process 1252 (python) of user 1001 dumped core.
(sorry about close and comment 😳)
I was also getting fairly consistent segfaults with Python 3.6, but have not seen one since downgrading to Python 3.4.
I split out https://github.com/home-assistant/home-assistant/issues/3538 for the SIGABRT I was seeing. I tried upgrading to python 3.5.2 just in case, and kraken passed after 12000 iterations, but I left it running with my full config and got another segfault. Trying to get a proper bt, but I installed with pyenv and I'm not sure how to get a debug build with that yet.
@lwis got a repo going with two Dockers: https://github.com/lwis/test-kraken-hass
To repro it on my Mac:
docker-compose.yml
to say python kraken.py 10000
hass/Dockerfile
to say RUN pip install git+https://github.com/home-assistant/home-assistant.git@b3d67a7ed91030e9f2f42cc5b24a277ab8787c19
kraken.py
with this job
function:def job(i):
try:
remote.set_state(api, 'sensor.test_sensor_1', 'Run {}'.format(i))
remote.set_state(api, 'sensor.test_sensor_2', 'Run {}'.format(i))
except Exception:
return
Kraken starts running after 20 seconds. I got a repro 50% of the time.
We should create a debug script based on this StackOverflow answer to debug things with docker-compose
: http://stackoverflow.com/questions/30381410/debugging-linked-docker-containers-when-using-docker-compose/34322396#34322396
Looks like my catching of an exception doesn't work, prepare for a lot of output from Kraken while it tries but fails to finish the rest of its jobs.
Looks like there has been some confusing information in the chat on what happened to my install, so I'll clarify: After applying #3515 to my install on python 3.4.4 without uvloop it ran completely without problems for 24 hours. That was after 5 (50 000) kraken script attack that evening. Yesterday night I rebased of dev again, and It's still been running fine. Still no segfaults.
guys using this modified version of the kraken.py sensor here https://gist.github.com/joyrider3774/fe8816f8377a19393408ac5512db132c i can make python3.exe crash on windows. I only added the sensor to my existing config with 50 tentacles. so it seems windows is affected as well ? the reason i modified the kraken.py sensor is because it uses the date command specific to linux so i just replaced the seconds with a random float value. The crashes happen in less than a minute my python version is :Python 3.4.3 using the 32 bit executables on a 64 bit windows 10
Make sure you are running the latest version of Home Assistant before reporting an issue.
You should only file an issue if you found a bug. Feature and enhancement requests should go in the Feature Requests section of our community forum:
Home Assistant release (
hass --version
): 0.29.0devPython release (
python3 --version
): 3.4.4Component/platform: asyncio
Description of problem: Random segfaults. No pattern.
Expected: Normal operation.
Problem-relevant
configuration.yaml
entries and steps to reproduce:Traceback (if applicable): http://hastebin.com/donuvejugu.pas http://hastebin.com/ulicipebes.pas http://hastebin.com/salopofaya.pas http://hastebin.com/idereremil.pas http://hastebin.com/jivipugibe.pas http://hastebin.com/ixadobikul.pas http://hastebin.com/qalozikawe.pas http://hastebin.com/reduwuvewi.pas
Additional info: Components and platforms: device_tracker: ddwrt notify: pushbullet zwave: climate, switch, sensor, binary_sensor rfxtrx: switch, light, sensor verisure: alarm, sensor, switch, lock media_player: cast camera: ffmpeg thermostat: heat_control climate: generic_thermostat switch: rfxtrx, zwave, command_line, template light: rfxtrx sensor: rfxtrx, template, command_line, systemmonitor, yr Automations and scripts as well.
OS: ubuntu 16.04 64bit