home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Error in lutron.py -- Hass.io loses ability to control LutronRA2 entities #20348

Closed grantalewis closed 5 years ago

grantalewis commented 5 years ago

Home Assistant release with the issue: 0.85.1

Last working Home Assistant release (if known): 0.84.6

Operating environment (Hass.io/Docker/Windows/etc.): RPi3, Hass.io

Component/platform: https://www.home-assistant.io/components/lutron/

Description of problem:

Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/service.py", line 287, in _handle_service_platform_call
    await getattr(entity, func)(**data)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/lutron.py", line 73, in turn_off
    self._lutron_device.level = 0
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 591, in level
    Output._ACTION_ZONE_LEVEL, "%.2f" % new_level)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 392, in send
    self._conn.send(op + out_cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 91, in send
    self._send_locked(cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 81, in _send_locked
    self._telnet.write(cmd.encode('ascii') + b'\r\n')
  File "/usr/local/lib/python3.6/telnetlib.py", line 290, in write
    self.sock.sendall(buffer)
TimeoutError: [Errno 110] Operation timed out

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

lutron:
  host: 192.168.1.225
  username: lutron
  password: integration

Traceback (if applicable):

Additional information:

grantalewis commented 5 years ago

Still occurring in 0.86.1

ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/service.py", line 287, in _handle_service_platform_call
    await getattr(entity, func)(**data)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/lutron.py", line 73, in turn_off
    self._lutron_device.level = 0
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 591, in level
    Output._ACTION_ZONE_LEVEL, "%.2f" % new_level)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 392, in send
    self._conn.send(op + out_cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 91, in send
    self._send_locked(cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 81, in _send_locked
    self._telnet.write(cmd.encode('ascii') + b'\r\n')
AttributeError: 'NoneType' object has no attribute 'write'

cdheiser commented 5 years ago

This sounds like https://github.com/thecynic/pylutron/issues/17

cdheiser commented 5 years ago

I've created https://github.com/thecynic/pylutron/pull/23 to patch the pylutron library and hopefully resolve this bug. Once that patch is accepted, we still need to wait for a new version to get published and update home-assistant to use the new version.

grantalewis commented 5 years ago

Still seeing this in 0.87.1

Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/service.py", line 289, in _handle_service_platform_call
    await getattr(entity, func)(**data)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/lutron.py", line 73, in turn_off
    self._lutron_device.level = 0
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 591, in level
    Output._ACTION_ZONE_LEVEL, "%.2f" % new_level)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 392, in send
    self._conn.send(op + out_cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 91, in send
    self._send_locked(cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 81, in _send_locked
    self._telnet.write(cmd.encode('ascii') + b'\r\n')
AttributeError: 'NoneType' object has no attribute 'write'

(Apologies if my attempts to get this noticed are in the wrong place. I'm surprised this isn't getting much uptake. Is no one else seeing this problem?)

JonGilmore commented 5 years ago

I'm not seeing this error. Can you describe the behavior in more detail? Does it work at all? Does it stop working?

grantalewis commented 5 years ago

After rebooting, things seem to be OK for a short period of time, maybe as much as 30 minutes. Then on/off commands from Hass.io start failing. There's a second or two of hesitation when clicking a light toggle, no result, and then the toggle returns to its prior position. Checking the logs shows the above error.

cdheiser commented 5 years ago

Do you reprogram or make other changes to your RadioRA setup at all? I've only experienced the behavior you describe when updating the programming in the main repeater, or otherwise suffer some odd network problem.

I know it's annoying, and I'd love to get this fixed, but right now it's at the mercy of the pylutron maintainer. We could consider forking the library, or asking the maintainer if they would consider adding additional people to accept/reject pull requests.

grantalewis commented 5 years ago

@cdheiser I really do appreciate the feedback and willingness to help. Yes, I make changes to my setup fairly frequently, but before now it seemed that cold-restarting everything usually resolved any issues. This problem does seem different.

Yesterday I did a completely new setup of HA 0.87.1 on a newly formatted SD card. It ran well for a few hours, but by bedtime, toggling lights was beginning to flake out again. Then by this morning the toggles were completely unresponsive. Slightly different symptom: the toggle would stay in the ON position, but the Lutron entity was unaffected.

The logs show this familiar info:

2019-02-17 06:53:07 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection.1875282512] Error handling message: {'type': 'call_service', 'domain': 'light', 'service': 'turn_on', 'service_data': {'entity_id': 'light.main_stairs_main_hall'}, 'id': 16}
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/websocket_api/decorators.py", line 17, in _handle_async_response
    await func(hass, connection, msg)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/websocket_api/commands.py", line 148, in handle_call_service
    connection.context(msg))
  File "/usr/local/lib/python3.6/site-packages/homeassistant/core.py", line 1130, in async_call
    self._execute_service(handler, service_call))
  File "/usr/local/lib/python3.6/site-packages/homeassistant/core.py", line 1152, in _execute_service
    await handler.func(service_call)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/__init__.py", line 287, in async_handle_light_on_service
    await light.async_turn_on(**pars)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/lutron.py", line 69, in turn_on
    self._lutron_device.level = to_lutron_level(brightness)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 591, in level
    Output._ACTION_ZONE_LEVEL, "%.2f" % new_level)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 392, in send
    self._conn.send(op + out_cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 91, in send
    self._send_locked(cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 81, in _send_locked
    self._telnet.write(cmd.encode('ascii') + b'\r\n')
  File "/usr/local/lib/python3.6/telnetlib.py", line 290, in write
    self.sock.sendall(buffer)
TimeoutError: [Errno 110] Operation timed out

I'll just revert to 0.84.6 and wait it out. That version is pretty much rock-solid for me.

Again, my thanks.

ToddNJ commented 5 years ago

I had this same problem. It works for me in 0.86.4 but failed when I updated to 0.88.1, so I rolled back. I'm running Hass.io in Docker on Ubuntu 18.04 on a laptop. My Lutron bridge is model L-BDG2.

Error Log Entry:

2019-02-27 20:02:14 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/homeassistant/helpers/service.py", line 289, in _handle_service_platform_call
    await getattr(entity, func)(**data)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/components/lutron_caseta/switch.py", line 33, in async_turn_on
    self._smartbridge.turn_on(self._device_id)
  File "/usr/local/lib/python3.7/site-packages/pylutron_caseta/smartbridge.py", line 190, in turn_on
    return self.set_value(device_id, 100)
  File "/usr/local/lib/python3.7/site-packages/pylutron_caseta/smartbridge.py", line 182, in set_value
    return self._writer.write(cmd)
AttributeError: 'NoneType' object has no attribute 'write'

stale[bot] commented 5 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

grantalewis commented 5 years ago

This is still an unresolved issue in 0.95.4.

JonGilmore commented 5 years ago

@ToddNJ your issue looks different from this one; it appears you're using Caseta? @grantalewis is using RadioRa2, and a different library altogether. I'd suggest you open another issue for yours.

@grantalewis can you describe your setup a bit? I'm afraid the troubleshooting that @cdheiser is doing may provide a workaround for your issue, but I don't think it addresses the root cause: why is your HASS instance losing its connection to your Ra2 main repeater?

I believe we've chatted about this before on Reddit, but I'll add that my RadioRa2 install is working well with the latest home assistant release (still), so I think we need to diagnose the why.

grantalewis commented 5 years ago

The core of my system is OmniPro II / Lutron RadioRA2. Nearly all of my light and fan controls are Lutron. I got my start with Home Assistant in 2017 (v0.7x-ish). I started on Hassbian, moved to Hass.io, and have gone back and forth a couple of times since. I enjoyed a very smooth HA experience until January 2019 and v0.85.x, at which point I had to revert to 0.84.6. I've been stuck on that version since January. (Despite its increasing age and ever-growing list of missing features, 0.84.6 still performs really well for me.)

I needed an environment that would let me experiment more and so moved Hass.io to VirtualBox earlier this year (Ubuntu 64). That didn't provide me with a solution to my Lutron problem, but it at least gave me a way to test new versions while still having an easy road back to my stable 0.84.6.

I also keep an RPi3 updated with the latest HA release in case a solution suddenly becomes available. (I hope to eventually be able to use Hass.io on RPi3 as my day-to-day system.)

I've tried literally every release since 0.84.6, and they all exhibit exactly the same issue. After rebooting, the system seems to be fine for the first hour or two, and then I'll notice that the UI begins losing the ability to control entities. This video demonstrates the problem (Dropbox link to MP4 video). When the issue begins happening, I can control an entity one more time using its toggle. The entity will reflect the change, but the toggle will revert to its prior state, as if indicating that the change was unsuccessful. At that point, the entity will not respond to any additional changes from the UI. The only way to regain control is to reboot.

Sorry if this is long-winded but I didn't want to leave anything out. I'm happy to help in any way I'm able, so please feel free to call on me for testing, more information, etc.

JonGilmore commented 5 years ago

That's a good explanation; I appreciate it. The video helps me understand the behavior, which is different from what I understood from your earlier written descriptions. I'm going to continue our conversation on Discord to keep this issue clean; if we come up with anything, we can post it here.

Tangston311 commented 5 years ago

I’m glad I found this because I’m experiencing exactly the same issue with HA and Ra2. At first everything seems to work just fine, but after an hour or so I experience the exact behavior seen in the video. My automations and service calls still seem to work just fine, but controlling through the frontend becomes unusable.

Happy to do what I can to provide more info to help troubleshoot.

I’m on 0.93.1 using Hass.io.

thecynic commented 5 years ago

@Tangston311 Interesting. So in your case the actual communication with the Lutron controller itself is not affected? Just the frontend gets out of sync? If it's the latter, then I'll need help from someone more familiar with the higher-level HASS stack to figure that part out.

@grantalewis Do you happen to have logs for when this happens? I'd also be curious to see if manual service calls work fine as well. Would be good to identify where in the stack the issue is creeping in.

thecynic commented 5 years ago

@grantalewis Oh I see the stacks above. Ok, yes, my pylutron PR should help with that. The connection management previously was very optimistic with respect to how well things remain connected, and thus was not really robust. I wonder if your network is somewhat flaky. Maybe you have things on wifi and they drop, etc? Doesn't really matter, just curious as to why you see it so much more often.
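To illustrate the kind of robustness being described, here is a minimal sketch of a send path that treats a dead connection as recoverable instead of failing with "'NoneType' object has no attribute 'write'" or "TimeoutError: [Errno 110]". ReconnectingSender, FakeConn and the connect callable are hypothetical names for illustration; this is not pylutron's actual code and not the contents of the pylutron PR.

```python
import threading


class ReconnectingSender:
    """Hypothetical send path that reconnects after a dead connection.

    `connect` is an assumed zero-argument callable returning an object with a
    write(bytes) method; it stands in for whatever opens the telnet session.
    """

    def __init__(self, connect, max_attempts=2):
        self._connect = connect
        self._max_attempts = max_attempts
        self._conn = None  # None means "not connected" (cf. the NoneType tracebacks)
        self._lock = threading.Lock()

    def send(self, cmd):
        payload = cmd.encode("ascii") + b"\r\n"
        last_error = None
        with self._lock:
            for _ in range(self._max_attempts):
                if self._conn is None:
                    try:
                        self._conn = self._connect()
                    except OSError as err:
                        last_error = err
                        continue
                try:
                    self._conn.write(payload)
                    return
                except OSError as err:  # TimeoutError is a subclass of OSError
                    last_error = err
                    self._conn = None  # discard the dead connection, then retry
        raise ConnectionError("could not send %r" % cmd) from last_error


class FakeConn:
    """Test double: raises the same error seen above when `fail` is True."""

    def __init__(self, fail):
        self.fail = fail
        self.sent = []

    def write(self, data):
        if self.fail:
            raise TimeoutError(110, "Operation timed out")
        self.sent.append(data)


# Usage: the first connection times out, the sender reconnects and succeeds.
bad, good = FakeConn(fail=True), FakeConn(fail=False)
connections = iter([bad, good])
sender = ReconnectingSender(lambda: next(connections))
sender.send("#OUTPUT,9,1,0.00")
print(good.sent)  # [b'#OUTPUT,9,1,0.00\r\n']
```

Holding the lock across the reconnect keeps two threads from reconnecting at the same time; the cost is that other senders wait while a reconnect is in progress.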

grantalewis commented 5 years ago

@thecynic, @JonGilmore suggested debug logging via Discord DM, so I'll do that. Going with

logger:
   default: debug

for now but let me know if you want other options enabled, etc.

It seems that @Tangston311 and I are seeing the exact same symptoms. UI misbehaves but entities still respond to homeassistant.toggle, homeassistant.turn_on, homeassistant.turn_off, etc. via the Services panel.

Network: I suppose it could be network-related, but that doesn't account for rock-solid performance from 0.84.6. It's like night and day: fantastic performance with 0.84.6 and before; completely predictable failure with every version since.

Tangston311 commented 5 years ago

@thecynic , @grantalewis is correct, I think we're both experiencing the same behavior. I have a few automations that work by simply turning things on or off (regardless of current state) with "turn_on" or "turn_off" service calls, but controlling through the UI becomes inoperable. Additionally, when this happens, HA no longer knows the current state of a light, so if I attempt to create an automation based on the current state of an entity it won't work because it thinks the light is currently off when it's actually on, or vice versa.

Network issues could definitely be a factor for me: I noticed that sometimes the Ra2 Inclusive software couldn't find the main repeater, so I relocated the main repeater. Now the Ra2 software can always find it, and this lengthened the period of time that the HA UI would function: it went from bombing out after ~30 minutes to lasting most of the day, but it still inevitably exhibits the same behavior. I can't explain why, because the main repeater was hard-wired in both locations, but it does point to some network instability that may be causing the issue.

grantalewis commented 5 years ago

I don't mean to be stubborn, but unless 0.84.6 is just more tolerant of borderline network issues, I just don't think it's network related. I have no problems with 0.84.6 whatsoever -- not one, not ever.

grantalewis commented 5 years ago

OK, let 0.97.2 run for a while today. As soon as I saw the problem occur, I grabbed the logs from 3 or so minutes prior. I deleted a lot of log entries that I'm fairly certain are not involved in the issue. The .txt file is attached. Thanks for taking a look.

log.txt

JonGilmore commented 5 years ago

Hm, this log doesn't include the "normal" stack trace. I'm curious, did it show up again this time? I'll go through it just in case, but I'm not sure this capture is going to be much help.

grantalewis commented 5 years ago

In my configuration.yaml:

logger:
   default: debug

As soon as I saw the problem occur I grabbed the logs from the prior 3 minutes. Maybe I need to back up further?

thecynic commented 5 years ago

@grantalewis You aren't being stubborn :) I wanted to confirm whether or not we were losing communication with the lutron main repeater or if the issue is somewhere in HASS. Given that you can continue to change state via the API/service calls, it means that the underlying comms are good and the issue is in the HASS integration somewhere (either in HASS or in the callback mechanisms from pylutron)

@Tangston311 trying to reconcile your answer with @grantalewis... hmm

cdheiser commented 5 years ago

I was hoping to see an exception thrown, and thus we could blame some failure for killing a thread.

grantalewis commented 5 years ago

I’m going to give it another go later this afternoon. Maybe in an effort to get rid of extraneous log entries I deleted something important.

Tangston311 commented 5 years ago

@thecynic , it's admittedly strange. This most recent iteration the lights worked for almost exactly 24 hours before the issue appeared again. I tried setting my logger to "debug", but after a few hours the log was so big that HA kept crashing when I tried to load it. So instead I tried to set my logger as follows to only get stuff related to Lutron:

logger:
  default: error
  logs:
    homeassistant.components.lutron: debug
    homeassistant.components.lutron.light: debug
    homeassistant.components.lutron.switch: debug
    homeassistant.components.lutron.cover: debug
    homeassistant.components.light: debug
    homeassistant.components.cover: debug
    homeassistant.components.switch: debug

...but none of the actions I take (like turning a light on or off) appear in the log when it's set like that.
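One possible reason the actions don't show up: Python loggers inherit levels by dotted name, and the Lutron command traffic in the debug excerpts later in this thread is tagged [pylutron], a top-level logger that sits outside homeassistant.components, while the "Bus:Handling call_service" lines come from homeassistant.core. The sketch below reproduces the effect with the standard logging module only. It assumes that entries under logger: logs: map directly onto Python logger names, so the corresponding config change would be an extra entry such as pylutron: debug.

```python
import logging

# Mirror the logger: section above with the standard library only:
# root at ERROR, selected homeassistant.* loggers at DEBUG.
logging.getLogger().setLevel(logging.ERROR)
for name in ("homeassistant.components.lutron", "homeassistant.components.light"):
    logging.getLogger(name).setLevel(logging.DEBUG)

# A child logger inherits the level of its nearest configured ancestor...
lutron_light = logging.getLogger("homeassistant.components.lutron.light")
# ...but these loggers have no configured ancestor, so they fall back to the root.
core_log = logging.getLogger("homeassistant.core")
pylutron_log = logging.getLogger("pylutron")

print(logging.getLevelName(lutron_light.getEffectiveLevel()))  # DEBUG
print(logging.getLevelName(core_log.getEffectiveLevel()))      # ERROR
print(logging.getLevelName(pylutron_log.getEffectiveLevel()))  # ERROR

# The equivalent fix: give the library logger its own level.
pylutron_log.setLevel(logging.DEBUG)
print(logging.getLevelName(pylutron_log.getEffectiveLevel()))  # DEBUG
```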

I was looking at integrating another component and stumbled across this description of how lights work on the frontend, taken from here:

When you turn a light off in Home Assistant, the Home Assistant state immediately turns to off, so that the switch in the frontend reflects that you turned the light off. Then, the off command is actually sent to the light, and depending on the type of light it either waits for a response back confirming the command or it polls the device (or does nothing). Sometimes what will happen is the state will briefly change to on before the lights actually turn off and the state is updated back to off. This can often be seen in the frontend where you turn a light off and the switch briefly goes back to on and off again.

It sounds like in our case the actual response back from the light is just failing for some reason. Or perhaps this is totally unrelated...just thought I'd share!
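That hypothesis can be made concrete with a toy model. The class and method names below are hypothetical, and this is not how Home Assistant's frontend or light platform is actually implemented; it only models the behavior in the quoted description: the toggle flips at once, and if no update from the device arrives it falls back to the last confirmed state. Under that model, a command that reaches the light but whose update never comes back produces the "entity changes, toggle reverts" symptom described earlier in the thread.

```python
class OptimisticToggle:
    """Toy model (hypothetical, not Home Assistant code) of an optimistic UI toggle."""

    def __init__(self):
        self.confirmed_on = False  # last state reported back by the device
        self.displayed_on = False  # what the toggle in the UI shows
        self.pending = False       # waiting for the device to confirm

    def click(self, send):
        # Flip the toggle immediately, then send the command.
        self.displayed_on = not self.confirmed_on
        self.pending = True
        send(self.displayed_on)

    def device_update(self, is_on):
        # A report from the device is the source of truth.
        self.confirmed_on = is_on
        self.displayed_on = is_on
        self.pending = False

    def confirmation_timeout(self):
        # No report arrived in time: fall back to the last confirmed state.
        if self.pending:
            self.displayed_on = self.confirmed_on
            self.pending = False


# The light may well turn on, but if its update never comes back the
# toggle snaps back to "off" even though the command went out.
toggle = OptimisticToggle()
commands = []
toggle.click(commands.append)
toggle.confirmation_timeout()
print(commands, toggle.displayed_on)  # [True] False
```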

Let me know if I can provide any detail (or adjust my logger or something) to be helpful.

grantalewis commented 5 years ago

OK, here's the next set of logs. Had an odd thing happen on startup that I won't go into because I suspect it's a red herring. I can give more info if you like on that.

I tried to be a lot more sparing with what I removed from the log entries. This time I included the entire log from boot up to the first occurrence of the problem. The only references I removed were those containing family members' names, my domain name, longitude or latitude, or references to email addresses.

The two entities that first showed the problem were

light.main_stairs_main_hall
light.office_office_fan_light

Good luck -- Thanks!

(edited file to remove info; see below)

JonGilmore commented 5 years ago

Just took a cursory glance at this, you may want to remove API keys and other PII. Your address is in there...

grantalewis commented 5 years ago

Edited log file. home-assistant.log.zip

thecynic commented 5 years ago

@grantalewis ok, there's definitely some odd sequences here that I'd love to explore. No idea if it's related yet, but it smells fishy.

There are a lot of continuous omnilink status messages spamming homeassistant. Is that expected?

In the middle of those, something is forcing us to requery the main repeater continuously. That may be related to the change by @JonGilmore that requeries the MR for all status checks, which should be better with PR #25939. But since you say that the backend works, we may be hitting a race condition somewhere that is getting exposed by the stream of queries? No idea yet.
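As a generic illustration of damping the back-to-back queries visible in the excerpt below (the same ?OUTPUT,32,1 query sent twice within one second for Bookcase Lights), here is a hypothetical throttle that skips a repeat status query for an output that was queried very recently. This is not a description of what PR #25939 actually does; the class name, the one-second window and the injectable clock are all assumptions.

```python
import time


class QueryThrottle:
    """Hypothetical: skip a repeat level query for an output that was
    queried less than `min_interval` seconds ago."""

    def __init__(self, send, min_interval=1.0, clock=time.monotonic):
        self._send = send                  # callable that writes one command
        self._min_interval = min_interval
        self._clock = clock                # injectable for testing
        self._last_query = {}              # integration id -> time of last query

    def request_level(self, integration_id):
        now = self._clock()
        last = self._last_query.get(integration_id)
        if last is not None and now - last < self._min_interval:
            return False  # a recent query is still pending; rely on its report
        self._last_query[integration_id] = now
        self._send("?OUTPUT,%d,1" % integration_id)
        return True


# Usage with a fake clock: the second query within one second is dropped.
fake_now = [0.0]
sent = []
throttle = QueryThrottle(sent.append, clock=lambda: fake_now[0])
throttle.request_level(32)
throttle.request_level(32)
fake_now[0] = 1.5
throttle.request_level(32)
print(sent)  # ['?OUTPUT,32,1', '?OUTPUT,32,1']
```

The trade-off is that a caller inside the window sees whatever state the earlier query reported rather than a fresh read.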

One question I have is about the interaction between the omnilink and lutron modules. Do you have something that ties them together somehow? Like omnilink makes a query, and you service that query via lutron? Not exactly sure what questions to ask yet.

An example excerpt:

2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/status (retained): b'online'
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/status (retained): b'online'
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/status (retained): b'online'
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/unit21/state (retained): b'OFF'
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=light.family_room_mantel_lamp, old_state=None, new_state=<state light.family_room_mantel_lamp=off; lutron_integration_id=47, friendly_name=Mantel Lamp, supported_features=1, icon=mdi:lamp @ 2019-08-16T17:11:26.373377-04:00>>
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/status (retained): b'online'
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/status (retained): b'online'
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/status (retained): b'online'
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/unit22/state (retained): b'OFF'
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Subscribing to omnilink/unit453/state
2019-08-16 17:11:26 DEBUG (SyncWorker_3) [pylutron] Sending: ?OUTPUT,32,1
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/status (retained): b'online'
2019-08-16 17:11:26 DEBUG (Thread-6) [pylutron] handle_update 32 -- ['1', '0.00']
2019-08-16 17:11:26 DEBUG (Thread-6) [pylutron] Updating 32(Bookcase Lights): s=1 l=0.000000
2019-08-16 17:11:26 DEBUG (SyncWorker_3) [pylutron] Sending: ?OUTPUT,32,1
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Subscribing to omnilink/button16/state
2019-08-16 17:11:26 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/status (retained): b'online'
2019-08-16 17:11:26 DEBUG (Thread-6) [pylutron] handle_update 32 -- ['1', '0.00']
2019-08-16 17:11:26 DEBUG (Thread-6) [pylutron] Updating 32(Bookcase Lights): s=1 l=0.000000
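For readers following the excerpt, the pylutron lines are plain-text integration commands. The traceback at the top of this issue shows levels formatted with "%.2f" and written with a trailing \r\n, the later log shows #OUTPUT,9,1,50.00 being sent to set a level, and ?OUTPUT,32,1 queries a level, with the answer logged as handle_update 32 -- ['1', '0.00']. A minimal sketch of building and parsing those strings, assuming the report arrives as ~OUTPUT,<id>,1,<level> as described in Lutron's integration protocol; the function names are hypothetical.

```python
def set_level_cmd(integration_id, level):
    """Command to set an output's level (0-100), e.g. #OUTPUT,9,1,50.00."""
    return "#OUTPUT,%d,1,%.2f" % (integration_id, level)


def query_level_cmd(integration_id):
    """Query for an output's current level, e.g. ?OUTPUT,32,1."""
    return "?OUTPUT,%d,1" % integration_id


def parse_output_report(line):
    """Parse an assumed report like '~OUTPUT,32,1,0.00' -> (32, 1, 0.0).

    Returns None for anything that is not an OUTPUT report.
    """
    line = line.strip()
    if not line.startswith("~OUTPUT,"):
        return None
    _, ident, action, *args = line.split(",")
    level = float(args[0]) if args else None
    return int(ident), int(action), level


print(set_level_cmd(9, 50))                          # #OUTPUT,9,1,50.00
print(query_level_cmd(32))                           # ?OUTPUT,32,1
print(parse_output_report("~OUTPUT,32,1,0.00\r\n"))  # (32, 1, 0.0)
```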

@Tangston311 I assume you don't use omnilink? Could you also provide some debug logs?

thecynic commented 5 years ago

@grantalewis Yeah, there's definitely an interaction between omnilink and lutron that I have certainly never tested and don't yet know exactly how to debug without the system. For example, here you set the office light from home assistant, which gets sent to lutron. Almost immediately it gets an update from omnilink saying that the light is on and has the new brightness.

I'll have to dig in and see whether this could be a problem. Not sure how other components handle these multi-master issues.

2019-08-16 17:23:15 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event call_service[L]: domain=light, service=turn_on, service_data=entity_id=light.office_office_fan_light>
2019-08-16 17:23:15 DEBUG (SyncWorker_1) [pylutron] Sending: #OUTPUT,9,1,50.00
2019-08-16 17:23:15 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/unit40/state: b'ON'
2019-08-16 17:23:16 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=light.office_light, old_state=<state light.office_light=off; friendly_name=Office Light, supported_features=1 @ 2019-08-16T17:11:01.515717-04:00>, new_state=<state light.office_light=on; brightness=0.0, friendly_name=Office Light, supported_features=1 @ 2019-08-16T17:23:16.003862-04:00>>
2019-08-16 17:23:16 DEBUG (MainThread) [homeassistant.components.mqtt] Received message on omnilink/unit40/brightness_state: b'50'
2019-08-16 17:23:16 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=light.office_light, old_state=<state light.office_light=on; brightness=0.0, friendly_name=Office Light, supported_features=1 @ 2019-08-16T17:23:16.003862-04:00>, new_state=<state light.office_light=on; brightness=128, friendly_name=Office Light, supported_features=1 @ 2019-08-16T17:23:16.003862-04:00>>
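The numbers in this excerpt line up with a linear mapping between Home Assistant's 0-255 brightness scale and a 0-100 percent level: the Lutron command sends 50.00, OmniLink reports 50, and the mirrored light.office_light then shows brightness=128. A sketch of that conversion, assuming simple linear scaling with Python's round() (not necessarily how the Lutron or OmniLink integrations actually round):

```python
def to_percent(brightness):
    """Home Assistant brightness (0-255) -> percent level (0.0-100.0)."""
    return brightness * 100 / 255


def to_brightness(percent):
    """Percent level (0-100) -> Home Assistant brightness (0-255).

    Python's round() rounds halves to even, so 127.5 becomes 128.
    """
    return round(percent * 255 / 100)


print(to_brightness(50))         # 128
print("%.2f" % to_percent(255))  # 100.00
print("%.2f" % to_percent(128))  # 50.20
```

Because the two scales don't divide evenly, a round trip is not exact (128 maps back to about 50.2), so two systems mirroring the same light can report slightly different levels for the same physical state.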

thecynic commented 5 years ago

@grantalewis Hmm, I wonder.... Once the badness happens, does it happen to all lutron entities all at once? Or do lights drop out over time? In other words, when this office_office_fan_light starts exhibiting issues, do the other lights still work as expected?

grantalewis commented 5 years ago

There are a lot of continuous omnilink status messages spamming homeassistant. is that expected?

OmniLinkBridge https://github.com/excaliburpartners/OmniLinkBridge

is a service that makes OmniLink entities available to Home Assistant that would not be otherwise (flags, buttons, sensors, thermostats for instance). I would assume that the amount of traffic is the same regardless of HA version, but I can't comment on that much, since it works without a hitch on 0.84.6 and I haven't had occasion to generate a debug log there. I could request that the developer add an "exclude" option, which might be useful in my case since all Omni entities are mirrored on the RA2 side: exclude all entities that would be duplicated by RA2, while leaving flags, buttons, etc. available.

One question i have is the interaction between omnilink and lutron modules. Do you have something that ties them together somehow?

@JonGilmore asked me pretty much the same question on Discord and I had to plead ignorance. I can try to get information from my installer if that would be helpful. This firmware release writeup might shed some light:

http://www.homeauto.com/Downloads/Products/AutomationControllers/OmniProII/haiprivate/20I04-16UPGV2-7.pdf

Best guess: I think it's a physical serial connection, but I'm not sure of the communications protocol.

Once the badness happens, does it happen to all lutron entities all at once? Or do lights drop out over time?

When one entity starts misbehaving, they all start misbehaving.

If the entity is off when the problem starts:

If the entity is on when the problem starts, the opposite holds true:

However, all entities can still be controlled via the States panel, no odd behavior.

Tangston311 commented 5 years ago

@thecynic, I don't use OmniLink, but I DO have Lutron also integrated with a Savant system. Savant hooks up with Lutron via IP just like Home Assistant, but I rarely interact with Lutron via Savant - it's almost exclusively through HA and/or Homekit (I should also mention that I have a Connect Bridge to surface the Lutron entities to Homekit).

I'll try and pull some logs as well - my only issue last time was it takes a day or more now for my system to exhibit this behavior, and by that time the debug log was so big it caused my browser and HA to crash. I'll give it another shot though.

JonGilmore commented 5 years ago

@thecynic something I'd mentioned to @grantalewis to check on (which relates to @Tangston311 as well) was to make sure only one service was using a given Lutron username/password. So, for example, Savant should be using one user, and HA should be using a different user. If I recall correctly, @grantalewis was able to confirm that this was not the issue (he was previously using a shared user/password). Lutron doesn't recommend sharing logins. On another point: @thecynic do you also use a Connect Bridge? I don't have one in my setup; maybe it's related somehow? Admittedly, I know nothing about them...

grantalewis commented 5 years ago

FYI to chime in: (1) I do have a connect bridge, and (2) after @JonGilmore's recommendation I created and am using a second user login/password for the lutron entry in my config.

Tangston311 commented 5 years ago

Ok I just added a second HA user for Lutron and applied that to my config as well.

JonGilmore commented 5 years ago

Ok I just added a second HA user for Lutron and applied that to my config as well.

I'm hopeful this will at least help. When I was testing last night with two different HA setups using the same Lutron username/password, I did see disconnects on the second instance that started (but that instance kept reconnecting because of the reconnect code, lol).
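The reconnect behaviour described here can be illustrated with a small retry loop. This is a minimal sketch, not pylutron's actual reconnect code: `connect_with_backoff` and the `connect_once` callable (a stand-in for "open the telnet session and log in") are hypothetical, and the backoff values are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect_once: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect_once() until it succeeds, doubling the wait after each failure.

    connect_once is a hypothetical stand-in for opening the telnet session and
    logging in; it is expected to raise OSError (TimeoutError,
    ConnectionResetError, ...) when the connection attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect_once()
        except OSError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise RuntimeError("unreachable")  # loop always returns or raises
```

Passing `sleep` in as a parameter keeps the retry policy testable without real delays; a real client would keep the default `time.sleep` (or an async equivalent inside Home Assistant's event loop).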

thecynic commented 5 years ago

@Tangston311 Does Savant integrate with HA? Just curious.

@JonGilmore glad to hear the reconnect code is working well for you too. I'll merge it soon.

Also, I do not have a connect bridge. But as far as HA is concerned, we only talk to the Main repeater, and never the connect bridge, right?

Tangston311 commented 5 years ago

@thecynic, unfortunately not (to my knowledge, anyway), but they did just come out with HomeKit compatibility for some of their products, so perhaps someday it'll be possible to integrate them once the HomeKit Controller integration is updated to handle remotes. That would be cool.

JonGilmore commented 5 years ago

Also, I do not have a connect bridge. But as far as HA is concerned, we only talk to the Main repeater, and never the connect bridge, right?

Right, that's correct. I'm just trying to figure out why only 2 people are facing this issue while others continue without issue. I imagine something has to be different/unique with their setups...

cdheiser commented 5 years ago

The only time I ever suffer connection issues is when I reprogram or otherwise power cycle the main repeater.

On Sun, Aug 18, 2019 at 11:02 AM Jon Gilmore notifications@github.com wrote:

Also, I do not have a connect bridge. But as far as HA is concerned, we only talk to the Main repeater, and never the connect bridge, right?

Right, that's correct. I'm just trying to figure out why only 2 people are facing this issue while others continue without issue. I imagine something has to be different/unique with their setups...


grantalewis commented 5 years ago

Noticed this tonight in the logs. Not sure if it's relevant.

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
  File "/usr/src/homeassistant/homeassistant/helpers/event.py", line 96, in state_change_listener
    event.data.get("new_state"),
  File "/usr/src/homeassistant/homeassistant/core.py", line 372, in async_run_job
    target(*args)
  File "/usr/src/homeassistant/homeassistant/helpers/event.py", line 171, in state_for_cancel_listener
    if not async_check_same_func(entity, from_state, to_state):
  File "/usr/src/homeassistant/homeassistant/components/automation/state.py", line 115, in <lambda>
    lambda _, _2, to_state: to_state.state == to_s.state,
AttributeError: 'NoneType' object has no attribute 'state'
2019-08-19 20:13:53 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <function async_track_state_change.<locals>.state_change_listener at 0x6e214468>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
  File "/usr/src/homeassistant/homeassistant/helpers/event.py", line 96, in state_change_listener
    event.data.get("new_state"),
  File "/usr/src/homeassistant/homeassistant/core.py", line 372, in async_run_job
    target(*args)
  File "/usr/src/homeassistant/homeassistant/helpers/event.py", line 171, in state_for_cancel_listener
    if not async_check_same_func(entity, from_state, to_state):
  File "/usr/src/homeassistant/homeassistant/components/automation/state.py", line 115, in <lambda>
    lambda _, _2, to_state: to_state.state == to_s.state,
AttributeError: 'NoneType' object has no attribute 'state'
2019-08-19 20:13:53 ERROR (MainThread) [homeassistant.core] Error doing job: Exception in callback <function async_track_state_change.<locals>.state_change_listener at 0x6e688738>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 68, in uvloop.loop.Handle._run
  File "/usr/src/homeassistant/homeassistant/helpers/event.py", line 96, in state_change_listener
    event.data.get("new_state"),
  File "/usr/src/homeassistant/homeassistant/core.py", line 372, in async_run_job
    target(*args)
  File "/usr/src/homeassistant/homeassistant/helpers/event.py", line 171, in state_for_cancel_listener
    if not async_check_same_func(entity, from_state, to_state):
  File "/usr/src/homeassistant/homeassistant/components/automation/state.py", line 115, in <lambda>
    lambda _, _2, to_state: to_state.state == to_s.state,
AttributeError: 'NoneType' object has no attribute 'state'
2019-08-19 20:14:49 WARNING (MainThread) [homeassistant.core] Unable to remove unknown listener <function async_track_point_in_utc_time.<locals>.point_in_time_listener at 0x6de17bb8>
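These tracebacks come from the state trigger's `for:` listener evaluating `to_state.state` while `to_state` is `None`. In Home Assistant, a `state_changed` event typically carries `new_state=None` when an entity is removed, so a comparison written as a plain lambda crashes in that case. The sketch below shows a None-safe version of that comparison; `State` is a minimal stand-in dataclass (not Home Assistant's class), `make_same_state_check` is a hypothetical name, and this is not the actual upstream fix.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class State:
    """Minimal stand-in for a Home Assistant state object (illustration only)."""
    entity_id: str
    state: str


def make_same_state_check(
    target: State,
) -> Callable[[str, Optional[State], Optional[State]], bool]:
    """Build a checker shaped like the lambda in automation/state.py, but tolerant of None."""

    def check(_entity: str, _from_state: Optional[State], to_state: Optional[State]) -> bool:
        # A removed entity yields to_state=None; treat that as
        # "no longer in the target state" instead of raising AttributeError.
        return to_state is not None and to_state.state == target.state

    return check
```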
Tangston311 commented 5 years ago

@grantalewis, just checking to see how creating the unique HA username is working for you. I've been running for 3 days now without the issue... just curious whether you're experiencing the same?

JonGilmore commented 5 years ago

@grantalewis , just checking to see how creating the unique HA username is working for you. I've been running for 3 days now without the issue....just curious if you're experiencing the same?

Wow, this is great news! Please keep us posted. I'd love it if this was the cause...

thecynic commented 5 years ago

@grantalewis , just checking to see how creating the unique HA username is working for you. I've been running for 3 days now without the issue....just curious if you're experiencing the same?

wow, this is great news! Please keep us posted. I'd love it if this was the cause...

Still not sure how that would cause the symptoms. It's as if we stop getting remote updates, but it's not clear whether the main repeater (MR) stops sending or we stop reading. It would be interesting to capture some packet traces with tcpdump to see whether we are receiving the remote packets. The debug logs should have shown us that we are receiving updates. Hmm.
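One way to narrow this down from the client side is to record when a command was last sent and when any data last arrived. The class below is a hypothetical helper sketched under that assumption, not part of pylutron or Home Assistant. It only reports that no reply was seen after a command; telling whether the repeater stayed silent or the reply was never read still needs a packet capture.

```python
import time
from typing import Callable, Optional


class ReceiveWatchdog:
    """Flag a connection as stale when a command gets no reply in time.

    Hypothetical helper: a reader loop would call note_received() for every
    chunk read from the socket, the send path would call note_sent(), and a
    periodic task would call is_stale().
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._last_rx: Optional[float] = None
        self._last_tx: Optional[float] = None

    def note_sent(self) -> None:
        self._last_tx = self._clock()

    def note_received(self) -> None:
        self._last_rx = self._clock()

    def is_stale(self) -> bool:
        """True if we sent something and heard nothing back within the timeout."""
        if self._last_tx is None:
            return False  # nothing sent yet, so nothing to wait for
        if self._last_rx is not None and self._last_rx >= self._last_tx:
            return False  # data arrived after our last command
        return self._clock() - self._last_tx > self._timeout
```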

Maybe try with: sudo tcpdump -A -ni eth0 'tcp and ip host IP_OF_MAIN_REPEATER' while trying to flip the toggle in HASS.

If you can provide those logs, it'll be pretty clear if we are even trying to send/receive data or not.
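When reading the `tcpdump -A` output, it can help to pick out just the Lutron integration messages. The protocol is line-based ASCII: pylutron sends an operation character plus a command (the traceback at the top of this issue shows `send(op + out_cmd)` with a level formatted as `"%.2f"`), using `#` to execute and `?` to query, while reports from the repeater start with `~`. The sketch below assumes that format; `extract_messages` and `LutronMessage` are hypothetical names, and the integration ID `23` in the usage is made up. Because `-A` prints non-printable bytes as `.`, the pattern only accepts fields made of word characters with an optional decimal part.

```python
import re
from typing import Iterable, List, NamedTuple

# An operation character, an upper-case command, then comma-separated fields,
# e.g. "#OUTPUT,23,1,0.00" (sent to the repeater) or "~OUTPUT,23,1,0.00" (report).
_MSG_RE = re.compile(r"([#?~])([A-Z]+)((?:,[\w:]*(?:\.\d+)?)+)")


class LutronMessage(NamedTuple):
    op: str            # "#" execute, "?" query, "~" report from the repeater
    command: str       # e.g. "OUTPUT", "DEVICE"
    fields: List[str]  # remaining comma-separated values

    @property
    def from_repeater(self) -> bool:
        return self.op == "~"


def extract_messages(lines: Iterable[str]) -> List[LutronMessage]:
    """Pull Lutron integration messages out of text such as `tcpdump -A` output."""
    messages: List[LutronMessage] = []
    for line in lines:
        for op, command, rest in _MSG_RE.findall(line):
            # rest starts with "," because every field group begins with one
            messages.append(LutronMessage(op, command, rest[1:].split(",")))
    return messages
```

Assuming monitoring is enabled on the session, a capture taken while toggling a light would be expected to show a `#OUTPUT` message followed by a `~OUTPUT` report; a `#` with no `~` after it would point at the repeater or the network rather than at HA's reading side.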

Tangston311 commented 5 years ago

@thecynic, @JonGilmore, well, yesterday the issue appeared again, although this time slightly differently: my automations with Lutron service calls didn't work either (until I restarted HA). So it worked for about 4 days, then I noticed the issue reappear from my phone, and when I got home I noticed none of my Lutron automations had worked either. I pulled the logs (attached), but the recorder was set to "error". On page 237 you can see Lutron entries start appearing a lot. It's possible this was a separate issue because of the different symptoms, but I thought I'd post it anyway.

I'm happy to try that command - do I just run it in terminal via ssh and then flip the toggle in the Hassio interface?

Aug22904pmlogs.docx

JonGilmore commented 5 years ago

Just from giving this a once-over, it looks like you're having network issues. Lutron isn't mentioned until the very end, but it looks like alarm.com, darksky, and Lutron are all having connectivity issues... total guess; it's hard to tell what's going on here.

grantalewis commented 5 years ago

@grantalewis , just checking to see how creating the unique HA username is working for you. I've been running for 3 days now without the issue....just curious if you're experiencing the same?

Sorry for the silence -- really busy work week. I've been using unique HA credentials for a couple of weeks now. No improvement, unfortunately.

I'm going to try to capture more logs this morning as @thecynic suggests above. Will report back.