iMicknl / ha-tahoma

Custom component for Home Assistant to interact with smart devices via Somfy TaHoma or other OverKiz-based APIs.
MIT License

Handle TooManyExecutions when more than 10 executions are running #261

Open iMicknl opened 4 years ago

iMicknl commented 4 years ago

Describe the bug
Originally reported on the Home Assistant Community

Not all executions will succeed when more than 10 executions are running at the same time.

To Reproduce
Steps to reproduce the behavior:

  1. Create a scene or automation
  2. Add more than 10 long running actions in a sequence (like movement of covers)

Expected behavior
All executions run in sequence, with a pause between them.

Environment (please complete the following information):

Additional context

Log details (ERROR)

```
Logger: homeassistant.components.automation.rolladen_am_wochenende_feiertag_morgens_offen
Source: custom_components/tahoma/tahoma_device.py:155
Integration: Automation (documentation, issues)
First occurred: September 12, 2020, 8:00:00 (8 occurrences)
Last logged: September 13, 2020, 8:37:41

Rolladen am Wochenende/Feiertag Morgens öffen: Error executing script. Unexpected error for scene at pos 1: Server busy, please try again later. (Too many executions)
While executing automation automation.rolladen_am_wochenende_feiertag_morgens_offen
Rolladen am Wochenende/Feiertag Morgens öffen: Error executing script. Unexpected error for scene at pos 1: Execution queue is full on gateway: #xxxx-xxxx-xxxx (soft limit: 10)
```
```
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 191, in _async_step
    await getattr(
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 400, in _async_scene_step
    await self._hass.services.async_call(
  File "/usr/src/homeassistant/homeassistant/core.py", line 1308, in async_call
    task.result()
  File "/usr/src/homeassistant/homeassistant/core.py", line 1343, in _execute_service
    await handler.func(service_call)
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 208, in handle_service
    await self.hass.helpers.service.entity_service_call(
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 454, in entity_service_call
    future.result()  # pop exception if have
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 583, in async_request_call
    await coro
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 485, in _handle_entity_call
    await result
  File "/usr/src/homeassistant/homeassistant/components/homeassistant/scene.py", line 311, in async_activate
    await async_reproduce_state(
  File "/usr/src/homeassistant/homeassistant/helpers/state.py", line 106, in async_reproduce_state
    await asyncio.gather(
  File "/usr/src/homeassistant/homeassistant/helpers/state.py", line 100, in worker
    await platform.async_reproduce_states(  # type: ignore
  File "/usr/src/homeassistant/homeassistant/components/cover/reproduce_state.py", line 125, in async_reproduce_states
    await asyncio.gather(
  File "/usr/src/homeassistant/homeassistant/components/cover/reproduce_state.py", line 86, in _async_reproduce_state
    await hass.services.async_call(
  File "/usr/src/homeassistant/homeassistant/core.py", line 1308, in async_call
    task.result()
  File "/usr/src/homeassistant/homeassistant/core.py", line 1343, in _execute_service
    await handler.func(service_call)
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 208, in handle_service
    await self.hass.helpers.service.entity_service_call(
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 454, in entity_service_call
    future.result()  # pop exception if have
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 583, in async_request_call
    await coro
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 485, in _handle_entity_call
    await result
  File "/config/custom_components/tahoma/cover.py", line 173, in async_set_cover_position
    await self.async_execute_command(command, position)
  File "/config/custom_components/tahoma/tahoma_device.py", line 155, in async_execute_command
    exec_id = await self.coordinator.client.execute_command(
  File "/usr/local/lib/python3.8/site-packages/backoff/_async.py", line 133, in retry
    ret = await target(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pyhoma/client.py", line 185, in execute_command
    return await self.execute_commands(device_url, [command], label)
  File "/usr/local/lib/python3.8/site-packages/backoff/_async.py", line 133, in retry
    ret = await target(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pyhoma/client.py", line 208, in execute_commands
    response = await self.__post("exec/apply", payload)
  File "/usr/local/lib/python3.8/site-packages/pyhoma/client.py", line 240, in __post
    await self.check_response(response)
  File "/usr/local/lib/python3.8/site-packages/pyhoma/client.py", line 278, in check_response
    raise Exception(message if message else result)
Exception: Server busy, please try again later. (Too many executions)
```
iMicknl commented 4 years ago

@tetienne, @vlebourl what do you think:

  1. We could tackle this in https://github.com/iMicknl/python-tahoma-api with backoff, where we add a debounce time of x seconds with at most a few retries (see the sketch below).
  2. We could add retry logic in this integration based on the number of executions in the list: if the current number is higher than 10, we delay the call until the number of running executions drops below 10.
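For reference, a rough sketch of what option 1 could look like (class and method names are illustrative; the actual change lives in python-tahoma-api, not here):

```python
# Rough sketch of option 1 (hypothetical names): retry the API call with
# exponential backoff when the gateway rejects it because too many
# executions are already running.
import backoff


class TooManyExecutionsException(Exception):
    """Raised when the gateway execution queue is full (soft limit: 10)."""


class TahomaClient:
    @backoff.on_exception(
        backoff.expo,                # wait 1s, 2s, 4s, ... between tries
        TooManyExecutionsException,  # only retry this specific failure
        max_tries=5,                 # cap the number of retries
        max_time=60,                 # and the total time spent retrying
    )
    async def execute_command(self, device_url, command, label=None):
        ...  # POST exec/apply; raises TooManyExecutionsException when full
```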

Any other suggestions?

iMicknl commented 4 years ago

I am currently pursuing option 1 via https://github.com/iMicknl/python-tahoma-api/pull/48. This is the easiest to implement; however, it has the following limitations:

kaczkalolo commented 3 years ago

I have 14 covers. What can I do to close/open them all at the same time? I'm getting exactly the same error :(

iMicknl commented 3 years ago

@kaczkalolo have you been encountering this problem for a while, or is this a new issue? There was another report yesterday (https://github.com/iMicknl/ha-tahoma/issues/378).

For now, I would advise adding a delay in your automations in Home Assistant... This will be way more reliable, since this is an issue on the Somfy TaHoma side.

kaczkalolo commented 3 years ago

I had this for a while but never really had time to debug it; I finally found it. I don't use an automation for that, only a group.

mstuij commented 3 years ago

Hi,

I have the same problem, but in my case it doesn't even take 10 long-running operations.

At 06:00 the rollers at the front side of the house are opened. This always works without errors, and always exactly at 06:00.

At 07:00 the rollers at the side of the house are scheduled to open. This fails almost every time. I get this error:

```
2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.6s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.3s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.3s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 1.6s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 3.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.4s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:01 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.2s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:01 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.4s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:02 INFO (MainThread) [backoff] Backing off execute_command(...) for 11.3s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:02 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.0s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:02 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.2s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:02 INFO (MainThread) [backoff] Backing off execute_command(...) for 14.4s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:04 INFO (MainThread) [backoff] Backing off execute_command(...) for 5.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:10 INFO (MainThread) [backoff] Backing off execute_command(...) for 3.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:13 INFO (MainThread) [backoff] Backing off execute_command(...) for 6.3s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:14 INFO (MainThread) [backoff] Backing off execute_command(...) for 8.8s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:17 INFO (MainThread) [backoff] Backing off execute_command(...) for 27.1s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:19 INFO (MainThread) [backoff] Backing off execute_command(...) for 13.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:23 INFO (MainThread) [backoff] Backing off execute_command(...) for 4.1s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:27 INFO (MainThread) [backoff] Backing off execute_command(...) for 80.1s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:30 DEBUG (MainThread) [custom_components.tahoma] Finished fetching device events data in 0.037 seconds
2021-03-17 07:00:33 INFO (MainThread) [backoff] Backing off execute_command(...) for 116.1s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:44 INFO (MainThread) [backoff] Backing off execute_command(...) for 3.2s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:00:47 INFO (MainThread) [backoff] Backing off execute_command(...) for 76.8s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions))
2021-03-17 07:01:00 DEBUG (MainThread) [custom_components.tahoma] Finished fetching device events data in 0.111 seconds
2021-03-17 07:01:30 DEBUG (MainThread) [custom_components.tahoma.coordinator] EventName.DEVICE_STATE_CHANGED/None (device: io://0203-9950-4411/11186955, state: None -> None)
2021-03-17 07:01:30 DEBUG (MainThread) [custom_components.tahoma] Finished fetching device events data in 0.132 seconds
```

Both sets of rollers are opened by Node-RED via a `cover:` group: the 06:00 set and the 07:00 set (which has the problems).

The rollers are opened a minute or two after the scheduled time, and they are NOT opened at the same time: you see roller 1 opening, after that number 2, and then number 3.

rvrignaud commented 3 years ago

Hi,

I started to see this behavior with 13 shutters a few days back. I now also have this issue:

```
Jul 07 10:50:49 pi hass[944]: 2021-07-07 10:50:49 INFO (MainThread) [backoff] Backing off execute_commands(...) for 0.2s (pyhoma.exceptions.NotAuthenticatedException: Not authenticated)
Jul 07 10:50:49 pi hass[944]: 2021-07-07 10:50:49 ERROR (MainThread) [custom_components.tahoma.tahoma_entity] Too many requests, try again later : login with REDACTED@gmail.com
```

I can't even log in to https://www.tahomalink.com/ as I get the same error:

(Too many requests, try again later : login with REDACTED@gmail.com)

in a pop-up message box.

I'm using Google Assistant to open and close all the shutters at once, so I'm not sure how to introduce a delay on my side. Any known workaround?

vlebourl commented 3 years ago

Hi, I don't think this error is linked to the first one. This one happens when you try to open too many connections to the API in a short period of time. If I'm not mistaken, HA keeps its connection open to reuse it for successive actions.

Waiting 10 minutes before retrying to connect is the only fix I know for that one.
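As an illustration of that connection reuse (API shapes taken from the python-tahoma-api README; they may have changed since), the idea is to log in once and keep the same session for all subsequent calls instead of re-authenticating per action:

```python
# Illustrative only: one login, one session, reused for every call.
# Re-authenticating for each command is what can trigger the
# "Too many requests, try again later : login" error.
import asyncio

from pyhoma.client import TahomaClient


async def main():
    async with TahomaClient("user@example.com", "password") as client:
        await client.login()                  # single login
        devices = await client.get_devices()  # same session reused
        print(f"{len(devices)} devices")


asyncio.run(main())
```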

rvrignaud commented 3 years ago

Hi @vlebourl. Thank you for the clarification, and sorry for mixing up issues. I also get tons of Execution queue is full on gateway: #xxxx-xxxx-xxxx (soft limit: 10), and my shutters now never all close and open at the same time, even though this worked for quite some time before. Any idea how to improve this when not using an automation?

laszlojakab commented 3 years ago

Is there a way to get an event when this kind of error happens (like in #573)? The exception is caught here:

https://github.com/iMicknl/ha-tahoma/blob/47c92c6dd6d49233593f9f8b9e9dfb21facf5bca/custom_components/tahoma/executor.py#L65-L75

There is only a log message, so the error can't be handled from outside (e.g. from an automation). A more general question: can we get an event (or events) when a command fails (e.g. open cover, close cover, etc.)?

tetienne commented 3 years ago

@laszlojakab Can you try the master branch? I've added an event in case of failure. More information here.
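For illustration, forwarding such a failure to the Home Assistant event bus could look roughly like this (the event name and payload keys are assumptions, not the exact implementation):

```python
# Rough sketch (assumed names): forward an execution failure to the
# Home Assistant event bus so automations can subscribe to it.
def fire_failure_event(hass, event):
    hass.bus.async_fire(
        "overkiz.event",  # event type an automation trigger can listen for
        {
            "exec_id": event.exec_id,
            "failure_type": event.failure_type,
            "failed_commands": event.failed_commands,
        },
    )
```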

laszlojakab commented 3 years ago

@tetienne Sorry, it's not exactly the same error as in the subject of this issue. I get this error:

```
2021-09-30 23:03:10 ERROR (MainThread) [custom_components.tahoma.executor] Execution queue is full on gateway: #XXXX-XXXX-XXXX (soft limit: 10)
```

There is no event for this kind of error in the master branch. As I mentioned, this error is logged by line 74 in executor.py.

tetienne commented 3 years ago

@laszlojakab I was able to reproduce your error by spamming a script that opens and closes a cover many times.

It seems an event is indeed triggered when a command is not sent correctly:

```
2021-10-04 15:34:13 DEBUG (MainThread) [custom_components.tahoma.coordinator] Event(timestamp=1633354453054, name=<EventName.EXECUTION_STATE_CHANGED: 'ExecutionStateChangedEvent'>, setupoid=741ca89f-a47b-4ad****-894d-d225c06922b1, owner_key=741ca89f-a47b-4ad****-894d-d225c06922b1, type=1, sub_type=1, time_to_next_state=0, failed_commands=None, failure_type_code=None, failure_type=None, condition_groupoid=None, placeoid=None, label=None, metadata=None, camera_id=None, deleted_raw_devices_count=None, protocol_type=None, gateway_id=None, exec_id='4b83103c-ac10-3e01-18e7-2cb65d75c706', deviceurl=None, device_states=[], old_state=<ExecutionState.INITIALIZED: 'INITIALIZED'>, new_state=<ExecutionState.NOT_TRANSMITTED: 'NOT_TRANSMITTED'>)
```

Or

```
2021-10-04 15:34:14 DEBUG (MainThread) [custom_components.tahoma.coordinator] Event(timestamp=1633354453894, name=<EventName.EXECUTION_STATE_CHANGED: 'ExecutionStateChangedEvent'>, setupoid=741ca89f-a47b-4ad****-894d-d225c06922b1, owner_key=741ca89f-a47b-4ad****-894d-d225c06922b1, type=1, sub_type=1, time_to_next_state=-1, failed_commands=[{'deviceurl': 'io://****-*****-*****/5978000', 'rank': 0, 'failure_type': 'CMDCANCELLED'}], failure_type_code=<FailureType.CMDCANCELLED: 106>, failure_type='CMDCANCELLED', condition_groupoid=None, placeoid=None, label=None, metadata=None, camera_id=None, deleted_raw_devices_count=None, protocol_type=None, gateway_id=None, exec_id='4b830acf-ac10-3e01-18e7-2cb6cde8f991', deviceurl=None, device_states=[], old_state=<ExecutionState.IN_PROGRESS: 'IN_PROGRESS'>, new_state=<ExecutionState.FAILED: 'FAILED'>)
```

You can see these logs if you enable debug logging.

iMicknl commented 3 years ago

@tetienne perhaps his case broadcasts NOT_TRANSMITTED, and we should only create an HA event when the ExecutionState is FAILED?
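Illustratively, such a filter could be as simple as this (the module path is assumed from the event dumps above; this is not the integration's actual code):

```python
# Illustrative only: forward an HA event solely when the execution
# actually ended in FAILED, ignoring the NOT_TRANSMITTED transition.
from pyhoma.enums import EventName, ExecutionState


def should_fire(event) -> bool:
    return (
        event.name == EventName.EXECUTION_STATE_CHANGED
        and event.new_state == ExecutionState.FAILED
    )
```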

tetienne commented 3 years ago

I’m not sure if the events are linked. But we can indeed forward more events if needed.

rvrignaud commented 3 years ago

Besides the event generated, is there any intention to implement a proper fix for this issue? It is really causing a lot of problems for me.

tetienne commented 3 years ago

@rvrignaud That’s not something we can fix easily, if it is feasible at all. It means we would have to handle a kind of queue, or something similar. In your case, why not use a scenario created on the tahomalink.com side?
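For illustration only, such a queue could look roughly like this (ThrottledExecutor is hypothetical, not integration code):

```python
# Illustration: cap the number of in-flight executions at the gateway's
# soft limit, so extra commands wait for a free slot instead of failing.
import asyncio

SOFT_LIMIT = 10  # TaHoma gateway execution queue soft limit


class ThrottledExecutor:
    def __init__(self, client):
        self._client = client
        self._slots = asyncio.Semaphore(SOFT_LIMIT)

    async def execute_command(self, device_url, command):
        async with self._slots:
            # A real version would hold the slot until the gateway reports
            # the execution finished, not just until the request returns;
            # that is exactly the bookkeeping that makes this hard.
            return await self._client.execute_command(device_url, command)
```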

rvrignaud commented 3 years ago

Because I'm using Google Assistant voice queries that trigger native device commands (for 13 devices for now, soon many more), not a specific routine, like "OKG, close all the shutters." I would prefer, as much as possible, not to diverge from native GHA interactions.

iMicknl commented 3 years ago

@tetienne I wonder why the backoff is not working. Perhaps we need to retry for a longer time.

I am not a huge fan of building a queue in the Data Update Coordinator...

tetienne commented 3 years ago

@iMicknl Probably yes, we can add some delay to check if it helps.

rvrignaud commented 3 years ago

If you need any help testing the new behavior, I can consistently reproduce this one. Happy to help!

laszlojakab commented 2 years ago

Any updates on this issue? Can we get an event (like the errors in overkiz.event) for this failure, so we can build a workaround with automations?

downdrown commented 2 years ago

I'm having the same issue when trying to control 15 blinds simultaneously using the Overkiz integration. Does anybody know the required delay between calls for Overkiz to accept the requests (if I split them into 10 / 5 calls, for example)?
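One client-side pattern to experiment with (purely illustrative; the batch size and pause are guesses, not documented values, and `client.execute_command` stands in for whatever call sends a single command) is to send the commands in groups below the soft limit:

```python
# Illustrative workaround: command the covers in groups below the soft
# limit and wait between groups so the gateway queue can drain.
import asyncio


async def open_in_batches(client, device_urls, batch_size=8, pause_s=30):
    for i in range(0, len(device_urls), batch_size):
        batch = device_urls[i : i + batch_size]
        await asyncio.gather(
            *(client.execute_command(url, "open") for url in batch)
        )
        await asyncio.sleep(pause_s)  # let the running executions finish
```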

cesc86 commented 2 years ago

Same story here, trying to control 14 Somfy blinds at once with Overkiz. All I could do was split them into 7 and 7...

laszlojakab commented 1 year ago

Hi again. The problem still exists. I know the repo is deprecated, but I don't want to open a new issue in core for the same error. As far as I can see, overkiz.event was not migrated to core, so I can't build any automation workaround for the soft-limit error.

There is a continue_on_error flag for scripts. It would be nice if we could "catch" the soft-limit error with it: after moving the cover, I could check whether it is in the expected position or still moving; if neither is true, there was an error, and I should set the cover position again. However, continue_on_error only works with the exceptions.HomeAssistantError type. See here: https://github.com/home-assistant/core/blob/ba8a530d19dc3e93602275961de46d700611f4a1/homeassistant/helpers/script.py#LL493C24-L493C24

Could you please modify the core version of the integration to raise HomeAssistantError in case of a soft-limit error?
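A minimal sketch of that change (structure and attribute names assumed from the stack trace earlier in this thread; not the actual core code):

```python
# Sketch: translate the library's soft-limit error into HomeAssistantError
# so continue_on_error can catch it in scripts.
from homeassistant.exceptions import HomeAssistantError
from pyhoma.exceptions import TooManyExecutionsException


async def async_execute_command(self, command, *args):
    try:
        return await self.coordinator.client.execute_command(
            self.device.deviceurl, command, *args
        )
    except TooManyExecutionsException as exception:
        # continue_on_error only catches HomeAssistantError subclasses
        raise HomeAssistantError(str(exception)) from exception
```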

iMicknl commented 1 year ago

@laszlojakab it would be good to raise this on the core repo! Happy to see if we can implement it.

laszlojakab commented 1 year ago

Added to core.