home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
72.42k stars 30.31k forks source link

Stopping a script using the 'stop' action with 'error' set to true doesn't stop the execution of the parent script #104130

Open LucasCZE opened 10 months ago

LucasCZE commented 10 months ago

The problem

For the demonstration, I have a script called 'Test - Stop'. This script contains the 'stop' action with the 'error' option set to true. This script always stops and indicates we are stopping because of an unexpected error.

Then I have another script called 'Test - Sequence'. This script calls the first script and because the first script always encounters an error, the next action should not be executed. The next action is calling of a third script called 'Test - Notification' just to display a notification.

I think running the 'Test - Sequence' script should only show one notification. In reality, however, 2 notifications are always displayed.

The first notification is 'The Test Stop script executed. This notification should show up.' This is perfectly fine.

The second notification is 'The Test Notification script executed. This script should not be called at all.' This notification should not be displayed.

At least that's how I understand the documentation: https://www.home-assistant.io/docs/scripts/#stopping-a-script-sequence

What version of Home Assistant Core has the issue?

2023.11.2

What was the last working version of Home Assistant Core?

It doesn't work for a long time. It's possible that it never worked.

What type of installation are you running?

Home Assistant Container

Integration causing the issue

Script

Link to integration documentation on our website

https://www.home-assistant.io/integrations/script/

Diagnostics information

No response

Example YAML snippet

test_sequence:
  alias: Test - Sequence
  sequence:
    - service: script.test_stop
      data: {}
    - service: script.test_notification
      data: {}
  mode: single

test_stop:
  alias: Test - Stop
  sequence:
    - service: notify.persistent_notification
      data:
        message: The Test Stop script executed. This notification should show up.
    - stop: Error!
      error: true
    - service: notify.persistent_notification
      data:
        message: The Test Stop script after stop with error set to true. This should not be executed.
  mode: single

test_notification:
  alias: Test - Notification
  sequence:
    - service: notify.persistent_notification
      data:
        message: The Test Notification script executed. This script should not be called at all.
  mode: single

Anything in the logs that might be useful for us?

No response

Additional information

No response

lucaspeixotot commented 10 months ago

Hi @LucasCZE, I just reproduced the issue and I'll try to resolve it.

As far as I understand, whenever you use the field error: true the script should return an error, and if it is being used by another script, the outside script should stop as well.

Regarding the error option the documentation says:

There is also an error option, to indicate we are stopping because of an unexpected error. It stops the sequence as well, but marks the automation or script as failed to run

Regarding continuing (or not) when we face an error we have:

By default, a sequence of actions will be halted when one of the actions in that sequence encounters an error. The automation or script will be halted, an error will be logged, and the automation or script run is marked as errored.

My only question here is: when the documentation says but marks the automation or script as failed to run, is this "as failed" the same thing as "an error" in that sequence encounters an error?

Let me investigate if we have a kind of documentation issue that it's not clear and it's causing confusion or if there is something to fix in the code.

lucaspeixotot commented 10 months ago

@LucasCZE Looking at the code it seems to be an issue for me. Checking with other devs just to understand a few things.

lucaspeixotot commented 10 months ago

Below we can see the execution of the stop step and what happens whenever the error option is true. In that case, we are raising an exception named _AbortScript. This exception is not exclusively for the stop step, other steps can be used whenever an abort makes sense.

  async def _async_stop_step(self):
      """Stop script execution."""
      stop = self._action[CONF_STOP]
      error = self._action.get(CONF_ERROR, False)
      trace_set_result(stop=stop, error=error)
      if error:
          self._log("Error script sequence: %s", stop)
          raise _AbortScript(stop)

Below we can find what is happening whenever we have a _AbortScript exception by any script's step. Note that the exception is only thrown up (what would cause the behavior excepted by the used) when the script being executed is not a top-level one. I'm not sure what this "top-level" means.

try:
    self._log("Running %s", self._script.running_description)
    for self._step, self._action in enumerate(self._script.sequence):
        if self._stop.is_set():
            script_execution_set("cancelled")
            break
        await self._async_step(log_exceptions=False)
    else:
        script_execution_set("finished")
except _AbortScript:
    script_execution_set("aborted")
    # Let the _AbortScript bubble up if this is a sub-script
    if not self._script.top_level:
        raise
  1. As we have a script that is calling a second one (where the stop step with error option enabled) that is failing with an error, it should be a top-level one. Maybe the issue is because the internal script is not being evaluated as a non-top-level, then the exception is not being thrown up.
  2. If I didn't understand the top-level meaning and this variable is with the expected value, then the logic to bubble up the exception in the case where the _AbortScript is coming from the stop step is wrong and we need to fix it. There are two immediate ideas:
    • Keep using the _AbortScript but create a way to;
    • Create a specific exception for the stop step case;
    • Use the default Python exception once we have already thrown up its exceptions (this could cause some generic information when the source of the issue is not generic and unknown)

I'm discussing with other devs to understand more the issue and choose the best approach to fix it.

LucasCZE commented 10 months ago

@lucaspeixotot First of all, thank you for your attention to this issue.

My only question here is: when the documentation says but marks the automation or script as failed to run, is this "as failed" the same thing as "an error" in that sequence encounters an error?

In my opinion it must mean the same thing. Nothing else makes sense to me.

Let me investigate if we have a kind of documentation issue that it's not clear and it's causing confusion or if there is something to fix in the code.

As you yourself wrote in the next comment, I think the documentation is fine, but the actual behavior is wrong.

I think you are right about the top_level property. Logically, a nested script should not be marked as top level. I think this is the whole problem.

Thanks again for investigating this issue.

lucaspeixotot commented 10 months ago

The top_level concept is not what we were thinking. It's a concept used by the repeat action. All the actions inside of the repeat action are understood as non-top-level. So this has nothing with the issue described on this issue and cannot be leveraged.

lucaspeixotot commented 10 months ago

However @LucasCZE , I found a way to fix the issue and I created a thread for people to review it before creating the Pull Request and going further with some tests.

LucasCZE commented 10 months ago

@lucaspeixotot Sounds great. Thank you!

home-assistant[bot] commented 10 months ago

Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration (script) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `script` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign script` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


script documentation script source (message by IssueLinks)

HNKNTA commented 8 months ago

I guess I've just faced the similar issue, I've created a script to use in automations (like a function to call), to avoid code duplication. In some conditions, the script does stop with error set to true. However, the automation still keeps running after this, ignoring the abort stoppage.

IetIesAai commented 7 months ago

I seem to have also run into this problem, in this case with three "layers": automation-->script1-->script2. when script 2 errors out, I expected it to at least stop script1, but this does not seem to happen. Not sure if it should also stop the automation triggering script 1, but it doesn't even dot he first "stop". @lucaspeixotot - you mentioned discussing a fix somewhere in a thread (why not here?) - but I can't seem t o find it anywhere?

srescio commented 5 months ago

plus one, was reading the difference between the two different way to run scripts but never managed to get it to work the way its documented https://www.home-assistant.io/integrations/script/#waiting-for-script-to-complete

LucasCZE commented 4 months ago

I just tested it in version 2024.5.5, the issue still exists.

LucasCZE commented 3 months ago

Version 2024.6.3, the bug is still reproducible.

LucasCZE commented 2 months ago

Version 2024.7.4, the bug is still present.

traysh commented 3 weeks ago

2024.9.2, still here. We're approaching the bug's birthday! 🎉

traysh commented 3 weeks ago

I suppose this is the fix @lucaspeixotot mentioned: https://github.com/home-assistant/core/compare/dev...lucaspeixotot:home-assistant-core:104130-stop-step-issue