Closed derjoerg closed 4 months ago
Hey there @bbernhard, mind taking a look at this issue as it has been labeled with an integration (signal_messenger) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
signal_messenger documentation signal_messenger source (message by IssueLinks)
I saw similar issues reported here and also in the forum, but they were all related to zha or zwave.
Additionally someone mentioned a change to a timeout parameter for service-calls, but I don't know if this is related.
I'll just add as much info as I can gather; hopefully it helps. I also have some alerts set up which send out notifications over signal_messenger, and they are working as expected. Adding "continue_on_error" to the service call doesn't help.
A possible workaround would be to create a script which calls the signal_messenger notify service, and to start this script from an automation with script.turn_on. Then the automations do not hang.
But this is only a workaround and does not address the source of the issue.
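A minimal sketch of that workaround, with a notify target and entity names invented purely for illustration (your actual service and entity names will differ):

```yaml
# configuration.yaml sketch - service/entity names are placeholders.
script:
  send_signal_alert:
    sequence:
      - service: notify.signal
        data:
          message: "Alert from Home Assistant"

automation:
  - alias: "Fire-and-forget Signal notification"
    trigger:
      - platform: state
        entity_id: binary_sensor.front_door
        to: "on"
    action:
      # script.turn_on returns immediately, so the automation itself
      # does not block on the (possibly stuck) notify call.
      - service: script.turn_on
        target:
          entity_id: script.send_signal_alert
```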
I tested the above-mentioned workaround over the weekend, but it is not working either. It is true that the automations do not hang when such a script is called, but the script itself never comes to an end. It looks like it fills up an internal queue, and after a certain number of "running" scripts HA becomes unresponsive.
Additionally I tested the same now with telegram and this is working as expected.
I have the exact same thing. Are you also using mode: json-rpc in the signal addon? Not sure if this would make a difference, but I am going to try mode: native.
Although this sounds more like something related to HA core while reading the other issues. I tried to debug a bit but couldn't really put a finger on the issue...
I have the exact same thing. Are you also using mode: json-rpc in the signal addon? Not sure if this would make a difference but I am going to try mode: native
Hi @renini, thanks for this tip. I changed the mode now from "json-rpc" to "native" and so far it looks promising. I will keep an eye on it over the next days.
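For reference, this is the kind of add-on option being switched here; the exact key and schema are an assumption based on the comments in this thread, so treat it as a sketch:

```yaml
# signal-cli-rest-api add-on configuration sketch (key name assumed).
# "json-rpc" keeps a long-running Java daemon; "native" uses the
# natively compiled binary, as described later in this thread.
mode: native
```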
I have the exact same problem. Automations stuck in 'still running' state after sending a signal message. I just changed to native mode and will test over the next little while.
what is the status on this issue? - just started noticing that my automations containing signal notifications get stuck.
did native mode "fix" this for you? how can we get a real fix going?
Hi @GatoPharaoh, after changing the add-on config to "native" mode the automations are running normally now, and I haven't had a single stuck run.
Regarding a possible fix ... as you can see in the comments above, not a single reaction from the code owner :(
If the automations work after switching to the native mode, then the json-rpc mode is/was to blame here.
Do you use the Home Assistant addon or the signal-cli-rest-api container?
Most of the time it is the signal-cli-rest-api addon/docker container which is to blame here. (The Signal service changes quite often, and sometimes they introduce changes which break the addon/docker container.)
There were some problems with the json-rpc mode in the past, but with the latest rework it should be pretty stable.
Hi, Thanks for the feedback. I'm using the HA-addon in version 0.80.0.
And it still hangs in json-rpc mode but not in native mode? If so, please try to use curl to send a message. Does it also hang there?
just noticed I was still on 0.7 - just updated and will test if this fixes the problem
as for the issue itself - I tried using curl; there were no issues with sending messages
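For reference, a curl test of this kind might look like the sketch below. The phone numbers are placeholders, and the /v2/send endpoint shape is an assumption based on the signal-cli-rest-api project's documented API:

```shell
# Build the JSON payload for signal-cli-rest-api's send endpoint.
# Both numbers are placeholders - replace them with your registered
# account number and the intended recipient.
PAYLOAD='{"message": "test from curl", "number": "+431234567", "recipients": ["+436781234567"]}'
echo "$PAYLOAD"

# Send it against a locally running container (commented out here,
# since it requires the service to be up on port 8080):
# curl -X POST -H "Content-Type: application/json" \
#   -d "$PAYLOAD" 'http://127.0.0.1:8080/v2/send'
```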
in fact, home assistant itself was sending the message inside an automation without a problem - but just ONCE. after home assistant successfully sent the automated message, the automation got stuck and said "still running" - implying it was waiting for some kind of response from signal_messenger that it never got. having those automations in single mode means that automation won't run again until "still running" gets "unstuck"
reloading the automations or restarting home assistant got the automations unstuck. i also just noticed that they got "unstuck" due to me updating the signal-cli-rest-api container, which home assistant saw as an error (see picture below). the message in the picture got sent, and the automation was stuck from jan 22 until now (due to me updating the container)
hope this helps.
also, see petros comment on the forums - it looks like they know where the problem is - but the integrations have to implement a fix
https://community.home-assistant.io/t/2023-8-update-breaks-automations/598675/221
Do you see the same behavior as @derjoerg that changing the mode to native fixes the issue? Because actually there shouldn't be any differences between the json-rpc mode and the native mode - at least regarding the outward facing interface. They should behave exactly the same in that regard.
also, see petros comment on the forums - it looks like they know where the problem is - but the integrations have to implement a fix
I am not sure if I completely understand what they were discussing there. But to me it seems that the "automation gets stuck" issue should only happen when the service takes more than 10 seconds to respond - which shouldn't happen with either the json-rpc or the native mode.
The only thing I can think of where the request might take longer than 10 seconds is if you restart the signal-cli-rest-api docker container/addon and immediately afterwards try to send a message. If you run the signal-cli-rest-api service in json-rpc mode, the Java application is used instead of the natively compiled binary (which is used in the native mode). The Java application takes quite a while until it starts up and could easily exceed the 10 seconds max timeout (especially on systems like the Raspberry Pi).
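One way to check that timing hypothesis is curl's built-in request timing. This is a sketch: the host/port are the container defaults assumed throughout this thread, and /v1/about is used here only as a lightweight endpoint to measure against.

```shell
# curl can report the total request time via -w; run this right after
# restarting the container to see how long the first request takes in
# json-rpc mode (values above ~10s would explain the stuck automations).
CMD='curl -o /dev/null -s -w "%{time_total}s\n" http://127.0.0.1:8080/v1/about'
echo "$CMD"

# Execute it (commented out - requires the container to be running):
# eval "$CMD"
```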
In order to try that, you could do the following: restart the signal-cli-rest-api docker container in json-rpc mode and immediately afterwards try to send a message.
Do you see the same behavior as @derjoerg that changing the mode to native fixes the issue? Because actually there shouldn't be any differences between the json-rpc mode and the native mode - at least regarding the outward facing interface. They should behave exactly the same in that regard.
i have not tried native mode yet - thing is, I can't even reproduce it consistently. it just happens at some point. i have updated the container to 0.80 and will test over the coming days if the issue returns.
I am not sure if I completely understand what they were discussing there. But to me it seems that the "automation gets stuck" issue should only happen when the service takes more than 10 seconds to respond - which shouldn't happen with either the json-rpc or the native mode.
yeah, it's a little weird how the whole discussion around the "still running" issue is handled - after some searching i found the changes to the zwave integration they mentioned that was affected as well - don't know if this gives you some insight: https://github.com/home-assistant-libs/zwave-js-server-python/pull/758/commits
The only thing I can think of where the request might take longer than 10 seconds is if you restart the signal-cli-rest-api docker container/addon and immediately afterwards try to send a message. If you run the signal-cli-rest-api service in json-rpc mode, the Java application is used instead of the natively compiled binary (which is used in the native mode). The Java application takes quite a while until it starts up and could easily exceed the 10 seconds max timeout (especially on systems like the Raspberry Pi).
thing is, my signal-cli-rest-api docker has been running for 3 months straight - i never restarted it since the last update. also, even though automations get stuck when using signal_messenger, sending a message via the services tab in home assistant still works without issues. other automations with signal_messenger work as well - until they just get stuck for unknown reasons. so the container itself seems to work as expected.
i'll keep an eye on my automations in the coming days and will report back. if the issue returns, i'll try native mode as well.
Really strange. Unfortunately I do not know anything about how Home Assistant automations are implemented. So it is kinda hard to figure out what needs to be changed in order to fix the issue in the integration :/.
I also had a quick look at the zwave diff, but couldn't spot anything useful. If anyone has some additional knowledge, please let me know.
well, it happened again - so container-version 0.80 on json-rpc mode still has the issue. just checked the logs of the container, but doesn't look like it has much to offer
in /var/log/signal-cli-json-rpc-1/out.log
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
WARN SignalWebSocketHealthMonitor - Missed keep alives, identified last: 1706005190976 unidentified last: 1706005190976 needed by: 1706005220840
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
WARN ManagerImpl - No profile name set. When sending a message it's recommended to set a profile name with the updateProfile command. This may become mandatory in the future.
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
WARN ReceiveHelper - Connection closed unexpectedly, reconnecting in 100 ms
in /var/log/supervisor/supervisord.log
- last entry here was before the issue reappeared
2024-01-25 19:55:13,025 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2024-01-25 19:55:13,026 WARN For [program:signal-cli-json-rpc-1], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
2024-01-25 19:55:13,026 INFO Included extra file "/etc/supervisor/conf.d/signal-cli-json-rpc-1.conf" during parsing
2024-01-25 19:55:13,335 INFO RPC interface 'supervisor' initialized
2024-01-25 19:55:13,335 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-01-25 19:55:13,338 INFO daemonizing the supervisord process
2024-01-25 19:55:13,339 INFO supervisord started with pid 32
2024-01-25 19:55:13,682 INFO spawned: 'signal-cli-json-rpc-1' with pid 39
2024-01-25 19:55:14,692 INFO success: signal-cli-json-rpc-1 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
apart from changing to native mode - anything you want me to try so we can find the cause of the issue?
mhhh - just tried sending a message over curl after it got stuck in home assistant - the curl message is sent and received successfully - but the container doesn't return any response - so the terminal gets "stuck" until i kill it with SIGINT. other curl calls don't respond either, as i just tried to update the profile name via curl.
after restarting the signal-cli-rest-api container and sending a curl call I get a response with a timestamp. so it does look like the container gets kind of stuck at some point.
response after restarting the container:
Really strange. If not already done, please try to set a profile name via the Update Profile PUT request.
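A sketch of what such an Update Profile request might look like; the /v1/profiles endpoint path and body shape are assumptions based on the signal-cli-rest-api project's API, and the number is a placeholder:

```shell
# Placeholder account number - replace with your registered number.
NUMBER="+431234567"
BODY='{"name": "HomeAssistant"}'
echo "PUT /v1/profiles/$NUMBER $BODY"

# Against a running container (commented out - needs the service on port 8080):
# curl -X PUT -H "Content-Type: application/json" \
#   -d "$BODY" "http://127.0.0.1:8080/v1/profiles/$NUMBER"
```

Setting a name should also make the repeated "No profile name set" warnings from the logs above disappear.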
I think we need to find out whether the REST API wrapper or the signal-cli process gets stuck. Could you please try to change the log level to debug with curl -X POST -H "Content-Type: application/json" -d '{"logging": {"level": "debug"}}' 'http://127.0.0.1:8080/v1/configuration' and then post the (obfuscated!) log output from the docker container log here? Hopefully that gives us more information.
Please make sure not to restart the docker container after you've set the log level to debug, as the log level setting isn't persistent. Every restart of the docker container resets the log level to its default state.
please try to set a profile name via the Update Profile PUT request.
done that after i saw it in the logs today - entries stopped appearing since
I think we need to find out, whether the REST API wrapper or the signal-cli process gets stuck. Could you please try to change the log level to debug with curl -X POST -H "Content-Type: application/json" -d '{"logging": {"level": "debug"}}' 'http://127.0.0.1:8080/v1/configuration' and then post the (obfuscated!) log output from the docker container log here. Hopefully that gives us more information.
got it - enabled debugging - will report back
5 days and more than a dozen messages later, and not a single automation got stuck since. signal-cli-rest-api logs are clean as well. wondering if it really just was the profile name that needed to be set? or just a coincidence?
@bbernhard after setting the log level to debug, there hasn't been a single entry to the logs. is this expected behavior? or should the log be full of entries even when just sending a message successfully?
It could be the profile name. I think those warning messages are also added to the JSON payload that is returned from signal-cli. Maybe there is a case where we do not parse those messages properly and we then return a JSON payload that can't be handled properly by the HA integration and then the integration gets stuck and waits for a response forever.
There should be a lot of messages written. However, those log entries aren't written to the log files you've mentioned above. They are written to the docker-compose log file. Maybe you've disabled logging in the docker-compose.yml file?
my bad - forgot about the docker-compose log file. just looked into the log. the only thing that constantly reappears is the following
{"log":"time=\"2024-01-29T12:16:08Z\" level=debug msg=\"Couldn't send message to golang channel, as there's no receiver\"\n","stream":"stderr","time":"2024-01-29T12:16:08.620433913Z"}
that message appears after every message that is sent. sometimes even four times in a row
{"log":"time=\"2024-01-29T12:16:21Z\" level=debug msg=\"Couldn't send message to golang channel, as there's no receiver\"\n","stream":"stderr","time":"2024-01-29T12:16:21.129796078Z"}
{"log":"time=\"2024-01-29T12:16:21Z\" level=debug msg=\"Couldn't send message to golang channel, as there's no receiver\"\n","stream":"stderr","time":"2024-01-29T12:16:21.677083043Z"}
{"log":"time=\"2024-01-29T12:16:22Z\" level=debug msg=\"Couldn't send message to golang channel, as there's no receiver\"\n","stream":"stderr","time":"2024-01-29T12:16:22.155938329Z"}
{"log":"time=\"2024-01-29T12:16:25Z\" level=debug msg=\"Couldn't send message to golang channel, as there's no receiver\"\n","stream":"stderr","time":"2024-01-29T12:16:25.065024662Z"}
but, as i said, issue hasn't come up since. will continue to monitor.
I guess you do not have a websocket connection established to the receive endpoint? In this case it is fine - it is just some verbose debug logging to troubleshoot websocket issues with the receive endpoint.
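For context, the receive endpoint discussed here is a websocket URL of roughly the following shape. The path and the use of websocat as a client are assumptions for illustration; the number is a placeholder:

```shell
# Placeholder number - the websocket receive endpoint of
# signal-cli-rest-api streams incoming messages for an account.
RECEIVE_URL="ws://127.0.0.1:8080/v1/receive/+431234567"
echo "$RECEIVE_URL"

# Stream messages with a generic websocket client (commented out -
# requires websocat and a running container):
# websocat "$RECEIVE_URL"
```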
ah okay - i run a small websocket script to hand over any received messages to home assistant. but haven't had any issues with that.
after testing for more than 2 weeks, the issue has not come back. only thing I did was setting the profile name.
Thanks for the feedback - glad it is working now!
Just as an additional data point - I've been following this issue with the same problem and the 'set profile' resolution also worked for me.
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
The problem
Hi, after upgrading from 2023.10 to 2023.11 I have the problem that all my automations which use signal_messenger are stuck in "still running" mode after the first run. I need to manually deactivate and activate the automation to keep it working, but only for one run. When I change the service call from signal_messenger to e.g. notify.mobile_app the automations work as expected.
Below is the trace of the first run of the automation, when the signal message is going through: 01_away.json
And here is the trace of a second run of the automation, when nothing is working anymore: 02_return.json
What else can I provide to help on troubleshooting that?
What version of Home Assistant Core has the issue?
core-2023.11.1
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Signal Messenger
Link to integration documentation on our website
https://www.home-assistant.io/integrations/signal_messenger/
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response