Closed dethpickle closed 5 years ago
Interesting. I was seeing cores then I enabled MQTT and they stopped. I disabled MQTT and they came back. I'm on a fork of the code so I'm not 100% sure I'm seeing the same crash.
https://github.com/ballle98/AqualinkD/issues/20
You can follow these instructions to enable core files https://pve.proxmox.com/wiki/Enable_Core_Dump_systemd
rebuild the code with debug symbols, and turn on address sanitize (we're looking for memory corruption)
make DBG="-g -O0 -fsanitize=address -static-libasan" make install
when it crashes "sudo -i" to become root, run gbd and dump the back trace with the bt command cd /var/lib/coredumps/ ls gdb /home/pi/git/AqualinkD/release/aqualinkd core-aqualinkd-sig11-user0-group0-pid386-time1554347133 bt
What Version of AqualinkD are you running? When fixing the MQTT / Mosquitto 1.6 issue, I posted a bad version that would core dump. If your using v1.2.6e, please update of downgrade to a previous version.
I'm on v1.2.6f, from when you fixed MQTT. Thanks for that, btw. I'm still up from the the restart on the 28th. I'll still keep an eye on it.
Just had the status=11/SEGV happen this evening. I am not sure a specific command was being sent at the time as I was away from the house. Running 1.3.1
Jun 4 19:42:24 aquad systemd[1]: aqualinkd.service: Main process exited, code=killed, status=11/SEGV Jun 4 19:42:24 aquad systemd[1]: aqualinkd.service: Unit entered failed state. Jun 4 19:42:24 aquad systemd[1]: aqualinkd.service: Failed with result 'signal'. Jun 4 20:17:01 aquad CRON[13258]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
I just posted 1.3.2, I fixed a few places where there was a potential for a core dump.
Got it. Updated and I'll check it again in three days. Seemed to be dying about every two days. Thank you!
Just had another crash. 1.3.2 . I think my child may have turned on the slide and waterfall at approximately the time it crashed. However, not 100% sure.
Jun 7 15:01:33 aquad aqualinkd: Serial packet too large Jun 7 15:01:33 aquad aqualinkd: Bad receive packet | HEX: 0x10|0x02|0x33|0x71|0x06|0x01|0x02|0x03|0x04|0x05|0x06|0x00|0x07|0x41|0x6c|0x6c|0x20|0x4f|0x46|0x46|0x00|0x0a|0x55|0x70|0x70|0x65|0x72|0x20|0x50|0x6f|0x6f|0x6c|0x00|0x0a|0x4c|0x6f|0x77|0x65|0x72|0x20|0x50|0x6f|0x6f|0x6c|0x00|0x0a|0x57|0x61|0x74|0x65|0x72|0x66|0x61|0x6c|0x6c|0x73|0x00|0x05|0x50|0x61|0x72|0x74|0x79|0x00| Jun 7 15:01:33 aquad systemd[1]: aqualinkd.service: Main process exited, code=killed, status=11/SEGV Jun 7 15:01:33 aquad systemd[1]: aqualinkd.service: Unit entered failed state. Jun 7 15:01:33 aquad systemd[1]: aqualinkd.service: Failed with result 'signal'.
This was very helpful, I think I've located the problem. Wait for 1.3.2b to be posted within the hour.
Just out of interest, can you list what you have on your RS485 bus? I ask as you can have Jandy and Pentair working on the same bus as the protocols are somewhat compatible, but what I'm seeing there is either another protocol I'm not familiar with or you have some other form of errors on your bus. (ie something introducing noise, like a bad termination). It's not a big issue for you unless you are noticing a lot of "press the button twice as the first time it didn't work" kind-a things.
Hayward Goldline running AL-5 emulation for Jandy AquaPure. Pentair VS Pump iAqualink 2.0 Aqualink RS control. RPI running AqualinkD
Not seeing too many errors like I did before I cleaned up the wires. However, I just got another crash with similar error. Jun 7 15:20:06 aquad aqualinkd: Bad receive packet | HEX: 0x10|0x02|0x33|0x72|0x08|0x01|0x02|0x03|0x04|0x05|0x06|0x07|0x20|0x01|0x01|0x00|0x00|0x09|0x57|0x61|0x74|0x65|0x72|0x66|0x61|0x6c|0x6c|0x00|0x01|0x00|0x00|0x05|0x4c|0x69|0x67|0x68|0x74|0x00|0x01|0x00|0x00|0x09|0x53|0x70|0x69|0x6c|0x6c|0x6f|0x76|0x65|0x72|0x01|0x01|0x00|0x00|0x05|0x53|0x6c|0x69|0x64|0x65|0x00|0x01|0x00|
I am testing some stuff with HomeAssistant, but that is with MQTT and I doubt it is related at this point. I will wait for your update.
That crash is identical to the other. (I hope the 1.3.2b fixes that, I've posted the update now).
The new version can also read the Pentair protocol (in very early beta), so if it works for you, you can also add read_pentair_packets=yes
to your config and make sure read_all_devices = yes
is also their, and it should report RPM and WATTS for the pump to the Web UI and MQTT.
But there is nothing their that produces a protocol AqualinkD doesn't read, (that I know of). But none if this stuff is documented, it's all reversed engineered from myself and others posting info on the net.
I've been up and running on v1.3.2. (don't see a b
on it) since Jun 05 15:38:25 without a restart. This is right about the time it would be doing it. 2-3 days. I'll still keep an eye on it.
I've had no restarts at all since upgrading to v1.3.2.
Just had one (same 11/SEGV error) but it was after 14 days. I've noticed you have a 1.3.3 and have upgraded to that. I'll keep you posted.
Since upgrade to 1.3.3, the service has not crashed in 20 days.
That's great news. Glad it seems to be stable for you now.
Just recently off of issue #57. I'm not sure this could be related, but since putting on that update I've had the aqualinkd service quit on me twice, right when a new command is sent to it. In each case, it had been running successfully for several days and then SEGV'd when a command came in.
I see closed issue #52 looks identical. I'm going to try enabling
Restart=on-failure
which I currently have commented out. Just adding my observation of the same issue so that it's not considered a one-off.First time I noticed - I'm turning on a pump at 10am:
I restarted it and it again ran for a few days - the daytime low-speed shutoff is at 14:00:
Previously, I had extremely rare shutdowns of the aqualinkd service. I "handled" them simply by having the systemctl that runs the service have a restart in it. My aqualinkd.service file:
Any suggestions or places I could gather a better picture/log to submit, please let me know.