nebulous / infinitude

Open control of Carrier/Bryant thermostats
MIT License

Permission error while attempting to start the listener #161

Closed: jftaylorMn closed this issue 1 year ago

jftaylorMn commented 1 year ago

This is a bare-bones clean install of the latest 32-bit Raspberry Pi OS Lite. The thermostat is not set up (yet). I am following the instructions in the wiki (over ssh). The hardware is a Pi Zero 2 W. The only thing I expect it to do is respond to infrequent GET API calls from an ESP32 and interact with the thermostat, so I hope it has enough power and won't trash the SD card too quickly. Scope could creep.

When I run the interactive command `./infinitude daemon -m production`, the logged messages in the terminal session appear normal, and the web page at port 3000 displays humidity and filter usage gauges (as I would expect).

Whenever I try a command that specifies a listener, I get an error (and the page is no longer available):

```
./infinitude daemon -m production -l http://*:80
Using /dev/ttyAMA0 serial interface
Can't create listen socket: Permission denied at /usr/share/perl5/Mojo/IOLoop.pm line 130.
```

Possibly related: I have set up the service file and run the command that makes the service start automatically on reboot. When I check whether the web service is up, I do see its process:

```
ps -ef | grep 'infinitude'
root 329 1 1 21:29 ? 00:00:04 perl /home/me/infinitude/infinitude daemon -l http://*:80
me 559 529 0 21:34 pts/0 00:00:00 grep --color=auto infinitude
```

This looks like it matches the wiki output, but I cannot get the web page to load (connection refused). Since the web service is running as root in this case, I assume that directory permissions are not the issue.

Changing the service file to not specify the listener makes no obvious difference, nor does increasing the restart delay.

I'm not sure what else to debug here. Thoughts?

jftaylorMn commented 1 year ago

@nebulous - any suggestions on how to get the service to display a web page and expose the API? I am fine with leaving the port at 3000.

nebulous commented 1 year ago
> ./infinitude daemon -m production -l http://*:80
> Using /dev/ttyAMA0 serial interface
> Can't create listen socket: Permission denied at /usr/share/perl5/Mojo/IOLoop.pm line 130.

The above looks to me like you need to run as a user with permission to open port 80. I would expect running Infinitude with sudo to eliminate that error (which the sudo-equivalent below seems to have done).
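For context on that error: on Linux, ports below 1024 are privileged by default, so a non-root process cannot listen on port 80, while port 3000 needs no special rights. The minimal shell sketch below checks this for the current user. It assumes a kernel that exposes `/proc/sys/net/ipv4/ip_unprivileged_port_start` (added in Linux 4.11) and falls back to 1024 otherwise; the helper name `needs_privilege` is purely illustrative.

```shell
#!/bin/sh
# Sketch: why "-l http://*:80" fails for a normal user but port 3000 works.
# On Linux, binding to a port below net.ipv4.ip_unprivileged_port_start
# (1024 by default) requires root or the CAP_NET_BIND_SERVICE capability.

# needs_privilege PORT [THRESHOLD]: exit 0 if PORT is a privileged port.
needs_privilege() {
    [ "$1" -lt "${2:-1024}" ]
}

# Read the kernel's threshold; older kernels lack this file, so assume 1024.
threshold=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)

for port in 80 3000; do
    if needs_privilege "$port" "$threshold" && [ "$(id -u)" -ne 0 ]; then
        echo "port $port: $(id -un) needs sudo or CAP_NET_BIND_SERVICE"
    else
        echo "port $port: $(id -un) can bind it"
    fi
done
```

Staying on port 3000, as the reporter eventually chose to do, sidesteps the question entirely.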


> root 329 1 1 21:29 ? 00:00:04 perl /home/me/infinitude/infinitude daemon -l http://*:80
> me 559 529 0 21:34 pts/0 00:00:00 grep --color=auto infinitude

> This looks like it matches the wiki output, but I cannot get the web page to load (connection refused). Since the web service is running as root in this case, I assume that directory permissions are not the issue.

Yep, that looks correct at a glance. I would think that http://*:80 binds to all interfaces, but you might try a specific IP instead of *.

Beyond that, I'd suggest running in development rather than production mode, which will show a bunch of debug output, and trying to hit Infinitude's IP/port with curl, telnet, nc, wget, etc. from the shell on the Pi.
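A small shell sketch of that last suggestion, meant to be run on the Pi itself: it requests a URL with curl and turns curl's exit status into a short hint. In curl, exit code 7 means it could not connect (which covers "connection refused"), 6 means the host name did not resolve, and 28 means the request timed out. The `/status.json` path is one of the endpoints that appears in Infinitude's logs in this thread; the function names are hypothetical, and curl is assumed to be installed. Start Infinitude with `-m development` instead of `-m production` while probing to get the extra debug output.

```shell
#!/bin/sh
# Sketch: check from the Pi's own shell whether Infinitude answers on a URL.
# Assumes curl is installed.

# describe_curl_exit CODE URL: translate a curl exit status into a hint.
describe_curl_exit() {
    case $1 in
        0)   echo "$2: reachable" ;;
        6)   echo "$2: could not resolve host" ;;
        7)   echo "$2: could not connect (refused or nothing listening)" ;;
        28)  echo "$2: timed out" ;;
        127) echo "$2: curl is not installed" ;;
        *)   echo "$2: curl failed with exit code $1" ;;
    esac
}

# probe URL: fetch URL quietly (5 s limit) and print a one-line diagnosis.
# Without --fail, any HTTP answer (even a 404) counts as reachable.
probe() {
    curl --silent --max-time 5 --output /dev/null "$1"
    describe_curl_exit "$?" "$1"
}

# Try both ports discussed in this issue on the loopback interface; add the
# Pi's LAN address to test the same path the ESP32 would use.
for url in "http://127.0.0.1:3000/status.json" "http://127.0.0.1:80/status.json"; do
    probe "$url"
done
```

If loopback works but the LAN address does not, the problem is the listen address or the network rather than Infinitude itself.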

> The logged messages appear normal in the terminal session, and the web page at port 3000 displays a web page with humidity and filter usage gauges (as I would expect). ... any suggestions on how to get the service to display a web page and expose the API? I am fine with leaving the port to 3000

I'm a little confused by the combination of the two statements above. The first seems to imply that Infinitude works when bound to 3000 and you can see its web output, but the second implies the opposite. (Granted, I'm easily confused 😄)

But overall, this appears to be a plain old network debugging problem that we should be able to sort out.

jftaylorMn commented 1 year ago

This might need to wait until Sunday. I need to take an unplanned road trip to see why my Carrier furnace isn't running. Two feet of fresh snow and a possible power outage could be contributors.

The main issue is that when Infinitude appears to be running fine as a service (based on the `ps -ef` output), I get a denied message in the browser. In contrast, running the same command in an interactive session serves the web page.

jftaylorMn commented 1 year ago

Well, the gremlins are at it... After 5 days and changing "nothing", the Pi Zero 2 W (with Infinitude running as a service) is now displaying the website instead of permission denied. I am still using port 3000 and am fine leaving it that way. Unfortunately, I did not try to connect the Infinity thermostat while I was there over the weekend, so I still just have the humidity and filter usage gauges with no data.

@nebulous - if you want to keep the open issues list short, feel free to close this one. I'm guessing there could be more chapters to this story, but I can reopen this or create a new issue if needed.

Thanks a lot for your support!

jftaylorMn commented 1 year ago

@nebulous I have set up the Pi Zero 2 W at the off-grid location and have had some success, but not enough. The web page populated after a delay and now shows most of what I expected. After some time (an hour or more), the page stops responding, and the API still responds but its data no longer updates. Repowering the Pi makes it responsive again. I can ssh to the Pi and see a number of files. I have not pulled the latest code, so I am still running the release from when this issue was first created.

I will be here at least most of Wednesday and will probably bring the pi zero home after that.

The Pi Zero is likely very light on memory, so maybe the leak fix you worked on would help? I don't think I can refresh that here.

Any diagnostic steps I should try? There must be logs somewhere, but my expertise on the Pi is limited. I also have a thin client with an Infinitude Docker image here that I might try to set up tomorrow.

jftaylorMn commented 1 year ago

As a follow-up: when the Pi is communicating with the thermostat as expected, the onboard LED is steadily on at low intensity. No new status files are created or updated while the LED is flashing about once a second. I kept an ssh session open as it went into this state. Eventually the system responds to `sudo free`, but only after a significant delay, and the output suggests memory trouble: free memory varies quite a bit, and free swap has dropped to zero. Once I get the `sudo free` response, new files start appearing again and the API responds almost instantly. While free swap was zero, I never got a response.

So while a reboot brings data collection back to life, it's not the only recovery path. I haven't found the logs yet. Since I think the service runs with the production switch, they may not be very helpful anyway.

A small enhancement request: the response to the status API would be improved if only zone details were included for on

Maybe control this by passing a query parameter?

jftaylorMn commented 1 year ago

My Pi Zero 2 W is now on the internet and 100+ miles away from the thermostat. Since it now has some files, I can at least look at the lockup problems more easily. Running `top` (Shift-M sorts by memory usage) after ssh-ing to the Pi, I can see that perl is the top memory consumer: over time, its memory footprint grew from 56488 to 458488, with no swap space left. Swap started being used when free memory dropped below 100 MB, and the LED began flashing slowly when free swap became low. Ultimately the web page stopped responding, and characters typed into an ssh session took several seconds to appear (if they appeared at all). I tried to stop the infinitude service at that point, but that failed with a transport endpoint error. Unplugging the Pi was my last-resort remedy, and it works.

To complement the `top` results, I was able to watch memory increase and confirm that the perl process is Infinitude using this command:

```
ps aux | head -1; ps aux | sort -rnk 4 | head -1
```

I have since pulled the latest code and still see memory use increasing over time with that build. The leak may be smaller, but it is still there.
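To put numbers on that growth without keeping `top` open, a minimal sketch along these lines could log the resident memory (RSS) of the Infinitude process at a fixed interval. It reads `VmRSS` from `/proc/PID/status` (Linux only) and assumes the daemon's command line contains `infinitude daemon`, as in the `ps -ef` output earlier in this issue; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: sample the resident memory (RSS) of the Infinitude process over time.
# Linux-only: reads VmRSS from /proc/PID/status.

# rss_kib PID: print the resident set size of PID in KiB.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# watch_rss PATTERN INTERVAL COUNT: print COUNT timestamped samples, INTERVAL
# seconds apart, for the oldest process whose command line matches PATTERN.
watch_rss() {
    pattern=$1 interval=$2 count=$3
    pid=$(pgrep -o -f "$pattern") || {
        echo "no process matches '$pattern'" >&2
        return 1
    }
    i=0
    while [ "$i" -lt "$count" ]; do
        [ "$i" -gt 0 ] && sleep "$interval"
        echo "$(date '+%F %T') pid=$pid rss_kib=$(rss_kib "$pid")"
        i=$((i + 1))
    done
}
```

For example, `watch_rss "infinitude daemon" 60 1440 >> "$HOME/infinitude-rss.log"` would take one sample a minute for 24 hours, which makes a steady leak easy to tell apart from normal fluctuation.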

I also found the daemon logs in /var/log/daemon.log. To an untrained eye, most of the Infinitude messages appear normal. I have summarized the 2000 Infinitude-related entries in the table below:

| Log message | Count |
| -- | -- |
| 101 Switching Protocols | 100 |
| 200 OK | 412 |
| 304 Not Modified | 100 |
| GET "/" | 101 |
| GET "/energy.json" | 100 |
| GET "/notifications.json" | 99 |
| GET "/serial" | 100 |
| GET "/status.json" | 100 |
| GET "/status" | 13 |
| GET "/systems.json" | 100 |
| http://*:3000 | 1 |
| Websocket Closed | 101 |
| Routing to a callb | 1 |
| Routing to a callback | 612 |
| Inactivity timeout | 87 |
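A table like the one above can be regenerated with standard tools. The sketch below counts the recurring messages, using patterns copied from the table; it assumes syslog-style lines in `/var/log/daemon.log` that mention `infinitude`, and the function name is hypothetical. Because the journal's syslog identifier for a Perl daemon is not guaranteed to contain "infinitude", the script also accepts an already-filtered file, for example one exported with `journalctl -u infinitude.service --no-pager > infinitude.log` (the unit name comes from the crontab entry later in this thread).

```shell
#!/bin/sh
# Sketch: count the recurring Infinitude log messages seen in this issue.
# The patterns are copied from the summary table; extend them as needed.

# summarize_messages: read log lines on stdin and print "count message"
# pairs, most frequent first.
summarize_messages() {
    grep -oE 'GET "[^"]*"|[0-9]{3} (OK|Not Modified|Switching Protocols)|Websocket Closed|Inactivity timeout|Routing to a callback' \
        | sort | uniq -c | sort -rn
}

if [ -n "${1:-}" ]; then
    # A pre-filtered file, e.g. a journalctl export for the service.
    summarize_messages < "$1"
elif [ -r /var/log/daemon.log ]; then
    # Default: the syslog file, filtered to lines mentioning infinitude.
    grep -i 'infinitude' /var/log/daemon.log | summarize_messages
fi
```

Comparing counts such as `Inactivity timeout` against `Websocket Closed` across runs is one cheap way to check whether timeouts leave sockets open.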

The inactivity timeouts appear to cause a websocket to be closed, but not every time. Is that where a leak could be happening? I also see a lot of info-level messages about connecting to a serial port, although I don't have RS485 set up (yet). Might modifying the runtime parameters to not use a websocket and/or serial port yield better results?

I've been writing code for decades, but Perl is totally foreign to me, as are many other parts of this repository, so I'm not likely to create a pull request. Without a code update, the workaround to keep Infinitude stable on the minimal RPi is a root cron task (added via `sudo crontab -e`) that restarts the service once per hour.

Here is the addition to the crontab file that will execute on the first minute of every hour:

```
# m h dom mon dow command
01 * * * * systemctl restart infinitude.service
```

Running every hour is probably more often than needed, but it doesn't seem to be too disruptive, even if a web page is using Infinitude at the time. Restarting does recover the memory.
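A possible refinement of the hourly restart, sketched under assumptions: instead of restarting unconditionally, a script run from the same root crontab could restart `infinitude.service` only when the service's main process exceeds a memory limit. The 300000 KiB limit, the `--run` flag, and the function names are arbitrary choices for illustration, and `systemctl show --value` requires systemd 230 or newer.

```shell
#!/bin/sh
# Sketch: restart infinitude.service only if its main process has grown
# past a memory limit. Intended to run as root from cron.

LIMIT_KIB=${LIMIT_KIB:-300000}   # arbitrary example threshold (about 293 MiB)
SERVICE=infinitude.service

# over_limit RSS LIMIT: exit 0 if RSS (KiB) exceeds LIMIT (KiB).
over_limit() {
    [ "$1" -gt "$2" ]
}

# main_rss_kib: RSS of the service's main process in KiB; prints nothing
# if the service is not running (systemd reports MainPID=0).
main_rss_kib() {
    pid=$(systemctl show -p MainPID --value "$SERVICE" 2>/dev/null)
    [ -n "$pid" ] && [ "$pid" -ne 0 ] && ps -o rss= -p "$pid" | tr -d ' '
}

# Only act when explicitly asked, so the functions can be tested safely.
if [ "${1:-}" = "--run" ]; then
    rss=$(main_rss_kib)
    if [ -n "$rss" ] && over_limit "$rss" "$LIMIT_KIB"; then
        logger -t infinitude-watchdog "RSS ${rss} KiB > ${LIMIT_KIB} KiB, restarting $SERVICE"
        systemctl restart "$SERVICE"
    fi
fi
```

Installed at a hypothetical path such as `/usr/local/bin/infinitude-mem-check.sh`, the crontab line would become `01 * * * * /usr/local/bin/infinitude-mem-check.sh --run`. With the numbers from this thread, 458488 would trigger a restart and 56488 would not.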

After letting the Pi Zero 2 W run for a few days, I noticed that memory usage had again reached the point where the swap file was being used, which seems one step too close to an unresponsive Pi. Additionally, the perl process was no longer the top memory consumer. Looking at journalctl, I found that Infinitude was logging about 8 messages per minute stating that the serial port was being used. At the outset, I had visions of eventually using the RS485 portion of the code, so I had changed the infinitude.json file (per the wiki instructions) to point to the RPi serial port. I'm not ready to make that leap yet, and the instructions say nothing about changing the default config to disable this feature. I have replaced the values for both the websocket and SerialTTY with empty strings, and this at least seems to quell the logging. Maybe there is a better way? I have also set the journal's maximum size to 100 MB, which might be too generous. It looks like I could also make the journal volatile.
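One way to make a journal size limit persistent is a systemd-journald drop-in file. In journald, `SystemMaxUse=` caps the persistent journal under `/var/log/journal`; with `Storage=volatile` the journal is kept in RAM under `/run/log/journal`, and the relevant cap becomes `RuntimeMaxUse=` instead, which on a 512 MB Pi argues for a much smaller value. The sketch below writes such a drop-in; the file name `50-size.conf`, the function name, and the `--apply` flag are arbitrary choices.

```shell
#!/bin/sh
# Sketch: write a systemd-journald drop-in that caps journal size.
# Persistent storage is capped by SystemMaxUse=; volatile storage
# (RAM, /run/log/journal) is capped by RuntimeMaxUse= instead.

# write_journald_dropin DIR MAX [volatile]
write_journald_dropin() {
    dir=$1 max=$2 mode=${3:-persistent}
    mkdir -p "$dir" || return 1
    {
        echo '[Journal]'
        if [ "$mode" = volatile ]; then
            echo 'Storage=volatile'
            echo "RuntimeMaxUse=$max"
        else
            echo "SystemMaxUse=$max"
        fi
    } > "$dir/50-size.conf"
}

# Run as root with --apply to install the drop-in and restart journald.
if [ "${1:-}" = "--apply" ]; then
    write_journald_dropin /etc/systemd/journald.conf.d 100M &&
        systemctl restart systemd-journald
fi
```

Keeping the setting in `journald.conf.d` rather than editing `journald.conf` itself leaves the packaged default file untouched on upgrades.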

Clearly, memory leaks aren't good, and ideally they should be fixed. Practically, that's not always the top priority. I suspect that if Infinitude were running with more than the 500 MB of memory on a Pi Zero, this might not cause issues between routine reboots.

@nebulous - There is a lot of information buried in issues that might be worth summarizing in the README or the wiki. I don't know whether the memory leak / cron recovery falls into that general category; that should be your decision. I could update the wiki, but I would want to make sure my changes were the preferred option. Mostly, I want to see you make progress on the ESP32 code. In your copious spare time. ;=)

jftaylorMn commented 1 year ago

After setting the socket and serial TTY values to empty strings in the infinitude.json file, I no longer see any messages about connecting to the TTY in the logs, and memory usage seems more stable, possibly even very stable. I don't see any immediate harm in restarting the service hourly, so I will keep that in the mix for now. I might add a comment to the wiki in the section dealing with the RS485 connection on a Raspberry Pi.

It seems likely that repeated attempts to access TTY and/or websocket endpoints without releasing resources on timeout or other failure could be a logical source of leaks.

I think this issue can be closed.

nebulous commented 1 year ago

> Modifying the runtime parameters to not use a websocket and/or serial port might yield better results?

I suspect the leak is somewhere in the serial/websocket code, though without much in the way of evidence yet. Thanks for all of the details above. Restarting periodically is a lame kludge deserving of apologies, but it should work.

jftaylorMn commented 1 year ago

If lame kludges result in stability over extended periods, I'll take it. This is a hobby, so apologies are not required.