tagyoureit / nodejs-poolController

An application to control pool equipment from various manufacturers.
GNU Affero General Public License v3.0

Controller randomly stopping #333

Closed jarrah42 closed 2 years ago

jarrah42 commented 2 years ago

Describe the bug

I'm using Nixie single body with an IntelliFlow VS pump and an iChlor 30. At seemingly random times, the controller just stops. It's hard to determine the exact interval, but it seems to run for a few days to a week each time. There is nothing in the log file to indicate why it exits. I've been seeing this problem all summer. I've been keeping up to date on commits, but it still persists.

I've tried packet logging, but it doesn't indicate anything is wrong either. Is there any way to add logging to determine whether the application is exiting out of the main event loop or crashing? Otherwise, any suggestions on debugging this would be appreciated.
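One way to tell a clean exit from a crash is to hook Node's process events and write a line on the way out. The following is only a minimal sketch, not anything that exists in njsPC: `describeExit`, `installExitLogging`, and the log file path are hypothetical names made up for illustration.

```javascript
// Hypothetical diagnostic helper; none of these names come from njsPC.
const fs = require('fs');

// Build one timestamped log line describing why the process is going away.
function describeExit(kind, detail) {
  return `[${new Date().toISOString()}] ${kind}: ${detail}`;
}

// Register handlers that append a line to logFile for each way out of the process.
function installExitLogging(logFile) {
  // Synchronous append, because async I/O is not guaranteed to finish in an 'exit' handler.
  const write = (line) => fs.appendFileSync(logFile, line + '\n');

  // Fires when the event loop empties or process.exit() is called.
  process.on('exit', (code) => write(describeExit('exit', `code=${code}`)));

  // A thrown error nobody caught: record the stack, then exit non-zero.
  process.on('uncaughtException', (err) => {
    write(describeExit('uncaughtException', (err && err.stack) || String(err)));
    process.exit(1);
  });

  // A rejected promise nobody handled: record it, then exit non-zero.
  process.on('unhandledRejection', (reason) => {
    write(describeExit('unhandledRejection', String(reason)));
    process.exit(1);
  });

  // Adding a listener replaces the default signal behaviour, so exit explicitly.
  for (const sig of ['SIGINT', 'SIGTERM']) {
    process.on(sig, () => {
      write(describeExit('signal', sig));
      process.exit(0);
    });
  }
}
```

Calling something like `installExitLogging('/tmp/njspc-exit.log')` (path assumed) early in startup would leave a final line for a normal exit, a crash, or a stop signal. SIGKILL and a power loss cannot be caught, so a log that simply ends with no such line points away from the application itself.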

rstrouse commented 2 years ago

Give me a little bit of info regarding your environment. Is this running on RPI? Do you use PM2 for launching at bootup? Are you connecting to RS485 through socat?

EDIT: By stops do you mean that the process disappears or does it just stop communicating with the pump and iChlor?

jarrah42 commented 2 years ago

Yes, it's running on an RPI. It's possible that it's due to the RPI rebooting and pm2 not restarting the controller (it's restarting the other services.) I've run pm2 save so will keep an eye on when/if it stops again and check to see if the RPI also rebooted.

I mean the controller process exits completely (listed as "stopped" in pm2).

rstrouse commented 2 years ago

PM2 should have some logging. Are you using log rotate?

EDIT: I have a mock Nixie controller that has been running for over a month. The only reason it has been shut down is so that I can install the latest repos.

rstrouse commented 2 years ago

Here is a copy of my ecosystem.config.js for njsPC. NOTE: I have it ignore the watch on all of the potential directories that can be written during normal operation.

module.exports = {
  apps : [{
    name: 'njsPC',
    cwd: '/home/pi/nodejs-poolController',
    script: 'npm',
    args: 'start',
    autorestart: false,
    restart_delay: 10000,
    watch: true,
    ignore_watch: ['data', 'config.json', 'node_modules', 'logs', 'dist', 'cache'],
    listen_timeout: 60000
  }],

  deploy : {
    production : {
      user : 'SSH_USERNAME',
      host : 'SSH_HOSTMACHINE',
      ref  : 'origin/master',
      repo : 'GIT_REPOSITORY',
      path : 'DESTINATION_PATH',
      'pre-deploy-local': '',
      'post-deploy' : 'npm install && pm2 reload ecosystem.config.js --env production',
      'pre-setup': ''
    }
  }
};

tagyoureit commented 2 years ago

Check out the newer example on the PM2 Startup wiki page. It now uses a set of directories that are specifically watched instead of watching everything and excluding some. It's a small difference, but it does seem to be a bit more stable for me.

If you have the packet logs, do attach them again. Sometimes what's missing is as important as what's there. Maybe we'll see something new?

> There is nothing in the log file to indicate why it exits.

What do you mean exactly by "exits"? Is njsPC restarting? Is PM2 restarting the app? Or is it like the previous behavior where the app is still running but it no longer controls the equipment?

jarrah42 commented 2 years ago

I had previously set up PM2 to restart njsPC, dashPanel, and REM when the system reboots. What generally happens is that after some period of time the pump stops. When I check the PM2 status, dashPanel and REM show "online" status and njsPC shows "stopped". The last entry in the PM2 log at this point is just one of the info/warn messages that get output by njsPC every few minutes, so I was suspicious that the njsPC process was terminating for some reason.

However, checking the console messages does show that the RPI has been rebooting:

Aug 1 17:17:03 rp1 kernel: [ 0.000000] Booting Linux on physical CPU 0x0
Jul 29 00:58:35 rp1 kernel: [ 0.000000] Booting Linux on physical CPU 0x0
Jul 25 12:34:08 rp1 kernel: [ 0.000000] Booting Linux on physical CPU 0x0
Jul 17 19:17:03 rp1 kernel: [ 0.000000] Booting Linux on physical CPU 0x0
Jul 7 23:41:42 rp1 kernel: [ 0.000000] Booting Linux on physical CPU 0x0

What I thought may be happening is that the RPI was rebooting but PM2 was only restarting dashPanel and REM, not njsPC. I thought this may be possible because I did re-install njsPC at one point (when you switched to the master branch I think) but may not have run pm2 save (although I'm not sure if that is necessary.)

I think this is confirmed because when I check the PM2 logs, I find the log messages stopped at the following times:

[8/1/2021, 5:51:25 PM]
[7/29/2021, 1:03:30 AM]
[7/25/2021, 12:44:30 PM]
[7/17/2021, 7:19:26 PM]
[7/8/2021, 12:11:11 AM]

These seem close enough to the reboot times to be related.
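As a rough check on "close enough", the two lists can be paired up and the gaps computed. This is just a sketch under two assumptions that the logs themselves do not state: that the syslog entries are from 2021 (the year is taken from the PM2 timestamps) and that both sets of times are in the same timezone. The function name `secondsBetween` is made up for the example.

```javascript
// Timestamps copied from the syslog and PM2 entries quoted above.
// Assumption: same year (2021) and same timezone for both lists.
const bootTimes = [
  '2021-07-07T23:41:42Z',
  '2021-07-17T19:17:03Z',
  '2021-07-25T12:34:08Z',
  '2021-07-29T00:58:35Z',
  '2021-08-01T17:17:03Z',
];
const lastLogTimes = [
  '2021-07-08T00:11:11Z',
  '2021-07-17T19:19:26Z',
  '2021-07-25T12:44:30Z',
  '2021-07-29T01:03:30Z',
  '2021-08-01T17:51:25Z',
];

// Seconds from timestamp a to timestamp b (positive when b is later).
function secondsBetween(a, b) {
  return (new Date(b).getTime() - new Date(a).getTime()) / 1000;
}

// Gap between each boot message and the matching last PM2 log entry.
const gaps = bootTimes.map((boot, i) => secondsBetween(boot, lastLogTimes[i]));

gaps.forEach((s, i) => {
  console.log(`${bootTimes[i]} -> ${lastLogTimes[i]}: ${(s / 60).toFixed(1)} min`);
});
```

With those assumptions, every pair falls within about 35 minutes, and in each case the last PM2 log timestamp is the later of the two.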

Sooooo, now that I've re-run pm2 save, I'll wait to see if the pump stops again (hopefully it won't) or the RPI reboots to make sure PM2 is restarting njsPC properly.

rstrouse commented 2 years ago

Why is the pi rebooting? It shouldn’t need to do that.

jarrah42 commented 2 years ago

That's the $64 question. It may just be unreliable power, but it hasn't rebooted since 8/1 in any case.

tagyoureit commented 2 years ago

Please re-open if this surfaces again.