paukstelis / octoprint_deploy

Bash script for rapid deployment of multiple octoprint instances on a single machine
MIT License

Instance fails randomly #33

Open taker218 opened 1 year ago

taker218 commented 1 year ago

Hi,

I'm currently having a problem with one of my instances. The instance randomly fails and I need to restart the service to get it going again.

Here's the output of systemctl status X5SA:

X5SA.service - The snappy web interface for your 3D printer
     Loaded: loaded (/etc/systemd/system/X5SA.service; enabled; vendor preset: enabled)
     Active: failed (Result: signal) since Tue 2022-08-09 20:24:36 CEST; 12h ago
    Process: 822405 ExecStart=/home/thomas/OctoPrint/bin/octoprint serve --config=${CONFIGFILE} --basedir=${BASEDIR} --port=${PORT} (code=killed, signal=SEGV)
   Main PID: 822405 (code=killed, signal=SEGV)
        CPU: 12min 4.852s

Aug 09 19:28:47 octoprint-host octoprint[822405]: 2022-08-09 19:28:47,831 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 125105, 'printer_state': 'OFFLINE'}
Aug 09 19:43:42 octoprint-host octoprint[822405]: 2022-08-09 19:43:42,891 - octoprint.server.heartbeat - INFO - Server heartbeat <3
Aug 09 19:43:47 octoprint-host octoprint[822405]: 2022-08-09 19:43:47,833 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 126005, 'printer_state': 'OFFLINE'}
Aug 09 19:58:42 octoprint-host octoprint[822405]: 2022-08-09 19:58:42,892 - octoprint.server.heartbeat - INFO - Server heartbeat <3
Aug 09 19:58:47 octoprint-host octoprint[822405]: 2022-08-09 19:58:47,835 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 126905, 'printer_state': 'OFFLINE'}
Aug 09 20:13:42 octoprint-host octoprint[822405]: 2022-08-09 20:13:42,893 - octoprint.server.heartbeat - INFO - Server heartbeat <3
Aug 09 20:13:47 octoprint-host octoprint[822405]: 2022-08-09 20:13:47,844 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 127805, 'printer_state': 'OFFLINE'}
Aug 09 20:24:36 octoprint-host systemd[1]: X5SA.service: Main process exited, code=killed, status=11/SEGV
Aug 09 20:24:36 octoprint-host systemd[1]: X5SA.service: Failed with result 'signal'.
Aug 09 20:24:36 octoprint-host systemd[1]: X5SA.service: Consumed 12min 4.852s CPU time.

Does anyone have an idea where I should look to get to the bottom of this? The other instance runs without a problem.

Here's the content of the X5SA.service file:

[Unit]
Description=The snappy web interface for your 3D printer
After=network.online.target
Wants=network.online.target

[Service]
Environment="PORT=5002"
Environment="BASEDIR=/home/thomas//.X5SA"
Environment="CONFIGFILE=/home/thomas//.X5SA/config.yaml"
User=thomas
ExecStart=/home/thomas/OctoPrint/bin/octoprint serve --config=${CONFIGFILE} --basedir=${BASEDIR} --port=${PORT}

[Install]
WantedBy=multi-user.target

I compared it to the X5SAPro.service file and it's basically the same (except for the different variable values, of course).

taker218 commented 1 year ago

Okay, I just looked at the dmesg output and found this:

[Tue Aug 9 20:24:35 2022] octoprint[822405]: segfault at 7f6a905d23d0 ip 000000000051d2bc sp 00007fff6263e550 error 6 in python3.9[41f000+288000]
[Tue Aug 9 20:24:35 2022] Code: 4c 8b 6f 40 83 c2 01 48 8d 9f 68 01 00 00 41 89 94 24 b8 00 00 00 4c 39 eb 73 22 48 8b 3b 48 85 ff 74 11 48 c7 03 00 00 00 00 <48> 83 2f 01 0f 84 ba 00 00 00 48 83 c3 08 4c 39 eb 72 de 48 83 7d
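For readers trying to interpret a dmesg line like the one above, here is a minimal sketch that pulls the fields apart. It assumes the usual x86 Linux segfault message layout and the standard x86 page-fault error-code bits; the function and key names (`decode_segfault`, `offset_in_image`, and so on) are made up for illustration and are not part of octoprint_deploy or OctoPrint.

```python
import re

# Assumed layout of the x86 Linux segfault report, e.g.
# "octoprint[822405]: segfault at <addr> ip <ip> sp <sp> error <code> in <image>[<base>+<size>]"
# All numeric fields except the PID are hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<image>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing a dmesg segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        "instruction_pointer": int(m["ip"], 16),
        # x86 page-fault error-code bits
        "cause": "protection violation" if err & 0x1 else "page not present",
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
        "instruction_fetch": bool(err & 0x10),
        "image": m["image"],
        "offset_in_image": None,
    }
    if m["image"] is not None:
        # Distance of the faulting instruction from the start of the mapped region
        info["offset_in_image"] = info["instruction_pointer"] - int(m["base"], 16)
    return info


if __name__ == "__main__":
    sample = (
        "[Tue Aug 9 20:24:35 2022] octoprint[822405]: segfault at 7f6a905d23d0 "
        "ip 000000000051d2bc sp 00007fff6263e550 error 6 in python3.9[41f000+288000]"
    )
    for key, value in decode_segfault(sample).items():
        print(f"{key}: {hex(value) if isinstance(value, int) and key != 'pid' else value}")
```

For the line quoted above, this reports a user-mode write to a page that was not present, with the faulting instruction inside the python3.9 mapping itself rather than in a shared library. That narrows down where the crash happened but does not by itself say whether the cause is hardware or software.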

paukstelis commented 1 year ago

Not something I have seen before, even when running many instances. It is possible there is a memory issue, which can give rise to segfaults. You could try running top to see what is happening with memory usage while the two instances run.
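As an alternative to watching top by hand, the following is a rough sketch that reads resident memory for each OctoPrint process from /proc on Linux. The helper names (`rss_kib`, `octoprint_memory`) and the choice to match on the string "octoprint" in the command line are assumptions made for this example; adjust the match if your instances are launched differently.

```python
import os


def rss_kib(status_text):
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def octoprint_memory(proc_root="/proc", needle="octoprint"):
    """Map PID -> resident memory in KiB for processes whose command line contains `needle`."""
    usage = {}
    if not os.path.isdir(proc_root):
        return usage  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if needle not in cmdline:
                continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                rss = rss_kib(f.read())
        except OSError:
            continue  # process exited or is not readable
        if rss is not None:
            usage[int(entry)] = rss
    return usage


if __name__ == "__main__":
    for pid, kib in sorted(octoprint_memory().items()):
        print(f"PID {pid}: {kib / 1024:.1f} MiB resident")
```

Running this periodically (for example from cron or a loop) and logging the output would show whether an instance's memory grows steadily before a crash. Steady growth would point to running out of memory, while flat usage would make a hardware fault more plausible.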

taker218 commented 1 year ago

I'll have a look at the memory usage, but there should be enough memory for those two instances.

Maybe a memory stick is bad, since this is an old laptop I'm currently using.

Just a couple of minutes ago the other instance crashed (same error in dmesg output).

paukstelis commented 1 year ago

> I'll have a look at the memory usage, but there should be enough memory for those two instances.
>
> Maybe a memory stick is bad, since this is an old laptop I'm currently using.
>
> Just a couple of minutes ago the other instance crashed (same error in dmesg output).

Yeah, bad memory might be what you are looking at here.