jhuckaby / Cronicle

A simple, distributed task scheduler and runner with a web based UI.
http://cronicle.net

WebUI keeps reloading once a second #523

Open schnurlosdusche opened 2 years ago

schnurlosdusche commented 2 years ago

Summary

I installed a second Cronicle server in my network. The first Cronicle server has been running for months without any flaws. The installation of server No. 2 on a CentOS 7 machine went without any problems, and after changing the key in config.json it was recognized by my first Cronicle server and should act as a backup server.

I am not able to deactivate the scheduler via the toggle in the top right corner of the UI on server No. 2. A 500 error message appears for a blink of an eye (it disappears immediately because the whole page reloads).

Steps to reproduce the problem

No additional steps needed: I just ran sudo control.sh start, went to "fileserver.wir.home:3012", logged in with admin/admin, and the UI started reloading every single second... The login screen does not reload every second and behaves as intended.

EDIT: In the meantime I updated server No. 1 from 0.9.7 to 0.9.10. Since I started No. 2 first and then No. 1, the issue now happens on the other server (No. 1). It seems the server that is started later shows this behaviour? Looking at the logs, it appears to be a problem with the socket connection between the two servers...

The moment I shut down the server whose UI is not flashing, the problem disappears! It takes a few seconds, but after that the remaining server works fine!

Your Setup

As mentioned above, server No. 2 will be a backup server on one of the machines in my network. The OS runs as a virtual machine on an ESXi host (CentOS 7, Linux fileserver.wir.home 3.10.0-1160.71.1.el7.x86_64). All servers/machines are in the same network segment.

Servers No. 1 and No. 2 have shared their SSH keys with each other.

Operating system and version?

see above

Node.js version?

(on the new server No. 2): node -v reports v16.15.0

(on the old server No. 1): v12.12.0

Cronicle software version?

./control.sh version reports 0.9.10 (server No. 1 ran Cronicle 0.9.7, now also 0.9.10, but no change in the UI flashing)

Are you using a multi-server setup, or just a single server?

My goal is to setup a multi-server setup

Are you using the filesystem as back-end storage, or S3/Couchbase?

I am using a central samba share as back-end storage

Can you reproduce the crash consistently?

Log Excerpts

Cronicle.log : (last lines)

[1661379258.405][2022-08-25 00:14:18][fileserver.wir.home][2534][Cronicle][debug][5][New socket.io client connected: 0371qGg9gFPkeNbIAA1n (IP: ::ffff:192.168.1.9)][]
[1661379258.409][2022-08-25 00:14:18][fileserver.wir.home][2534][Cronicle][debug][4][Socket client 0371qGg9gFPkeNbIAA1n has authenticated via user session (IP: ::ffff:192.168.1.9)][]
[1661379258.421][2022-08-25 00:14:18][fileserver.wir.home][2534][Cronicle][debug][4][Socket client 0371qGg9gFPkeNbIAA1n has authenticated via user session (IP: ::ffff:192.168.1.9)][]
[1661379259.353][2022-08-25 00:14:19][fileserver.wir.home][2534][Cronicle][debug][5][Socket.io client disconnected: 0371qGg9gFPkeNbIAA1n (IP: ::ffff:192.168.1.9)][]
[1661379259.408][2022-08-25 00:14:19][fileserver.wir.home][2534][Cronicle][debug][5][New socket.io client connected: MLANkuuLzqg2rzLuAA1p (IP: ::ffff:192.168.1.9)][]
[1661379259.412][2022-08-25 00:14:19][fileserver.wir.home][2534][Cronicle][debug][4][Socket client MLANkuuLzqg2rzLuAA1p has authenticated via user session (IP: ::ffff:192.168.1.9)][]
[1661379259.426][2022-08-25 00:14:19][fileserver.wir.home][2534][Cronicle][debug][4][Socket client MLANkuuLzqg2rzLuAA1p has authenticated via user session (IP: ::ffff:192.168.1.9)][]
[1661379260.353][2022-08-25 00:14:20][fileserver.wir.home][2534][Cronicle][debug][5][Socket.io client disconnected: MLANkuuLzqg2rzLuAA1p (IP: ::ffff:192.168.1.9)][]
[1661379260.407][2022-08-25 00:14:20][fileserver.wir.home][2534][Cronicle][debug][5][New socket.io client connected: K-jfWHBeyCFhSs4dAA1r (IP: ::ffff:192.168.1.9)][]
[1661379260.419][2022-08-25 00:14:20][fileserver.wir.home][2534][Cronicle][debug][4][Socket client K-jfWHBeyCFhSs4dAA1r has authenticated via user session (IP: ::ffff:192.168.1.9)][]
[1661379260.432][2022-08-25 00:14:20][fileserver.wir.home][2534][Cronicle][debug][4][Socket client K-jfWHBeyCFhSs4dAA1r has authenticated via user session (IP: ::ffff:192.168.1.9)][]

Webserver.log : (last lines)

[1661379437.454][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Upload progress: 81 of 81 bytes][{"socket":"c985"}]
[1661379437.454][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][6][Invoking handler for request: POST /api/user/resume_session: API][{"id":"r8074"}]
[1661379437.458][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Compressed text output with gzip: 2758 bytes down to: 1080 bytes][]
[1661379437.458][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Sending compressed HTTP response with gzip: 200 OK][{"Content-Type":"application/json","Access-Control-Allow-Origin":"*","Server":"Cronicle 1.0","Content-Length":1080,"Content-Encoding":"gzip"}]
[1661379437.458][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Request complete][]
[1661379437.458][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Response finished writing to socket][{"id":"r8074"}]
[1661379437.458][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Request performance metrics:][{"scale":1000,"perf":{"total":4.676,"queue":0.031,"read":0.102,"process":3.815,"encode":0.224,"write":0.286},"counters":{"bytes_in":561,"bytes_out":1229,"num_requests":1}}]
[1661379437.459][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Keeping socket open for keep-alives: c985][]
[1661379437.467][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][8][New HTTP request: POST /api/app/get_activity (192.168.1.9)][{"id":"r8075","socket":"c985","version":"1.1"}]
[1661379437.467][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Incoming HTTP Headers:][{"host":"192.168.1.98:3012","connection":"keep-alive","content-length":"120","accept":"text/plain, */*; q=0.01","x-requested-with":"XMLHttpRequest","user-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.101 Safari/537.36","content-type":"text/plain","origin":"http://192.168.1.98:3012","referer":"http://192.168.1.98:3012/","accept-encoding":"gzip, deflate","accept-language":"de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7"}]
[1661379437.467][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Upload progress: 120 of 120 bytes][{"socket":"c985"}]
[1661379437.467][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][8][Invoking filter for request: POST /api/app/get_activity: API Filter][{"id":"r8075"}]
[1661379437.468][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Filter passthru, continuing onward][]
[1661379437.468][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][6][Invoking handler for request: POST /api/app/get_activity: API][{"id":"r8075"}]
[1661379437.469][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Compressed text output with gzip: 41 bytes down to: 60 bytes][]
[1661379437.469][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Sending compressed HTTP response with gzip: 200 OK][{"Content-Type":"application/json","Access-Control-Allow-Origin":"*","Server":"Cronicle 1.0","Content-Length":60,"Content-Encoding":"gzip"}]
[1661379437.469][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Request complete][]
[1661379437.469][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Response finished writing to socket][{"id":"r8075"}]
[1661379437.469][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Request performance metrics:][{"scale":1000,"perf":{"total":2.543,"queue":0.029,"read":0.099,"filter":0.787,"process":0.967,"encode":0.131,"write":0.279},"counters":{"bytes_in":598,"bytes_out":207,"num_requests":1}}]
[1661379437.47][2022-08-25 00:17:17][fileserver.wir.home][2534][WebServer][debug][9][Keeping socket open for keep-alives: c985][]

I hope this is only a small problem that can be solved with just a setting or something I hadn't considered until now... Thanks for this software, it makes my life more comfortable ;)

mikeTWC1984 commented 2 years ago

Yeah, I also noticed that. It happens when you connect to the websocket from different places (e.g. if you have Cronicle open in a browser and try to connect via Node using the same session). Not sure what the reason for this ping-pong is; I need to dig deeper. It shouldn't prevent you from running multiple nodes, though. Maybe you opened the UI from different browsers or something like that.

jhuckaby commented 2 years ago

I wish I was able to repro this. I've heard about it from several people now, but I can't seem to make it happen. I run MANY Cronicle servers 24x7, in a multi-master, multi-slave setup. I wish I knew the secret incantation to trigger this bug, so I can finally track it down.

It feels like the UI is detecting that it has loaded on a non-master (worker) node, and is trying to refresh over to the master one. This bit of code here:

https://github.com/jhuckaby/Cronicle/blob/master/htdocs/js/app.js#L506-L517

It contains a location.reload() call, which I think may be what is firing here. OR, it could be socket.io firing a reconnect_failed event, which also triggers a location.reload():

https://github.com/jhuckaby/Cronicle/blob/master/htdocs/js/app.js#L319-L322

I just need to SEE this happen to figure it out. Bah!
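In case it helps anyone catch it in the act: one way to see which of the two paths fires is to tap the socket.io lifecycle events from the browser devtools console right before a reload. This is purely a diagnostic sketch, not Cronicle code; it assumes the UI exposes its socket.io client somewhere reachable from the console (e.g. as app.socket, on the same global app object that app.session_id is read from elsewhere in this thread), which may differ between versions:

```javascript
// Diagnostic sketch (not Cronicle code): attach listeners to a socket.io
// client and record which lifecycle event fires right before each reload.
// From the browser console one might call: tapSocketEvents(app.socket)
function tapSocketEvents(socket, events) {
    var names = events || ['connect', 'disconnect', 'connect_error', 'reconnect_failed', 'status', 'update'];
    var seen = [];
    names.forEach(function (ev) {
        socket.on(ev, function () {
            seen.push(ev);
            console.log('socket event:', ev, arguments);
        });
    });
    return seen; // live array of event names, in firing order
}
```

If reconnect_failed shows up in the console just before the page refreshes, that would point at the second code path above.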

schnurlosdusche commented 2 years ago

I encountered the "Waiting for master" message several times in the past after a fresh install. I don't know the whole code base, but do you think it might be possible to define the role of the installation during the first setup?

Something like :

choose type of installation: 1) single master, 2) a worker, 3) slave/backup, 4) naaah, don't know, do what you need to do (default)

The role could then still be changed later, just as it is possible right now?



mikeTWC1984 commented 2 years ago

As far as I remember, there were two scenarios where this happens: 1) you have master and backup nodes, and you try to log into Cronicle via the backup (or maybe both at the same time); 2) you try to set up a websocket connection from somewhere else, e.g. from a Node REPL:

const SocketIOClient = require('socket.io-client');

let url = 'http://localhost:3012';
let socket = new SocketIOClient(url, { multiplex: false, forceNew: true, reconnection: false, timeout: 5000 });
socket.on('update', (data) => console.log(data));
socket.on('status', (data) => console.log(data));

// copy the session id from the browser console, from app.session_id
let token = '...putHereSessionId';
socket.emit('authenticate', { token: token });

I tried scenario 2 and am no longer getting any issue. Maybe the socket.io upgrade helped. I can't test scenario 1 at the moment.

@jhuckaby btw, I noticed one bug: if you emit launch_job this way with some dummy data, it will crash the Cronicle instance. E.g.:

socket.emit('launch_job', {id:'blaaah'})

I also tried abort_job/update_job, and those don't cause any issue.

Here are some logs:

[1661547134.915][2022-08-26 16:52:14][3e6e4df6f67e][9][Cronicle][debug][6][Launching local job][{"id":"blaaah","log_file":"/opt/cronicle/logs/jobs/blaaah.log"}]
[1661547134.919][2022-08-26 16:52:14][3e6e4df6f67e][9][Cronicle][debug][1][Uncaught Exception: TypeError: Cannot read properties of undefined (reading 'match')][TypeError: Cannot read properties of undefined (reading 'match')
    at constructor.launchLocalJob (/opt/cronicle/lib/job.js:652:18)
    at Socket.<anonymous> (/opt/cronicle/lib/comm.js:379:32)
    at Socket.emit (node:events:527:28)
    at Socket.emitUntyped (/opt/cronicle/node_modules/socket.io/dist/typed-events.js:69:22)
    at /opt/cronicle/node_modules/socket.io/dist/socket.js:466:39
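A defensive check along these lines would let the server reject such a payload instead of dying on an uncaught exception. The field names below are hypothetical (the stack trace only shows that .match() was called on an undefined field inside launchLocalJob); this is a sketch of the pattern, not Cronicle's actual code:

```javascript
// Hypothetical validation sketch: reject malformed job payloads up front
// instead of letting string methods like .match() blow up on undefined.
function validateJob(job) {
    if (!job || typeof job !== 'object') return 'job must be an object';
    if (typeof job.id !== 'string') return 'job.id must be a string';
    // 'command' is an assumed field name, used here for illustration only
    if (typeof job.command !== 'string') return 'job.command must be a string';
    return null; // payload looks sane
}

// A dummy payload like the one above would now fail fast:
// validateJob({ id: 'blaaah' }) -> 'job.command must be a string'
```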
mikeTWC1984 commented 2 years ago

@jhuckaby Looks like socket auth via sessions gives any user (even read-only ones) a way to restart/shut down the server and run jobs. The "socket.on" event handlers don't check the client's role or permissions. https://github.com/jhuckaby/Cronicle/blob/master/lib/comm.js#L362 I think socket actions need to be limited to server-to-server communication.
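One possible shape for such a restriction, sketched with assumed names (Cronicle's actual handshake in comm.js differs): record at authenticate time whether a socket belongs to a peer server or a user session, and only honor privileged events for peers.

```javascript
// Sketch with assumed field names ('metadata.auth_type' is hypothetical):
// gate privileged socket events on how the client authenticated,
// 'server' for peer-to-peer links vs. 'user' for browser sessions.
function guardEvent(socket, eventName, handler) {
    socket.on(eventName, function (data) {
        var auth = socket.metadata && socket.metadata.auth_type;
        if (auth === 'server') {
            handler(data);
        } else {
            console.warn('Rejected privileged event from non-server socket:', eventName);
        }
    });
}

// e.g. guardEvent(socket, 'launch_job', handleLaunchJob);
```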

jhuckaby commented 2 years ago

Thank you, @mikeTWC1984. I'll look into that security issue right away.

Update: The security issue (not the Web UI refresh issue) is fixed in https://github.com/jhuckaby/Cronicle/releases/tag/v0.9.11

guillebot commented 2 years ago

Hi there. It's happening to me after an upgrade. Very simple installation here: a single master server, which I only access from the web UI.

AndrejThomsen commented 1 year ago

> I wish I was able to repro this. I've heard about it from several people now, but I can't seem to make it happen. I run MANY Cronicle servers 24x7, in a multi-master, multi-slave setup. I wish I knew the secret incantation to trigger this bug, so I can finally track it down.

@jhuckaby Maybe this will help: I created a simple cluster setup with Docker which has the exact same issue here