runt1me / stormcloud

the best backup system

Handle Broken Pipe Issues on Server #30

Closed: runt1me closed this issue 1 year ago

runt1me commented 1 year ago

If the client closes the connection prematurely, the server sometimes crashes with a broken pipe error. This needs to be handled on the server side. Error logs are in the /root directory on www2.

runt1me commented 1 year ago

https://stackoverflow.com/questions/180095/how-to-handle-a-broken-pipe-sigpipe-in-python/180922#180922 has some good info on handling broken pipes (SIGPIPE) in Python.
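A minimal sketch of catching the error, assuming the server is written in Python (as the linked question is). CPython ignores SIGPIPE at startup, so writing to a socket whose peer has already gone away raises BrokenPipeError (or ConnectionResetError) instead of killing the process, and that exception can be caught. The send_safely helper is hypothetical, not stormcloud's actual code.

```python
import socket


def send_safely(sock: socket.socket, data: bytes) -> bool:
    """Send all of data; return False instead of raising if the peer is gone.

    send_safely is a hypothetical helper for illustration.
    """
    try:
        sock.sendall(data)
    except (BrokenPipeError, ConnectionResetError):
        # CPython ignores SIGPIPE at startup, so EPIPE/ECONNRESET from the
        # kernel surface here as ordinary exceptions rather than a signal.
        return False
    return True


if __name__ == "__main__":
    server_end, client_end = socket.socketpair()
    client_end.close()  # simulate a client that disconnects prematurely
    print(send_safely(server_end, b"hello"))  # prints False (on Linux)
    server_end.close()
```

Catching the error per send keeps one vanished client from taking down the whole process.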

runt1me commented 1 year ago

Added some exception handling logic in 1510889fb03eb07ffefdb149fee19ec8e8c7f12d, but it's difficult to know how fully this will address the broken pipe issues until the server has been exposed to random internet spray for a while. I am also worried that the exception handling logic may stop the server from accepting new connections. Leaving this open, as I need more time to look into it.
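On the worry about the server no longer accepting connections: one way to avoid that, sketched here under the assumption of a single-threaded Python accept loop, is to catch errors inside the loop around each connection, so a failure ends only that connection and the loop goes straight back to accept(). The names serve and handle_client are hypothetical.

```python
import logging
import socket

log = logging.getLogger("server")


def serve(listener: socket.socket, handle_client, max_connections=None) -> int:
    """Accept connections (forever, or up to max_connections), isolating per-client errors.

    handle_client is a hypothetical per-connection handler. Any socket error it
    raises is logged, the client socket is closed, and the loop accepts the next client.
    Returns the number of connections served.
    """
    served = 0
    while max_connections is None or served < max_connections:
        conn, addr = listener.accept()
        served += 1
        with conn:  # always close the client socket, even when the handler fails
            try:
                handle_client(conn)
            except (BrokenPipeError, ConnectionResetError) as exc:
                log.warning("client %s disconnected early: %s", addr, exc)
            except OSError as exc:
                log.error("socket error with client %s: %s", addr, exc)
    return served
```

The except clauses only catch OSError subclasses, so a genuine bug in the handler (say, a KeyError) still propagates instead of being silently swallowed.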

runt1me commented 1 year ago

Took more steps to address exception handling and tracked down the source of the worst phantom crashes. They appear to have been caused primarily by clients (specifically, web browsers) that were disconnecting prematurely. The server would get stuck waiting for a response that was never coming, since the client had already disconnected. While the server was stuck waiting, the socket was blocked and future clients could not connect.

The best fix I could come up with was to add a connection timeout (currently 3 seconds) on the server side; by default, the server's sockets have no timeout. This is now implemented for all server components in 0c66981bc1c61bb641578cfa4941a4be5da9f59b.
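A minimal sketch of a server-side timeout on an accepted connection, assuming Python sockets. read_request and its parameters are hypothetical; only the 3-second value comes from this thread. One caveat if the server uses socket.settimeout (an assumption, since the commit isn't shown here): the limit applies to each individual blocking call such as recv(), not to the connection as a whole.

```python
import socket

CONNECTION_TIMEOUT = 3.0  # seconds, the value currently used on the server


def read_request(conn: socket.socket, timeout: float = CONNECTION_TIMEOUT,
                 max_bytes: int = 4096):
    """Read one request from a client, giving up if the client goes silent.

    Returns the bytes received, or None if the client timed out or disconnected.
    read_request is a hypothetical helper for illustration.
    """
    conn.settimeout(timeout)  # each blocking call on conn now waits at most `timeout`
    try:
        data = conn.recv(max_bytes)
    except socket.timeout:
        return None  # client (e.g. an idle browser) never sent anything
    except ConnectionResetError:
        return None  # client reset the connection
    return data or None  # b"" means the peer closed without sending
```

Returning None lets the caller close the connection and go back to accept() instead of waiting on a client that has already left.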

The downside of adding a connection timeout is that all legitimate stormcloud client communications now have to finish in under 3 seconds (or whatever the timeout is set to). This won't be an issue for the installer, but the backup process will need to chunk file backups and send them in small pieces so that each transfer chunk finishes in under 3 seconds. I will open a new issue to redesign the client and the server to support chunking.
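The chunking redesign belongs in the new issue, but as a rough sketch of one possible wire format, assuming Python and a length-prefixed protocol (the 4-byte header, CHUNK_SIZE, and all function names are hypothetical, not stormcloud's actual design): each chunk is sent as a 4-byte big-endian length followed by that many bytes, and a zero-length chunk marks the end of the file. Keeping chunks small keeps each send/receive step well inside the timeout.

```python
import io
import socket
import struct

CHUNK_SIZE = 64 * 1024        # hypothetical; pick a size that finishes well under the timeout
HEADER = struct.Struct("!I")  # 4-byte big-endian payload length


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise ConnectionError if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed mid-chunk")
        buf.extend(part)
    return bytes(buf)


def send_file_chunks(sock: socket.socket, fileobj, chunk_size: int = CHUNK_SIZE) -> None:
    """Send fileobj as length-prefixed chunks, ending with a zero-length chunk."""
    while True:
        chunk = fileobj.read(chunk_size)
        sock.sendall(HEADER.pack(len(chunk)) + chunk)
        if not chunk:
            return  # the empty chunk just sent marks end-of-file


def recv_file_chunks(sock: socket.socket, out) -> int:
    """Write received chunks to out until the zero-length terminator; return bytes written."""
    total = 0
    while True:
        (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
        if length == 0:
            return total
        out.write(recv_exactly(sock, length))
        total += length


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    payload = b"x" * 10_000
    send_file_chunks(sender, io.BytesIO(payload), chunk_size=4096)
    sink = io.BytesIO()
    print(recv_file_chunks(receiver, sink))  # prints 10000
    sender.close()
    receiver.close()
```

Because the receiver always knows how many bytes to expect next, it can tell a completed transfer (the zero-length terminator) apart from a client that vanished mid-file (recv_exactly raises ConnectionError).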