shieldproject / shield

A standalone system that can perform backup and restore functions for a wide variety of pluggable data systems
MIT License

Fix for infinite loop through misbehaving WebSocket proxies #699

Closed gberche-orange closed 4 years ago

gberche-orange commented 4 years ago

When shield is fronted by a reverse proxy that hides the 401 authentication status of the websocket handshake and instead returns an HTTP/1.1 101 Switching Protocols response (see https://github.com/spring-cloud/spring-cloud-gateway/issues/1884 and https://stackoverflow.com/questions/63196638/spring-cloud-gateway-hides-server-websocket-handshake-401-failures-to-clients), the shield client code enters an infinite loop of requests to v2/events and v2/bearings.

This happens because the error handling code for the 401 response status (which redirects the browser to the homepage) does not trigger. Instead, the websocket open event triggers a v2/bearings ajax call, whose handler fails to parse the unauthorized response shown below and throws the following trace:

{"vault":"","shield":{"api":2,"version":"8.7.2","env":"sandbox","color":"yellow","motd":"Welcome to SHIELD!\n"},"user":null,"stores":null,"tenants":null}
data.js:489 Uncaught TypeError: Cannot read property 'length' of null
    at Object.success (data.js:489)
    at i (jquery.js:2)
    at Object.fireWith [as resolveWith] (jquery.js:2)
    at A (jquery.js:4)
    at XMLHttpRequest.<anonymous> (jquery.js:4)

As a result of this unhandled error, the websocket is closed. When the v2/events websocket closes, the shield client immediately opens a new websocket, entering an infinite loop of requests to v2/events and v2/bearings.
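To illustrate the failing parse described above, here is a minimal sketch of how a bearings handler could tolerate the unauthorized payload. The helper names `isAuthenticatedBearings` and `normalizeBearings` are hypothetical and are not taken from shield's data.js, and treating `stores` and `tenants` as arrays is an assumption based only on the `length` access in the trace.

```javascript
// Hypothetical helpers: a sketch only, not shield's actual data.js code.

// Returns true when a v2/bearings payload describes an authenticated session.
// Assumption: an unauthenticated response carries "user": null, as in the
// payload quoted in this report.
function isAuthenticatedBearings(bearings) {
  return Boolean(bearings && bearings.user);
}

// Returns a shallow copy in which null collections are replaced by empty
// arrays, so that later `.length` reads cannot throw a TypeError.
// Assumption: `stores` and `tenants` are arrays when present.
function normalizeBearings(bearings) {
  const safe = Object.assign({}, bearings);
  safe.stores = Array.isArray(safe.stores) ? safe.stores : [];
  safe.tenants = Array.isArray(safe.tenants) ? safe.tenants : [];
  return safe;
}

// The unauthorized payload quoted in this report.
const unauthorized = JSON.parse(
  '{"vault":"","shield":{"api":2,"version":"8.7.2","env":"sandbox",' +
  '"color":"yellow","motd":"Welcome to SHIELD!\\n"},' +
  '"user":null,"stores":null,"tenants":null}'
);

console.log(isAuthenticatedBearings(unauthorized));          // false
console.log(normalizeBearings(unauthorized).tenants.length); // 0
```

With a check like this, the handler could redirect to the login page when `user` is null instead of throwing, which would avoid the unhandled error that leads to the websocket being closed.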

This infinite loop exhausts client and server resources. As a result, when the login form is submitted, the login ajax request seems to lose the race with other requests, or is possibly cancelled by the shield error handling, so the login never succeeds.

The attached shield-firefox-diags.zip file contains a Firefox HAR recording of the network calls, which can be reloaded, and the recorded performance graphs, which show the infinite loop (see the screenshot).

image

This PR adds a delay (3s) in the websocket close event handler to slow down this infinite loop. It improved the situation considerably, although a stalled login form was still observed once for 20 successful logins. I therefore suspect that a race condition remains between the login and error handling code, which at times cancels the login ajax request.
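The delayed reconnect described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `makeCloseHandler`, `RECONNECT_DELAY_MS`, and `openEventsSocket` are hypothetical names, and the guard that allows only one queued reconnect at a time is an addition of this sketch.

```javascript
// Hypothetical sketch: names and structure are illustrative, not shield's
// actual client code.
const RECONNECT_DELAY_MS = 3000; // the 3s delay mentioned in this PR

// Returns a close handler that schedules `connect` after `delayMs` instead
// of calling it immediately. `schedule` defaults to setTimeout and can be
// replaced so the behavior can be checked without waiting.
function makeCloseHandler(connect, delayMs = RECONNECT_DELAY_MS, schedule = setTimeout) {
  let pending = false;
  return function onClose() {
    if (pending) return; // sketch-only guard: at most one reconnect queued
    pending = true;
    schedule(() => {
      pending = false;
      connect();
    }, delayMs);
  };
}

// Assumed wiring in the browser, where openEventsSocket is a hypothetical
// function that opens the v2/events websocket:
//   socket.onclose = makeCloseHandler(openEventsSocket);
```

A fixed delay only slows the loop down. It does not remove the underlying cause, which is consistent with the stalled login that was still observed after this change.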

I hope this PR can make it into a shield bug fix release before the v9 release, which plans a major revamp of the shield client UI according to https://shieldproject.io/community/call/#20200528

/CC @JCL38-ORANGE @poblin-orange

gberche-orange commented 4 years ago

thanks for merging @jhunt