When shield is fronted by a reverse proxy that hides the 401 authentication status of the websocket handshake and instead returns an HTTP/1.1 101 Switching Protocols response (see https://github.com/spring-cloud/spring-cloud-gateway/issues/1884 and https://stackoverflow.com/questions/63196638/spring-cloud-gateway-hides-server-websocket-handshake-401-failures-to-clients), the shield client code enters an infinite loop of requests to v2/events and v2/bearings.
This happens because the error-handling code for the 401 response status (which redirects the browser to the homepage) does not trigger. Instead, the websocket open event triggers a v2/bearings ajax call, whose handler fails to parse the following unauthorized response:
{"vault":"","shield":{"api":2,"version":"8.7.2","env":"sandbox","color":"yellow","motd":"Welcome to SHIELD!\n"},"user":null,"stores":null,"tenants":null}
with the following trace:
data.js:489 Uncaught TypeError: Cannot read property 'length' of null
    at Object.success (data.js:489)
    at i (jquery.js:2)
    at Object.fireWith [as resolveWith] (jquery.js:2)
    at A (jquery.js:4)
    at XMLHttpRequest.<anonymous> (jquery.js:4)
As a result of this unhandled error, the websocket is closed. When the v2/events websocket closes, the shield client immediately opens a new websocket, entering an infinite loop of requests to v2/events and v2/bearings.
This infinite loop exhausts client and server resources. As a result, when the login form is submitted, the login ajax request seems to lose the race with the other requests, or is possibly cancelled by the shield error handling, so the login never succeeds.
The attached shield-firefox-diags.zip file contains Firefox HAR network captures that can be reloaded, along with the recorded performance graphs, which show the infinite loop, as does the screenshot.
This PR adds a delay (3s) in the websocket close event handler to slow down this infinite loop. This improved the situation considerably, although a stalled login form was still observed once out of 20 otherwise successful logins. I therefore suspect that a race condition remains between the login and the error handling code, which at times cancels the login ajax request.
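For illustration only, here is a minimal sketch of the two ideas discussed above: delaying the reconnect in the close handler, and reading the bearings payload without touching `.length` of a null field. It is not the code from this PR or from data.js. All function names (`isUnauthenticated`, `count`, `makeCloseHandler`) are hypothetical, and the guard assumes the failing line reads `.length` on one of the fields that are null in the unauthorized response quoted above.

```javascript
// Hypothetical helper names; a sketch, not the actual shield client code.
const RECONNECT_DELAY_MS = 3000; // the 3s delay mentioned in this PR

// True when a v2/bearings payload looks like the unauthenticated response
// quoted above (no payload at all, or "user" is null).
function isUnauthenticated(bearings) {
  return !bearings || bearings.user == null;
}

// Null-safe element count, so a handler never reads `.length` of null.
function count(list) {
  return Array.isArray(list) ? list.length : 0;
}

// Builds a websocket close handler that reconnects only after a delay and
// never queues more than one pending reconnect at a time.
// `setTimer` is injectable so the behaviour can be checked without waiting.
function makeCloseHandler(connect, delayMs = RECONNECT_DELAY_MS, setTimer = setTimeout) {
  let pending = false;
  return function onClose() {
    if (pending) return; // a reconnect is already scheduled
    pending = true;
    setTimer(() => {
      pending = false;
      connect();
    }, delayMs);
  };
}
```

A caller might wire this up as `socket.onclose = makeCloseHandler(openEventsSocket)`, where `openEventsSocket` stands for whatever function opens the v2/events websocket, and might check `isUnauthenticated(data)` in the bearings handler before using `data.stores` or `data.tenants`.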
I hope this PR can make it into a shield bug-fix release before the v9 release, which plans a major revamp of the shield client UI according to https://shieldproject.io/community/call/#20200528
/CC @JCL38-ORANGE @poblin-orange