ministryofjustice / analytics-platform

Parent repository for the MOJ Analytics Platform
MIT License
14 stars 1 forks source link

Unidler giving misleading messages #95

Closed samtazzyman closed 5 years ago

samtazzyman commented 5 years ago

What happened?

Jupyter

I got the following error when opening Jupyter from cpanel:

screen shot 2019-03-01 at 10 45 40

URL: https://samtazzyman-jupyter-lab.tools.alpha.mojanalytics.xyz/

On cpanel, the status of jupyter was still marked as idle. When I refreshed cpanel, it came up as running, and I could open it as usual with no problem.

So: it did unidle, but it claimed not to have done so.

RStudio

When I unidle from cpanel, it comes out of the unidler and drops me in to this:

screen shot 2019-03-01 at 10 58 51

This time cpanel had automatically updated to show my RStudio running, and I could link to it from there with no trouble.

URL: https://samtazzyman-rstudio.tools.alpha.mojanalytics.xyz/

I seems like unidler dumped me out slightly too early before the instance was fully ready.

Steps to reproduce the problem

Go to cpanel and unidle either RStudio or Jupyter

Expected behaviour

Open RStudio/Jupyter without these misleading errors.

andyhd commented 5 years ago

I have seen these 503s when the unidler thinks the app is ready but it isn't quite. Usually you can refresh and the app will load.

I think the solution is to modify the unidler's test for app readiness. Currently it watches the deployment status until at least 1 pod is available, but there is still a variable amount of time (usually only a few seconds) between the pod being created and the app being ready to accept requests.

If we change the unidler to check the pod(s) status for a successful liveness probe, the behaviour should be as intended.

andyhd commented 5 years ago

It's difficult to know what caused the Unidler error message in the screenshot, however. undefined is a Javascript undefined value, so that would suggest that the Unidler sent a Server Side Event with an empty message somehow.

Looking at the source code the only cases where an SSE is sent with a message that isn't a predefined string is in the case of Kubernetes API errors, so probably the error handling code (which should just pass on the error message) is making a faulty assumption about the API error format.

andyhd commented 5 years ago

You should probably have @samtazzyman test this and let him close the bug

xoen commented 5 years ago

Good point @andyhd - I was planning to talk with @samtazzyman tomorrow morning (I'm going to release the new unidler in alpha in few minutes). BTW: I did not close this manually, it was by way of some sort of GH sourcery 😁

I wonder how they reproduced this issue because. I managed to reproduce only when once I accidentally stopped the idler. But I can't remember if that was before I moved the restoreReplicas() at the end...I wonder if the EventSource thing does not automatically re-open a connection when the server dies and that's what's causing it...

xoen commented 5 years ago

Let's keep an eye on this. If we don't see this error for a bit we may want to close it.

We may still possibly see this undefined error depending on how the EventSource handles reconnections, but hopefully not/not as frequently anyway.

xoen commented 5 years ago

@samtazzyman closing this. I'm confident I've fixed the 2 most likely cause of this undefined error in the following PR (for whom is interested):

(obviously, let's keep an eye on this and let me know if this error pops up again, I don't think it will)