Closed AlexThurston closed 1 year ago
Hi @AlexThurston, this seems to be an artifact of something going wonky on our backend (see https://github.com/usnistgov/ACVP-Server/issues/270). Thanks for reporting this. We should have this resolved shortly.
Great. Thanks for the update. It had been working for the past couple of days and just started failing again for me yesterday afternoon. Production was also reporting degraded at the time, so I wondered if they were related.
Everything should be back online and fully functioning now. Thanks again.
Seems like it's still giving a 503 from production
unexpected status code 503 != 200: {
  "serverVersion": "v1.1.0.29-1",
  "details": [
    {
      "key": "testSessionProcessing",
      "description": "The TestSession internal processing load status.",
      "data": {
        "healthStatusDefinitions": {
          "Healthy": "Oldest pending TestSession is < 1 hours old.",
          "Degraded": "Oldest pending TestSession is > 1 hours old.",
          "Unhealthy": "Oldest pending TestSession is > 4 hours old."
        }
      }
    }
  ]
}
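For anyone reading the healthStatusDefinitions above, here is a minimal sketch of how those thresholds might map a pending TestSession's age to a label. The function name is hypothetical, and the handling of exactly 1 hour and exactly 4 hours is an assumption, since the definitions only give strict inequalities. This is not the server's actual implementation.

```python
from datetime import timedelta


def classify_processing_load(oldest_pending_age: timedelta) -> str:
    """Map the age of the oldest pending TestSession to a health label.

    Hypothetical helper. Thresholds are taken from healthStatusDefinitions
    in the response body; treating exactly 1 h as Degraded and exactly 4 h
    as Unhealthy is an assumption.
    """
    if oldest_pending_age >= timedelta(hours=4):
        return "Unhealthy"
    if oldest_pending_age >= timedelta(hours=1):
        return "Degraded"
    return "Healthy"
```

For example, `classify_processing_load(timedelta(hours=2))` falls in the Degraded band under these assumptions.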
Bahahah! Never mind. I just tried it again and it's working.
It appears as though this is happening again. 503 on the health route on production.
Hmm. Commenting on this doesn't re-open.
Prod just got restarted, about 30m ago, please let me know if it's still not working for you.
Thanks @AlexThurston. Prod should be back and running again now.
Still the same behaviour. 503s. The response does still have the body, but it's missing the status key.
Not sure if it's related, or a different thing, but demo is reporting degraded as well. However, the call is succeeding with a 200 in that case.
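In case it helps others hitting the same thing, below is a rough client-side sketch that separates the two cases described here: a 503 whose body lacks the status key, and a 200 that reports degraded. The function name and the returned field names are hypothetical, and the assumption that status is a top-level string such as "Degraded" is mine; only serverVersion, details, and key come from the response shown earlier in this thread.

```python
import json
from typing import Any, Dict


def summarize_health(status_code: int, body_text: str) -> Dict[str, Any]:
    """Summarize a health-route response without assuming 'status' exists.

    Hypothetical helper: takes the HTTP status code and raw body text
    (no network call is made here) and reports what was actually present.
    """
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = {}
    if not isinstance(body, dict):
        body = {}

    status = body.get("status")  # assumed to be a top-level key
    details = body.get("details") or []
    return {
        "http_ok": status_code == 200,
        "status": status,
        "status_missing": status is None,
        "server_version": body.get("serverVersion"),
        "detail_keys": [d.get("key") for d in details if isinstance(d, dict)],
    }
```

A caller could treat `http_ok` being false together with `status_missing` being true as "service unreachable or mid-restart" rather than as a reported Unhealthy state, which is the distinction this thread keeps running into.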
Sorry about the issues! Demo is currently under load from a bunch of LMS submissions, hopefully that will be cleared up soon. The issue with Prod is being looked into.
So, it appears we don't have enough LMS Pool values stored. We're currently looking into ways to better handle this. Thanks for the feedback!
The processing issues have been resolved in both the Demo and Prod environments, though the root causes were different, along with some unfortunate timing. So everything should be operating normally again. Appreciate you commenting with your observations as well @AlexThurston ; thanks.
This appears to be happening again on Production. 503s from the health route. Prod still seems to be responding to other actions. This seems to happen each time the service deployment is updated.
Thanks @AlexThurston. We saw this on our end as well and are working on it.
We think this specific instance of this issue is tied to some older hardware we're running on... should be mitigated by our pending/upcoming Prod migration.
It looks like the health route on production is returning a 503. It does return a partial body:
But the status key is missing.
Demo seems to be OK: