usnistgov / ACVP-Server

A repository tracking releases of NIST's ACVP server. See www.github.com/usnistgov/ACVP for the protocol.

/health endpoint no longer returning a "status" value #270

Closed mikeboulet closed 1 year ago

mikeboulet commented 1 year ago

environment: Prod

testSessionId: n/a

vsId: n/a

Algorithm registration: n/a

Endpoint in which the error is experienced: GET https://acvts.nist.gov:443/health

Expected behavior: Expecting a "status" value from the /health endpoint as specified here: https://github.com/usnistgov/ACVP-Server/wiki/Health-Check---Server-Version-Endpoint

Additional context: This was working sometime last week. Now getting the following:

{
  "serverVersion": "v1.1.0.28-2",
  "details": [
    {
      "key": "testSessionProcessing",
      "description": "The TestSession internal processing load status.",
      "data": {
        "healthStatusDefinitions": {
          "Healthy": "Oldest pending TestSession is < 1 hours old.",
          "Degraded": "Oldest pending TestSession is > 1 hours old.",
          "Unhealthy": "Oldest pending TestSession is > 4 hours old."
        }
      }
    }
  ]
}

As you can see, the status value is missing.
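For clients that poll this endpoint, a minimal sketch of a defensive check is shown here. It assumes the "status" value is a top-level string field holding one of the three names listed under healthStatusDefinitions; the function name `read_health_status` and the "Unknown" fallback are hypothetical and not part of the ACVP server or its documentation.

```python
import json

# Status names taken from the healthStatusDefinitions in the response above.
KNOWN_STATUSES = {"Healthy", "Degraded", "Unhealthy"}


def read_health_status(body: str) -> str:
    """Return the "status" value from a /health response body.

    Assumption: "status" is a top-level string field. If the field is
    missing, malformed, or not a known name, return "Unknown" instead of
    raising, so a caller can fall back to retrying or alerting.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "Unknown"
    if not isinstance(payload, dict):
        return "Unknown"
    status = payload.get("status")
    if isinstance(status, str) and status in KNOWN_STATUSES:
        return status
    return "Unknown"
```

With a check like this, the response shown above would be reported as "Unknown" rather than causing a missing-key error in the client.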

livebe01 commented 1 year ago

Hi @mikeboulet,

I queried the /health endpoint after seeing this ticket and the status value came back for me. Early last week (Monday and Tuesday), we experienced an issue where one of ACVTS Prod's internal processes became unresponsive. I believe the behavior you experienced is related to that: the /health endpoint appears to depend in some way on that process, and when the process became unresponsive, the status value was omitted. I'll make a note in our internal issue tracking system so that we can track this and look into improving the robustness of the endpoint.

Thanks,

Ben

abkarcher commented 1 year ago

Hi @livebe01, it looks like this has been happening again today; have there been any service outages on the server end, or is this just a hiccup on the health endpoint? (We have been getting retry messages just fine for a bit now; I just wanted to bring it up in case something silently failed!)

Thanks for your time, Andrew

livebe01 commented 1 year ago

Hi @abkarcher, thanks for the update. It looks like we were experiencing an issue on 7/6, but I believe it has been resolved.