Open taraspos opened 2 months ago
I reckon that the simplest way to fix this would be to add a parameter to the proxy /webapi/ping
endpoint that instructs the handler to only return info that comes from the auth server Ping
call rather than the additional auth info - it'll still give us a confirmation that proxy-auth comms are up and working, but with much less variability or dependency on cluster configuration.
Expected behavior
Calls to the /webapi/ping endpoint do not depend on successful calls to third-party dependencies.
Current behavior
A successful response from the /webapi/ping endpoint depends on a successful response from the configured SAML provider. As a result, any issues or slowness on the SAML provider's side cause failures or long response times on /webapi/ping as well.
This is happening because ValidateSAMLConnector is called, which uses http.Get(sc.GetEntityDescriptorURL()). Because the default HTTP client is used, no client timeout is set on the request, so it will wait for the response indefinitely.
https://github.com/gravitational/teleport/blob/fc3a7d90f612cc53795c12642e94c09a3d56419f/lib/services/saml.go#L44-L56
Bug details
Teleport version - 15.4.7
Recreation steps
We noticed this issue because our external healthcheck (which is configured to test against /webapi/ping) was randomly failing with timeouts from time to time. Debugging led to the conclusion that requests hang because of slow responses from the SAML provider.
Debug logs
Every request to the /webapi/ping endpoint produces a log entry like:
https://github.com/gravitational/teleport/blob/fc3a7d90f612cc53795c12642e94c09a3d56419f/lib/services/saml.go#L66