Open 0x777 opened 4 years ago
This makes sense to me at a high level. Could you summarize the network issue and how you ended up debugging it?
The network issue was that the clients connected to graphql-engine (deployed on k8s with istio) were noticing that the requests take a long time to finish(1-2 seconds). However in the logs of graphql-engine, the execution times were in the order of few milliseconds. The network connection between graphql-engine and the client was the issue. If $request_time
was also present in the log, it would've been easier to pin point the issue to the network layer.
how you ended up debugging it?
By adding logging on the clients and tracing those requests in graphql-engine (through a session variable x-hasura-trace-id).
@0x777 Why is this reopened?
@lexi-lambda Because @jberryman added the time taken to read the request body, but request_time
as described in the issue is not implemented. Looks like this may not be possible with spock currently.
Got it, thank you! Maybe we need to add it at the WAI level, then?
@0x777 IIUC request_time
is defined as follows:
request_time = time taken to read the request body +
time taken to execute that request and send the response back to client
While I was working on Server Metrics issue - I noticed that in our mkSpockAction
function, we have ioWaitTime
as well as serviceTime
which can be used for calculating the request_time
mentioned in this issue.
Referencing from the code:
ioWaitTime
is retrieved at this line
(ioWaitTime, reqBody) <- withElapsedTime $ liftIO $ Wai.strictRequestBody req
serviceTime
is retrieved at this line
(serviceTime, (result, q)) <- withElapsedTime $ case apiHandler of
AHGet handler -> do
res <- lift $ runHandler handler
return (res, Nothing)
AHPost handler -> do
parsedReqE <- runExceptT $ parseBody reqBody
parsedReq <- onLeft parsedReqE (logErrorAndResp (Just userInfo) requestId req (reqBody, Nothing) includeInternal headers . qErrModifier)
res <- lift $ runHandler $ handler parsedReq
return (res, Just parsedReq)
* `withElapsedTime` is defined [here](https://github.com/hasura/graphql-engine-mono/blob/58b0033be2ac9c0c757319a1a5f735e53f08307b/server/src-lib/Hasura/Prelude.hs#L174-#L180)
```haskell
-- | Time an IO action, returning the time with microsecond precision. The
-- result of the input action will be evaluated to WHNF.
--
-- The result 'DiffTime' is guarenteed to be >= 0.
withElapsedTime :: MonadIO m => m a -> m (DiffTime, a)
If I understand all of the above correctly, will the following be a proper definition for $request
parameter:
request = time taken to read the request body (ioWaitTime) +
time taken to execute that request and send the response back (serviceTime)
where
serviceTime = Service time, not including request IO wait time.
iotWaitTime = Request IO wait time, i.e. time spent reading the full request from the socket.
If the definition of request
is correct as mentioned above, does this mean that this issue can now be progressed and we can finally log the $request_time
logging parameter?
@0x777 Just wanted to check in on this - I'm just an invested user, wondering if there is anything I could do to contribute to push this along?
It seems like the data is there, just not exposed in the logging output at this time
Currently, we only log the execution time for a given request. It would be nice to measure the time from 'the arrival of first byte of the request' to the 'last byte sent as response'. This can help people debug network issues easily.
From nginx's docs: