hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0
31.08k stars 2.77k forks source link

implement nginx's $request_time logging parameter #3547

Open 0x777 opened 4 years ago

0x777 commented 4 years ago

Currently, we only log the execution time for a given request. It would be nice to measure the time from 'the arrival of first byte of the request' to the 'last byte sent as response'. This can help people debug network issues easily.

From nginx's docs:

$request_time request processing time in seconds with a milliseconds resolution; time elapsed between the first bytes were read from the client and the log write after the last bytes were sent to the client

jberryman commented 4 years ago

This makes sense to me at a high level. Could you summarize the network issue and how you ended up debugging it?

0x777 commented 4 years ago

The network issue was that the clients connected to graphql-engine (deployed on k8s with istio) were noticing that the requests take a long time to finish(1-2 seconds). However in the logs of graphql-engine, the execution times were in the order of few milliseconds. The network connection between graphql-engine and the client was the issue. If $request_time was also present in the log, it would've been easier to pin point the issue to the network layer.

0x777 commented 4 years ago

how you ended up debugging it?

By adding logging on the clients and tracing those requests in graphql-engine (through a session variable x-hasura-trace-id).

lexi-lambda commented 4 years ago

@0x777 Why is this reopened?

0x777 commented 4 years ago

@lexi-lambda Because @jberryman added the time taken to read the request body, but request_time as described in the issue is not implemented. Looks like this may not be possible with spock currently.

lexi-lambda commented 4 years ago

Got it, thank you! Maybe we need to add it at the WAI level, then?

Naveenaidu commented 3 years ago

@0x777 IIUC request_time is defined as follows:

request_time = time taken to read the request body +
                         time taken to execute that request and send the response back to client

While I was working on Server Metrics issue - I noticed that in our mkSpockAction function, we have ioWaitTime as well as serviceTime which can be used for calculating the request_time mentioned in this issue.

Referencing from the code:

* `withElapsedTime` is defined [here](https://github.com/hasura/graphql-engine-mono/blob/58b0033be2ac9c0c757319a1a5f735e53f08307b/server/src-lib/Hasura/Prelude.hs#L174-#L180)
```haskell
-- | Time an IO action, returning the time with microsecond precision. The
-- result of the input action will be evaluated to WHNF.
--
-- The result 'DiffTime' is guarenteed to be >= 0.
withElapsedTime :: MonadIO m => m a -> m (DiffTime, a)

If I understand all of the above correctly, will the following be a proper definition for $request parameter:

request = time taken to read the request body (ioWaitTime) + 
          time taken to execute that request and send the response back (serviceTime)

where

serviceTime = Service time, not including request IO wait time.
iotWaitTime = Request IO wait time, i.e. time spent reading the full request from the socket.

If the definition of request is correct as mentioned above, does this mean that this issue can now be progressed and we can finally log the $request_time logging parameter?

andrewloux commented 2 years ago

@0x777 Just wanted to check in on this - I'm just an invested user, wondering if there is anything I could do to contribute to push this along?

It seems like the data is there, just not exposed in the logging output at this time