hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0

Include operationName in "http-log" to allow for better observability #8481

Open chillu opened 2 years ago

chillu commented 2 years ago

Is your proposal related to a problem?

We'd like to identify anomalies and performance issues in specific Hasura operations via our own observability tooling (Heroku log draining to Coralogix), on the community edition. Heroku splits log lines after 10kb, which prevents them from being fully parsed in the observability platform (each split log line is invalid JSON). Our queries are quite large, so we frequently exceed this threshold. We don't need the verbosity of logging the entire query, but it's the only way to get to detail.query.operationName in the query-log format.

Additionally, http-log does include the whole query on level: error, which causes the same log truncation behaviour in Heroku on our large queries.
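To illustrate the failure mode: the sketch below (my own, not from Hasura's codebase) simulates a structured log line that exceeds Heroku's 10kb per-line limit and shows that the truncated fragment no longer parses as JSON.

```python
import json

# Simulate a structured log line larger than Heroku's ~10kb per-line limit.
big_query = "query getProfile { " + "field " * 2000 + "}"
log_line = json.dumps({"type": "http-log", "level": "error",
                       "detail": {"operation": {"query": big_query}}})

# Heroku's log router splits the line at 10kb; the fragment is invalid JSON.
fragment = log_line[:10240]
try:
    json.loads(fragment)
    parsed = True
except json.JSONDecodeError:
    parsed = False
print(parsed)  # the truncated fragment fails to parse
```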

Describe the solution you'd like

Include the operationName in the http-log format, and allow disabling the inclusion of query in {"type": "http-log", "level": "error"} via a CLI arg and env var.

Example info level log with `operationName`:

```json
{
  "timestamp": "2019-05-30T23:40:24.654+0530",
  "level": "info",
  "type": "http-log",
  "detail": {
    "request_id": "072b3617-6653-4fd5-b5ee-580e9d098c3d",
    "operation": {
      "operationName": "MyOperation",
      "query_execution_time": 0.009240042,
      "user_vars": { "x-hasura-role": "user" },
      "error": null,
      "request_id": "072b3617-6653-4fd5-b5ee-580e9d098c3d",
      "parameterized_query_hash": "7116865cef017c3b09e5c9271b0e182a6dcf4c01",
      "response_size": 105,
      "query": null
    },
    "http_info": {
      "status": 200,
      "http_version": "HTTP/1.1",
      "url": "/v1/graphql",
      "ip": "127.0.0.1",
      "method": "POST"
    }
  }
}
```
Example error level log with `--log-query-in-http-log-errors=false` server flag:

```json
{
  "timestamp": "2019-05-29T15:22:37.834+0530",
  "level": "error",
  "type": "http-log",
  "detail": {
    "operation": {
      "query_execution_time": 0.000656144,
      "user_vars": { "x-hasura-role": "user", "x-hasura-user-id": "1" },
      "error": {
        "path": "$.selectionSet.profile.selectionSet.usernamex",
        "error": "field \"usernamex\" not found in type: 'profile'",
        "code": "validation-failed"
      },
      "request_id": "072b3617-6653-4fd5-b5ee-580e9d098c3d",
      "response_size": 142,
      "query": { "operationName": "getProfile" }
    },
    "http_info": {
      "status": 200,
      "http_version": "HTTP/1.1",
      "url": "/v1/graphql",
      "ip": "127.0.0.1",
      "method": "POST"
    }
  }
}
```

Describe alternatives you've considered

We don't have control over Heroku log splitting, or our query length (persisted queries aren't an option in Hasura either).

Alternative option: allow a maximum query field size (truncation) in the query-log and http-log formats, to retain valid JSON in a single log line for log drains that auto-truncate lines. The problem here is that other fields, like the generated SQL, will push logs over the truncation limit as well.

Example with `[truncated]` hint and a new `--log-query-max-length=40` server flag:

```json
{
  "timestamp": "2019-06-03T13:25:10.915+0530",
  "level": "info",
  "type": "query-log",
  "detail": {
    "kind": "database",
    "request_id": "840f952d-c489-4d21-a87a-cc23ad17926a",
    "query": {
      "variables": { "limit": 10 },
      "operationName": "getProfile",
      "query": "query {\n author {\n name\n email\n do [truncated]"
    },
    "generated_sql": {
      "profile": {
        "prepared_arguments": ["{\"x-hasura-role\":\"admin\"}", "%a%"],
        "query": "SELECT coalesce(json_agg(\"root\" ), '[]' ) AS \"root\" FROM (SELECT row_to_json((SELECT \"_1_e\" FROM (SELECT \"_0_root.base\".\"username\" AS \"username\" ) AS \"_1_e\" ) ) AS \"root\" FROM (SELECT * FROM \"public\".\"profile\" WHERE ((\"public\".\"profile\".\"username\") LIKE ($2)) ) AS \"_0_root.base\" LIMIT 10 ) AS \"_2_root\" "
      },
      "myusername": {
        "prepared_arguments": ["{\"x-hasura-role\":\"admin\"}", "foobar"],
        "query": "SELECT coalesce(json_agg(\"root\" ), '[]' ) AS \"root\" FROM (SELECT row_to_json((SELECT \"_1_e\" FROM (SELECT \"_0_root.base\".\"username\" AS \"username\" ) AS \"_1_e\" ) ) AS \"root\" FROM (SELECT * FROM \"public\".\"profile\" WHERE ((\"public\".\"profile\".\"username\") = ($2)) ) AS \"_0_root.base\" ) AS \"_2_root\" "
      }
    }
  }
}
```
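Until something like this lands server-side, the truncation idea can be approximated in a log pipeline. A minimal sketch (my own, assuming a filter step between Hasura and the drain; the 40-character limit mirrors the proposed `--log-query-max-length=40`):

```python
import json

MAX_FIELD_LEN = 40  # mirrors the proposed --log-query-max-length=40

def truncate_fields(obj, max_len=MAX_FIELD_LEN):
    """Recursively truncate long string values, keeping the JSON valid."""
    if isinstance(obj, dict):
        return {k: truncate_fields(v, max_len) for k, v in obj.items()}
    if isinstance(obj, list):
        return [truncate_fields(v, max_len) for v in obj]
    if isinstance(obj, str) and len(obj) > max_len:
        return obj[:max_len] + " [truncated]"
    return obj

line = '{"type": "query-log", "detail": {"query": {"query": "query { author { name email donations { amount } } }"}}}'
print(json.dumps(truncate_fields(json.loads(line))))
```

The output line stays parseable JSON, at the cost of losing the tail of every long field rather than just the query.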

If the feature is approved, would you be willing to submit a PR?

I've never written any Haskell, so this would be an interesting fix to get started on, given a few pointers in the right direction.

bostrom commented 2 years ago

We've also noticed that when both http-log and query-log are included in HASURA_GRAPHQL_ENABLED_LOG_TYPES, the http-log also outputs the operation name, variables and GraphQL query, even for successful operations (as opposed to what the docs claim).

This is useful information, but redundant since it's already included in the query-log. When the query-log is disabled, the GraphQL query information disappears from the http-log as well (although it may still be included for errors, as per the docs).

Enabling the query-log, however, will quickly fill up our disk quota for logs, since it includes the full generated SQL query.

What bugs us is that we'd like the operation name, variables (perhaps also the full GraphQL query) and execution times in order to analyse performance in production, but the only way to get those today is to enable query-logs, which uses up our disk quota and leaves us with only a few hours of logs.
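A drain-side workaround for the disk-quota problem is to strip the bulky generated_sql field from query-log lines before forwarding them. A sketch (my own, assuming a filter step in the log pipeline; field names follow the query-log examples above):

```python
import json

def slim_query_log(raw_line):
    """Drop the bulky generated_sql from a query-log line, keeping
    operation name, variables and request id intact."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return raw_line  # pass non-JSON (or already-split) lines through
    if record.get("type") == "query-log":
        record.get("detail", {}).pop("generated_sql", None)
    return json.dumps(record)

line = ('{"type": "query-log", "detail": {"request_id": "abc", '
        '"query": {"operationName": "getProfile"}, '
        '"generated_sql": {"profile": {"query": "SELECT ..."}}}}')
print(slim_query_log(line))
```

This keeps the per-operation metadata while shedding most of the log volume, but it is post-hoc: the full SQL still has to be emitted and shipped to the filter first.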

Our proposal would be to either

OR

ro-savage commented 8 months ago

We reached out to Hasura about this issue, as an enterprise customer.

Their response on 11th of December was:

> Thanks for your feedback, I've passed it to the relevant team. We got the message from the team that it makes sense to reduce the size of the log based on the parameters supplied to it via env vars or any other form. The ETA we got was around 2 months. Please be look out on that referenced github issue which you mentioned.