scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0
13.53k stars 1.29k forks source link

Log messages do not mention username / query #9001

Open martin-sucha opened 3 years ago

martin-sucha commented 3 years ago

This is Scylla's bug tracker, to be used for reporting bugs only. If you have a question about Scylla, and not a bug, please ask it in our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.

Installation details Scylla version (or git commit hash): 2021.1.2-0.20210620.7bb9428ae

Log messages should provide a way to link the logged message to the user / query that is related to the issue.

For example, I see the following message in the logs now:

[shard 35] mutation_partition - Memory usage of unpaged query exceeds soft limit of 1048576 (configured via max_memory_for_unlimited_query_soft_limit)

but I don't know which query exceeds the limit. I have tried probabilistic tracing and slow query log, but can't positively say whether any of the queries there is linked to this log message as the log message does not appear in the trace events.

I propose to add username and trace session id / span id (if available) to the log messages. The query string would be nice to have as well (but I'm not sure if it won't cause excessive volume of logs).

There are also other possible alternatives to my proposal, for example all log messages could be logged into the trace session as events.

tzach commented 3 years ago

Very good point @martin-sucha

avikivity commented 3 years ago

The query is not available at the point this is printed.

We have two options:

  1. return it via rpc to the coordinator, and print it there
  2. return it via rpc to the coordinator, attach a warning the the query result, and pass it to the driver

The second has the benefit of avoiding log spam in case the application goes crazy, and also directly informing the application. The downside is that the application may choose to ignore the warning. Maybe we should do both (and allow disabling 1).

avikivity commented 3 years ago

@haaawk does the Java driver (as an example) do anything with warnings returned from the database?

avikivity commented 3 years ago

/cc @denesb @psarna

tzach commented 3 years ago
  1. allow us to prorate the info to Loki (for example), and attach an alert to it.
psarna commented 3 years ago

/cc @denesb @psarna

Scylla Rust Driver returns warnings along with other auxiliary information like the paging state and tracing uuid, but it's up to the user to explicitly check if any warnings were present. The warnings are represented as a list of strings, but it's the user's decision whether they want to add code that prints or reports them anywhere, so I assume they usually don't.

denesb commented 3 years ago

Maybe we should have a mode parallel to slow query tracing, where we trace those queries that have any warnings attached to them.

haaawk commented 3 years ago

@haaawk does the Java driver (as an example) do anything with warnings returned from the database?

The driver logs the warning with WARN logging level unless it's disabled: https://github.com/scylladb/java-driver/blob/scylla-3.x/driver-core/src/main/java/com/datastax/driver/core/RequestHandler.java#L262

martin-sucha commented 3 years ago

We have two options:

  1. return it via rpc to the coordinator, and print it there
  2. return it via rpc to the coordinator, attach a warning the the query result, and pass it to the driver

The second has the benefit of avoiding log spam in case the application goes crazy, and also directly informing the application. The downside is that the application may choose to ignore the warning. Maybe we should do both (and allow disabling 1).

There should definitely be a way to find out which user is causing the error messages on the server side (so the database administrator can do it). With multiple database users (that use different drivers) if I don't know which client is causing the issue then I don't know which team should I ask to check their application.

does the Java driver (as an example) do anything with warnings returned from the database?

For example, gocql allows to get the warning strings, but does not print them by default and the C++ driver seems to do the same.

Maybe we should have a mode parallel to slow query tracing, where we trace those queries that have any warnings attached to them.

This looks like it would be useful. While adding the logs as trace events could work even with probabilistic tracing (although in my case the log message is logged every 5-10 minutes, so I'm not sure what is the chance that it would be randomly traced), specifically tracing those queries would increase the chance that something useful will be stored in the traces.

psarna commented 3 years ago

/cc @denesb @psarna

Scylla Rust Driver returns warnings along with other auxiliary information like the paging state and tracing uuid, but it's up to the user to explicitly check if any warnings were present. The warnings are represented as a list of strings, but it's the user's decision whether they want to add code that prints or reports them anywhere, so I assume they usually don't.

@piodul corrected me, we do log warnings, but currently unconditionally: https://github.com/scylladb/scylla-rust-driver/blob/efb170182f5d1e0b0a467077c627513a7ee47d40/scylla/src/transport/connection.rs#L534-L536

avikivity commented 3 years ago

Maybe we should have a mode parallel to slow query tracing, where we trace those queries that have any warnings attached to them.

By the time the warnings are attached, it's too late. The trace session is created before any warnings can be created.

avikivity commented 3 years ago

We have two options:

  1. return it via rpc to the coordinator, and print it there
  2. return it via rpc to the coordinator, attach a warning the the query result, and pass it to the driver

The second has the benefit of avoiding log spam in case the application goes crazy, and also directly informing the application. The downside is that the application may choose to ignore the warning. Maybe we should do both (and allow disabling 1).

There should definitely be a way to find out which user is causing the error messages on the server side (so the database administrator can do it). With multiple database users (that use different drivers) if I don't know which client is causing the issue then I don't know which team should I ask to check their application.

does the Java driver (as an example) do anything with warnings returned from the database?

For example, gocql allows to get the warning strings, but does not print them by default and the C++ driver seems to do the same.

I think these defaults should be changed to "log warnings (with rate-limit), unless an API method was called to read the warnings". Otherwise, this feature will only be used in cqlsh.

tzach commented 3 years ago

I think these defaults should be changed to "log warnings (with rate-limit), unless an API method was called to read the warnings". Otherwise, this feature will only be used in cqlsh.

+1 We need to propagate more info to allow users to pin point an issue to a query. In addition to just username, the connection info (external IP, port), and a CQL statement will be useful To mitigate the extra info we can propagate the session "id" and correlate the logs elsewhere.

avikivity commented 3 years ago

Those defaults are in the client driver, not Scylla. If the driver prints out the warnings, you immediately know what the session is. The driver also has access to the statement.