Open BugenZhao opened 1 year ago
So this means we need to refactor all logs?
So this means we need to refactor all logs?
I think so, at least for those useful for debugging.
+1
Using different log config for multiple deploying env is quite common.
So this means we need to refactor all logs?
From my experience, it's almost impossible to enforce every log to be structured. However, we can enforce it for the most critical logs and newly-added ones. It still helps.
From my experience, it's almost impossible to enforce every log to be structured. However, we can enforce it for the most critical logs and newly-added ones. It still helps.
Agree. It's more feasible to start on critical ones and then refactor the others gradually.
Besides, structured logs make everything based on logs easier, such as analysis on SQL logs.
From my experience, it's almost impossible to enforce every log to be structured. However, we can enforce it for the most critical logs and newly-added ones. It still helps.
Agree. It's more feasible to start on critical ones and then refactor the others gradually.
Besides, structured logs make everything based on logs easier, such as analysis on SQL logs.
But I'm concern about if the log collector/store recognizes hybrid format since we won't unify all the logs at once. Do you have any idea? @Nebulazhang
The format won't be "hybrid". They are always in JSON if the env var RISINGWAVE_CLOUD
is set after #10633.
For example, if a log event is structural, we may get something like...
tracing::debug!(actor_id, ?state, "take snapshot")
{
"timestamp": "2023-06-28T17:17:35.404904+08:00",
"level": "DEBUG",
"fields": {
"message": "take snapshot",
"actor_id": 42, <- 👀
"state": "[Nexmark(NexmarkSplit { split_index: 2, split_num: 4, start_offset: Some(315394) })]" <- 👀
},
"target": "risingwave_stream::executor::source::source_executor"
}
But if it's simply formatted into the message, we'll only get the message
field.
tracing::debug!("take snapshot for actor {} with state: {:?}", actor_id, state)
{
"timestamp": "2023-06-28T17:17:35.404904+08:00",
"level": "DEBUG",
"fields": {
"message": "take snapshot for actor 42 with state: [Nexmark(NexmarkSplit { split_index: 2, split_num: 4, start_offset: Some(315394) })]"
},
"target": "risingwave_stream::executor::source::source_executor"
}
Although it may be harder to analyze these unstructural logs, this format should still be considered consistent with the structural one.
Are there any actionable sub-issues?
Are there any actionable sub-issues?
I have no idea currently. Maybe we need some practices of log analyzing in production to determine what needs prioritization.
This issue has been open for 60 days with no activity.
If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity
label.
You can also confidently close this issue as not planned to keep our backlog clean. Don't worry if you think the issue is still valuable to continue in the future. It's searchable and can be reopened when it's time. 😄
No actionable tasks now. BTW, we can add a section in the developer guide https://github.com/risingwavelabs/risingwave/issues/17436
Currently when printing logs with
tracing
, we tend toformat
the runtime information into the message like this:https://github.com/risingwavelabs/risingwave/blob/4f3bce2d024f958d1eeebc49a5556a81372fe824/src/meta/src/rpc/election_client.rs#L107-L122
Although intuitive, this approach lacks observability friendliness. It is hard to retrieve the
lease_id
from these messages after they're formatted, which makes it also challenging for us to build some analytics or alerting tools by consolidating related logs, traces, and metrics.Instead, we should prefer structural logs instead. This is enabled by "recording fields" with
tracing
. It looks like...https://github.com/risingwavelabs/risingwave/blob/4f3bce2d024f958d1eeebc49a5556a81372fe824/src/utils/pgwire/src/pg_protocol.rs#L345-L352
By explicitly specifying fields, the tracing subscriber will be able to decide how to handle them.
cc @fuyufjh @arkbriar @Nebulazhang