Closed maksm90 closed 3 months ago
In pg_stat_kcache we usually use the queryid retrieved from the QueryDesc so it should be correct. It's only for parallel workers that we rely on the array, and in that case it's not entirely clear which queryid will be used.
We store the queryid for the backend on executor start (the planner can run some queries, but I don't think those can be parallel), so in the usual case of serialized queries it should be fine, as it will retrieve whatever the last executor start put there. If the queries are intertwined then it may be wrong. I think this can happen in some cases, for instance in plpgsql with nested cursor iteration. AFAIK there's nothing we can do there, as extensions can register any custom data to pass down to a parallel worker :(
For pg_wait_sampling it's a bit different, as you get the info from an external collector, so you always have to rely on the array in shmem, which is more problematic. Also, AFAICS it's set during planning, so it's way more likely to have the value out of sync. Also, I think it's not even guaranteed that you get the top-level queryid, as you could bypass the planner and execute a query directly. Most of the time you should get a null queryid, as you reset it in the executor end hook, but there's no guarantee that each planning phase will be followed by an execution phase, so you can still sometimes inherit a totally unrelated queryid.
Thanks for the explanation and notes.
the planner can run some queries but I don't think those can be parallel
Could you provide an example of such behavior? AFAIK, at most the planner can pre-evaluate expressions, but it doesn't run the executor within itself. Am I wrong?
If that's intertwined queries then it may be wrong. I think this can happen in some cases, for instance in plpgsql with nested cursor iteration.
This is the case I meant in my message: if a parallel query calls a function that itself issues an SQL query, then the parallel workers of the upper-level query will capture the queryid of the function's SQL body. Would storing queryid values in a stack help here? Or is it impossible to determine, in the ExecutorEnd hook, that a parallel worker belongs to the currently executing query (which might come from an inner function) on the leader worker?
Also, AFAICS it's set during planning, so it's way more likely to have the value out of sync.
Yes, currently pg_wait_sampling doesn't account for parallel workers at all. In this regard, a hook before executor start is required to preserve the queryid from the leader process.
Also, I think it's not even guaranteed that you get the top-level queryid, as you could bypass the planner and execute a query directly.
Yeah, a hook before executor start should help here.
Most of the time you should get a null queryid as you reset it in the executor end hook, but there's no guarantee that each planning phase will be followed by an execution phase, so you can still sometimes inherit from a totally unrelated queryid.
It only happens when we issue a nested query from a function call, doesn't it?
the planner can run some queries but I don't think those can be parallel
Could you provide an example of such behavior? AFAIK, at most the planner can pre-evaluate expressions, but it doesn't run the executor within itself. Am I wrong?
I don't think that it can run the executor, but it can e.g. do an index scan for get_variable_range or call plpgsql functions, though those have their own processing and I don't think they can be parallel either. Now, the planner can also call the planner itself, which is a problem since you will only keep the queryid from the last planner call during the whole query execution.
If that's intertwined queries then it may be wrong. I think this can happen in some cases, for instance in plpgsql with nested cursor iteration.
This is the case I meant in my message: if a parallel query calls a function that itself issues an SQL query, then the parallel workers of the upper-level query will capture the queryid of the function's SQL body. Would storing queryid values in a stack help here? Or is it impossible to determine, in the ExecutorEnd hook, that a parallel worker belongs to the currently executing query (which might come from an inner function) on the leader worker?
The problem is that the parallel worker can itself be a worker for a query at any sublevel. So you may know that the worker is itself nesting execution, but you won't know where it started from :(
Most of the time you should get a null queryid as you reset it in the executor end hook, but there's no guarantee that each planning phase will be followed by an execution phase, so you can still sometimes inherit from a totally unrelated queryid.
It only happens when we issue a nested query from a function call, doesn't it?
I don't think so. For instance, imagine you have a planned statement `s1` that already has a cached plan. Now if you do `EXPLAIN SELECT 1`, explain will plan the query but won't execute it. So you will have stored a queryid but never removed it, since you never went through the executor. Now, if you execute your prepared statement `s1`, it will go through execution without any planning, and you will think that this statement has the queryid of `SELECT 1`.
The problem is that the parallel worker can itself be a worker for a query at any sublevel. So you may know that the worker is itself nesting execution, but you won't know where it started from :(
Yeah, that's what I supposed.
For instance, imagine you have a planned statement `s1` that already has a cached plan. Now if you do `EXPLAIN SELECT 1`, explain will plan the query but won't execute it. So you will have stored a queryid but never removed it, since you never went through the executor. Now, if you execute your prepared statement `s1`, it will go through execution without any planning, and you will think that this statement has the queryid of `SELECT 1`.
Hmm, thanks a lot for the case.
The problem is that the parallel worker can itself be a worker for a query at any sublevel. So you may know that the worker is itself nesting execution, but you won't know where it started from :(
I have inspected the source code and found that while ExecutorStart() runs in a parallel worker, the leader waits to get a tuple in the Gather (or GatherMerge) node, i.e. it waits until the parallel worker has started its execution and is emitting rows. Therefore, I think it's safe for a parallel worker to capture the queryId value from the leader's shmem slot in the ExecutorStart hook, even in the case of nested function calls, because while the leader is in the Gather node we know it is not calling a function with a nested query and is not changing its queryId value.
I don't think that it can run the executor, but it can e.g. do index scan for get_variable_range or call plpgsql functions, but those have their own processing and I don't think they can be parallel either. Now the planner can also call the planner itself
The planner might call planner and executor routines itself during pre-evaluation of function calls. All the criteria for such pre-evaluation are described in the comment of evaluate_function(). The case of parallel execution inside planning, I think, is not a problem if we start to track queryId for planning and execution separately, i.e. clear the queryId after planner completion and set this value again before executor start.
Now the planner can also call the planner itself, which is a problem since you will only keep the queryid from the last planner call during the whole query execution.
I think we have to maintain a stack of queryId values for nested planning too.
The planner might call planner and executor routines itself during pre-evaluation of function calls. All the criteria for such pre-evaluation are described in the comment of evaluate_function(). The case of parallel execution inside planning, I think, is not a problem if we start to track queryId for planning and execution separately, i.e. clear the queryId after planner completion and set this value again before executor start.
Yes, and as a reference, pg_stat_statements has also been keeping track of the current planner nesting level since track_planning was added.
I think we have to maintain a stack of queryId values for nested planning too.
Do you mean storing an array of queryids in the backend's private memory, one per nesting level, so that you can (re)publish the current queryid in each of the executor* functions? Note that we had to do something like that in pg_stat_kcache (https://github.com/powa-team/pg_stat_kcache/commit/48021a7860c2981c931b1b12f5ed158344c58159).
Now the planner can also call the planner itself, which is a problem since you will only keep the queryid from the last planner call during the whole query execution.
I think we have to maintain a stack of queryId values for nested planning too.
Do you mean storing an array of queryids in the backend's private memory, one per nesting level, so that you can (re)publish the current queryid in each of the executor* functions? Note that we had to do something like that in pg_stat_kcache (powa-team/pg_stat_kcache@48021a7).
Not in private backend memory but in shared memory, to make the top-level queryId (the one corresponding to the currently planning/executing query) accessible to the collector process.
How will the collector know which nested level is active?
How will the collector know which nested level is active?
The latest level (the top of the stack) is the active one, isn't it? It means the query with the last inserted queryId is now being planned or executed.
Well, yes, which is why I was assuming that the stack would be kept in the backend's private memory, and that it would just publish the current one for the collector.
If you store the stack in shared memory, you will also need to store an array of indexes to specify which level is valid, which seems like a waste.
If you store the stack in shared memory, you will also need to store an array of indexes to specify which level is valid, which seems like a waste.
What do you mean by a not-valid frame level?
Anyway, thanks for the useful notes. I'll reconsider all the pitfalls again and, after introducing regression tests, try to provide some solution.
I meant "not the current nesting level", as only the backend knows where it currently is.
That is, of course, if you implement the stack on top of a simple array and an offset.
Fixed in v1.1.6
Recently there was a discussion (https://github.com/postgrespro/pg_wait_sampling/pull/42#issuecomment-1079635726) about the unwanted usage of queryIds taken from PgBackendStatus entries in BackendStatusArray, as they represent only top-level queries. But the current implementation behaves in a similar manner: once a queryId has been kept, all successive nested queries cannot submit their own queryIds.

The open question here is how to organize tracking of queryIds for nested queries. @rjuju AFAICS in pg_stat_kcache this problem is also not solved: all inner queries overwrite the queryId of previously executing queries before the executor runs. Maybe it makes sense to store queryIds in some shared stack structure allocated for each process slot, and use it to keep/restore the queryId for upper-level queries?