nocaway opened this issue 3 years ago
All of the tests that are described in these comments, and that are implemented in the attached .zip files, were run after starting my YB cluster with this YSQL configuration parameter setting:
ysql_suppress_unsupported_error=true
The default for this parameter is false. And when the default setting is used, every 0A000 occurrence that in my tests is a warning (and that I suppress by setting client_min_messages to error) is a genuine error: any statement that causes 0A000 rolls back and has no effect.
NOTE: the list that starts this report is wrong. Here are the statements that don't draw a warning (when client_min_messages is set to warning):
fetch from cur;
fetch next from cur;
fetch forward from cur;
fetch :N from cur;
fetch forward :N from cur;
where :N is an integer that's greater than zero.
This issue needs some testcases. In particular, it doesn't mention limitations in PL/pgSQL cursor functionality that stem from the underlying SQL functionality limitations.
I just tested everything using YB-2.17.0.0. My tests are attached.
It actually turns out that every operation that's listed above as "not yet supported" runs quietly without error and produces the expected result as long as you do this:
set client_min_messages = error;
This suggests that, with another round of careful testing, this warning, and others like it for other variants of fetch, can simply be removed:
0A000: FETCH BACKWARD not supported yet
Similarly, move works fine when warnings are suppressed:
set client_min_messages = error;
create function s.f(n in int)
returns int
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor := 'cur';
vv int;
begin
open cur for select v from s.t order by v;
move absolute n in cur;
fetch cur into vv;
return vv;
end;
$body$;
select s.f(4);
Notice that, when client_min_messages is warning, even create function, when the body has a move statement, draws the warning:
0A000: MOVE not supported yet
You get it again at runtime. But you also get the right result.
The following shows the testcase concept. However, the real testcase (attached) uses fine-grained control over client_min_messages to set it to error only when this is needed.
Connect as an ordinary user to a database on which that user has the create privilege. First create and populate a suitable test table.
-- Notice that the entire testcase runs with this setting.
set client_min_messages = error;
drop schema if exists s cascade;
create schema s;
create table s.t(k serial primary key, v int not null);
insert into s.t(v) select g.v from generate_series(5, 25) as g(v);
Next, create a procedure that declares a session-duration cursor. (There are no explicit PL/pgSQL statements that allow this. But dynamic SQL manages it without a problem.) The advantage of using a session-duration cursor for these tests is simply that you can create it once and forget about it without having to do everything within an explicit transaction. (An explicit transaction makes ad hoc queries tedious because even a trivial typo aborts the transaction so that you have to start again from scratch.)
create procedure s.declare_cur()
set search_path = pg_catalog, pg_temp
security definer
language plpgsql
as $body$
begin
begin
execute 'close cur;';
exception
when invalid_cursor_name then null;
end;
execute
'declare cur scroll cursor with hold '||
'for select v::text from s.t where v between 11 and 20 order by v;';
end;
$body$;
Next, create a function that encapsulates the various fetch operations, using (according to the lang formal argument) either raw SQL (via the execute statement) or the dedicated PL/pgSQL fetch statement variants.
create function s.fetched_row(lang in text, kind in text, cur in refcursor)
returns text
set search_path = pg_catalog, pg_temp
security definer
language plpgsql
as $body$
declare
v text;
begin
case lang
when 'sql' then
execute format('fetch %s from cur', kind) into v;
when 'plpgsql' then
case kind
when 'first' then
fetch first from cur into v;
when 'next' then
fetch next from cur into v;
when 'absolute 5' then
fetch absolute 5 from cur into v;
when 'relative 3' then
fetch relative 3 from cur into v;
when 'relative 0' then
fetch relative 0 from cur into v;
when 'relative -5' then
fetch relative -5 from cur into v;
when 'last' then
fetch last from cur into v;
when 'prior' then
fetch prior from cur into v;
end case;
end case;
return case
when v is null then null
else rpad(kind, 13)||': '||v
end;
end;
$body$;
Next, create a function that encapsulates the various PL/pgSQL move statement variants, matching the variants one-to-one with the fetch variants.
create function s.move_report(kind in text, cur in refcursor)
returns text
set search_path = pg_catalog, pg_temp
security definer
language plpgsql
as $body$
begin
case kind
when 'first' then
move first in cur;
when 'next' then
move next in cur;
when 'absolute 5' then
move absolute 5 in cur;
when 'relative 3' then
move relative 3 in cur;
when 'relative 0' then
move relative 0 in cur;
when 'relative -5' then
move relative -5 in cur;
when 'last' then
move last in cur;
when 'prior' then
move prior in cur;
end case;
return kind;
end;
$body$;
Finally, create a driver table function that exercises all the variants according to its lang formal argument. When lang is sql, it exercises all of the SQL fetch variants but, of course, there's no "move". When lang is plpgsql, it exercises all of the PL/pgSQL fetch variants and all of the PL/pgSQL move variants.
create function s.cur_rows_sql(lang in text, cur in refcursor)
returns table(z text)
set search_path = pg_catalog, pg_temp
security definer
language plpgsql
as $body$
declare
v text;
kind_first constant text not null := 'first';
kind_next constant text not null := 'next';
kind_abs_5 constant text not null := 'absolute 5';
kind_rel_3 constant text not null := 'relative 3';
kind_rel_0 constant text not null := 'relative 0';
kind_rel_neg_5 constant text not null := 'relative -5';
kind_last constant text not null := 'last';
kind_prior constant text not null := 'prior';
begin
z := s.fetched_row(lang, kind_first, cur); return next;
z := '-----'; return next;
loop
z := s.fetched_row(lang, kind_next, cur);
exit when z is null;
/* */ return next;
end loop;
z := '-----'; return next;
z := s.fetched_row(lang, kind_abs_5, cur); return next;
z := s.fetched_row(lang, kind_rel_3, cur); return next;
z := s.fetched_row(lang, kind_rel_0, cur); return next;
z := s.fetched_row(lang, kind_rel_0, cur); return next;
z := s.fetched_row(lang, kind_last, cur); return next;
z := s.fetched_row(lang, kind_rel_neg_5, cur); return next;
z := '-----'; return next;
loop
z := s.fetched_row(lang, kind_prior, cur);
exit when z is null;
/* */ return next;
end loop;
if lang = 'plpgsql' then
z := '----- "move" tests (plpgsql only)'; return next;
begin
z := s.move_report(kind_first, cur); return next;
z := s.fetched_row(lang, kind_next, cur); return next;
z := s.move_report(kind_abs_5, cur); return next;
z := s.fetched_row(lang, kind_next, cur); return next;
z := s.move_report(kind_rel_3, cur); return next;
z := s.fetched_row(lang, kind_next, cur); return next;
z := s.move_report(kind_last, cur); return next;
z := s.move_report(kind_rel_neg_5, cur); return next;
z := s.fetched_row(lang, kind_next, cur); return next;
end;
end if;
end;
$body$;
Run it first in YB and then in vanilla PG. Spool the output (using the "\o" meta-command), using suitable names like yb.txt and pg.txt, so that you can diff them.
First, create the session-duration cursor and execute the few SQL fetch variants that get many rows each. (These cannot be used from PL/pgSQL.)
call s.declare_cur();
fetch 3 from cur;
fetch forward 2 from cur;
fetch forward all from cur;
fetch backward 4 from cur;
fetch backward all from cur;
Now invoke the s.cur_rows_sql() encapsulation using both its sql mode and its plpgsql mode:
select z from s.cur_rows_sql('sql', 'cur');
select z from s.cur_rows_sql('plpgsql', 'cur');
Diffing the two spool files shows that they are identical.
This is presented as a single file, issue-6514.sql inside the attached issue-6514.zip. It's totally self-contained and you can simply start it at the ysqlsh or the psql prompt—with one caveat: it starts with this:
drop schema if exists s cascade;
So, of course, you'll have to improvise a bit if the database that you connect to happens already to have a schema called s that's owned by a role other than the one as which you connect. Notice that the script uses this device to spool to a file that reflects whether you run the test in YB or PG:
select (version() like '%YB%')::text as is_yb
\gset
\if :is_yb
\o yb.txt
\else
\o pg.txt
\endif
The attached issue-6514.zip also contains the spool files yb.txt and pg.txt that were generated just before making the .zip.
Here, the code decorates each output line with an asterisk if it required warnings to be suppressed. This means that diff will indeed show differences because, of course, you don't get any warnings in PG and so there's no need to suppress them. However, it's very easy to read past these asterisks and see that the actual returned values are identical in YB and PG.
What follows provides some background for understanding how things are supposed to work.
nocaway asked me a question in a YB-internal Slack exchange:
He didn't ask a second question. But he should have:
I can't find explicit answers to these questions in the PG doc. But the answers are implied by a general understanding of how query execution works and of how SQL statements define outcomes in the face of concurrent activity (where the effect of one session's changes affects the set of rows another session's SQL statement sees).
Here's a simple mental model for what it means to move and fetch, forwards and backwards at will in or from a cursor. Use this table as the target for the cursor's defining subquery:
create table s.t(k int primary key, v int not null);
select setseed(0.0::float8);
with s(v) as (select generate_series(1, 10))
insert into s.t(k, v)
select v, round(100*random()) from s;
Now inspect the seventh row in the result set for an unrestricted subquery with no order by against s.t. (Assume that you can be sure that no other sessions can change the contents of s.t during the test.) It doesn't matter that, as the rules of SQL promise, you cannot in general predict which row it will be. Even so, within the tiny timespans that matter in this test, repeating this query, time and again, will always get the same result—within a particular database in a particular PG or YB cluster.
with c(rnk, v) as (
select row_number() over(), v from s.t)
select rnk, v from c where rnk = 7;
The random() function always generates the same sequence of values in successive invocations that follow invoking setseed() with the same actual argument. This is the result (using YB):
rnk | v
-----+----
7 | 80
(It's different, of course, in PG because of the different physical storage systems that YB and PG use.)
Now declare a cursor and also "snapshot" the result set of its defining subquery into the table pg_temp.cursor_simulation:
create table pg_temp.cursor_simulation(rnk int not null, v int not null);
start transaction;
declare cur scroll cursor without hold for
select row_number() over() as rnk, v from s.t;
insert into pg_temp.cursor_simulation(rnk, v)
select row_number() over(), v from s.t;
Once again, we can be sure that the subquery that defines both the cursor and the content of the pg_temp.cursor_simulation table will produce the same result set each of the two times that it's executed. Inspect the seventh row here too:
fetch absolute 7 from cur;
select rnk, v from pg_temp.cursor_simulation where rnk = 7;
The result is the same for both methods—and is the same as the free-standing subquery got immediately before start transaction.
Finally, traverse the entire result set, in both directions, using first the cursor and then the pg_temp.cursor_simulation table:
move last in cur;
move next in cur;
fetch backward all from cur;
move prior in cur;
fetch forward all from cur;
and:
select rnk, v from pg_temp.cursor_simulation order by rnk desc;
select rnk, v from pg_temp.cursor_simulation order by rnk asc;
The results are the same for both approaches, thus:
rnk | v
-----+----
10 | 78
9 | 77
8 | 39
7 | 80
6 | 55
5 | 28
4 | 34
3 | 20
2 | 84
1 | 91
and:
rnk | v
-----+----
1 | 91
2 | 84
3 | 20
4 | 34
5 | 28
6 | 55
7 | 80
8 | 39
9 | 77
10 | 78
In summary, this experiment informs a simple mental model:
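In outline: the cursor behaves like a snapshot of its defining subquery's result set, taken when the cursor is declared, together with a movable current position. The following Python sketch is illustrative only (it is not how PostgreSQL implements cursors, and it covers only non-negative absolute positions); the ten values reuse the result set shown above.

```python
# Illustrative model only (not PostgreSQL internals): a scrollable cursor
# is a frozen snapshot of the result set plus a current position.
# Position 0 is "before the first row"; len(rows) + 1 is "after the last".

class CursorModel:
    def __init__(self, rows):
        self.rows = list(rows)   # snapshot taken at "declare" time
        self.pos = 0             # 0 = before the first row

    def fetch_absolute(self, n):
        # Clamp the position just outside the data at either end.
        self.pos = max(0, min(n, len(self.rows) + 1))
        return self._current()

    def fetch_relative(self, n):
        return self.fetch_absolute(self.pos + n)

    def fetch_next(self):
        return self.fetch_relative(1)

    def fetch_prior(self):
        return self.fetch_relative(-1)

    def _current(self):
        if 1 <= self.pos <= len(self.rows):
            return self.rows[self.pos - 1]
        return None              # fetching past either end finds no row

# The ten values match the traversal results shown above.
cur = CursorModel([91, 84, 20, 34, 28, 55, 80, 39, 77, 78])
print(cur.fetch_absolute(7))   # 80, matching "fetch absolute 7 from cur"
print(cur.fetch_next())        # 39
print(cur.fetch_prior())       # 80
```

In this model, move is simply a fetch whose return value is discarded; and a fetch next after move absolute 0 returns the first row because position 0 means "just before where the data starts".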
Ensure that you issue rollback to finish the previous test. Set up for the next test as for the previous one. But make the choice of isolation level explicit.
start transaction isolation level read committed;
declare cur scroll cursor without hold for
select row_number() over() as rnk, v from s.t;
Now do this variant on move and fetch from the previous test:
move absolute 0 in cur;
fetch forward all from cur;
This is the result:
rnk | v
-----+----
1 | 91
2 | 84
3 | 20
4 | 34
5 | 28
6 | 55
7 | 80
8 | 39
9 | 77
10 | 78
Before going further, make sure that the session where you do this is the only session that's connected to the present database. Then connect to the same database as the same user in a second session and do this:
update s.t set v = v + 1000;
Make sure that, in this second session, the plain \set metacommand shows this usual setting:
AUTOCOMMIT = 'on'
Back in the first session, repeat the move and fetch from the cursor:
move absolute 0 in cur;
fetch forward all from cur;
If you do this test using PG, then you'll see that the results haven't changed—even though a second session has committed changes to the table that the cursor's subquery references and the first session is in flight in a read committed transaction. Emphasize the point by doing this (still within the same ongoing transaction):
select k, v from s.t order by k;
This does show the committed changes that the second session made, consistent with the proper semantics of the read committed isolation level.
k | v
----+------
1 | 1084
2 | 1039
3 | 1078
4 | 1080
5 | 1091
6 | 1020
7 | 1034
8 | 1077
9 | 1028
10 | 1055
So far, the results are identical, when you do the steps in this exact order, in both PG and in YB. However (because of a YB bug—see below) you'll see the intended behavior only in PG if you do the second session's update statement before the first move and fetch in the first session.
This is the intention:
Now stress test this rule by changing the order of the steps, thus:
When you do this in PG, the outcome is exactly the same as when you use the first order. But when you do it in YB, the move and fetch get a different outcome, thus:
rnk | v
-----+------
1 | 1091
2 | 1084
3 | 1020
4 | 1034
5 | 1028
6 | 1055
7 | 1080
8 | 1039
9 | 1077
10 | 1078
Now repeat the update statement in the second session and come back to the first session and repeat the move and fetch. Now the results are unchanged. In summary:
We must regard the PG behavior as tautologically correct. And we must therefore conclude this:
Of course, for the use cases for which YB is intended, concurrent data changes within the relevant tables are the common case.
The execution plan for the cursor's defining subquery can sometimes be executed correctly in both the forwards and the backwards directions. It's easy to picture this:
Here, the scheme that supports a forwards sequential scan over the table's rows can just as well support a backwards scan. Many execution plans are reversible in this way. (Experts on the pgsql-general email list have suggested that this is the common case.) However, not all execution plans are reversible. Here is a simple but compelling example:
create function s.series()
returns table(v int)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
raise info 'series() invoked';
for v in (select generate_series(1, 11)) loop
return next;
end loop;
end;
$body$;
create function s.filter(v in int)
returns boolean
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
raise info '%', 'filter('||to_char(v, '99')||') invoked';
return (v < 10) and (v != all(array[1, 2, 4, 7]));
end;
$body$;
Now do this at the psql (or ysqlsh) prompt:
start transaction;
declare cur scroll cursor without hold for
select v from s.series() where s.filter(v);
It finishes silently and without error. Notice that the function s.series() is opaque to the planner. It simply returns its result set in the order that each next row is computed. And it's impossible for the planner to execute it any other way. This means that, in order to support the operations like fetch prior that declaring the cursor as scrollable allows, the results (at least the unfiltered results from the generate_series() function) must be cached. I can't find this detail in the PG doc. But PG expert Laurenz Albe mentions it in his blog post WITH HOLD cursors and transactions in PostgreSQL.
Start with this to see the tests that are made on the output of generate_series() to restrict it to the result set that the cursor's subquery defines:
move absolute 0 in cur;
fetch all from cur;
The move absolute 0 statement produces no SQL output and no raise info output. (This is because its meaning, "go to just before where the data starts", is fixed.) The fetch all statement produces this SQL output:
3
5
6
8
9
And raise info reports this (the <-- annotations were added manually):
series() invoked
filter( 1) invoked
filter( 2) invoked
filter( 3) invoked <--
filter( 4) invoked
filter( 5) invoked <--
filter( 6) invoked <--
filter( 7) invoked
filter( 8) invoked <--
filter( 9) invoked <--
filter( 10) invoked
filter( 11) invoked
Now do this:
move relative -2 in cur;
It produces no SQL output. And raise info reports this:
filter( 11) invoked
filter( 10) invoked
filter( 9) invoked
filter( 8) invoked
Notice that this time, we do not see series() invoked in the raise info output. This, and the results from the previous move and fetch, together tell us that the unfiltered output from generate_series() is cached, and that it's now held in a structure that has frozen the order (whatever it was) in which the rows were produced. And it tells us that the where clause is applied anew to the cached raw data on every fetch or move invocation. If you create this function:
create function select_list_fn(i in int)
returns int
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
raise info '%', 'select_list_fn('||to_char(i, '99')||') invoked';
return i;
end;
$body$;
and invoke it in the select list, then you'll see that it, too, is applied after the fact to the cached raw data.
Given these facts (that just the raw data is cached and that select list and where clause functions are applied to the cached data at fetch time), I cannot see how to design an experiment that would show whether or not ordinary table data is, or is not, cached on the first fetch.
This detail seems to be undocumented and also seems to be pointlessly inefficient. (Contrast it with what happens, see below, when the cursor is declared as holdable.) I asked about it on the pgsql-general list HERE. But I didn't get a useful reply.
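The observed without hold behavior can be summarized in a small Python model. This is a sketch only, assuming exactly the behavior demonstrated by the raise info output above (the set-returning function's raw rows are cached once, while the where-clause filter is re-applied on every scan); the class and function names are invented for illustration.

```python
# Illustrative model (not PostgreSQL internals) of the WITHOUT HOLD
# behavior observed above: the raw rows from the set-returning function
# are cached on first use, but the where-clause filter is re-applied to
# the cached rows on every scan.

calls = {"series": 0, "filter": 0}

def series():
    # Stands in for s.series(), i.e. generate_series(1, 11).
    calls["series"] += 1
    return list(range(1, 12))

def keep(v):
    # Stands in for s.filter(v).
    calls["filter"] += 1
    return v < 10 and v not in (1, 2, 4, 7)

class WithoutHoldCursor:
    def __init__(self):
        self.raw = None                  # raw rows, cached on first use

    def fetch_all(self):
        if self.raw is None:
            self.raw = series()          # the source runs exactly once
        return [v for v in self.raw if keep(v)]  # the filter runs every time

cur = WithoutHoldCursor()
print(cur.fetch_all())   # [3, 5, 6, 8, 9]
cur.fetch_all()          # series() is NOT invoked again...
print(calls)             # ...but keep() has now run 22 times:
                         # {'series': 1, 'filter': 22}
```

Scanning twice invokes series() once but keep() twenty-two times, mirroring the repeated "filter(...) invoked" lines seen in the raise info output.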
Here, it doesn't matter whether or not the execution plan is reversible because, by definition of holdable, the result set is materialized at cursor declaration time. Rollback to finish the example from the previous section and do this:
declare cur scroll cursor with hold for
select v from s.series() where s.filter(v);
We see no SQL output. But we do see raise info output already now, thus:
INFO: series() invoked
INFO: filter( 1) invoked
INFO: filter( 2) invoked
INFO: filter( 3) invoked
INFO: filter( 4) invoked
INFO: filter( 5) invoked
INFO: filter( 6) invoked
INFO: filter( 7) invoked
INFO: filter( 8) invoked
INFO: filter( 9) invoked
INFO: filter( 10) invoked
INFO: filter( 11) invoked
Now use the same move and fetch that we used in the previous section:
move absolute 0 in cur;
fetch all from cur;
It produces the same SQL result as in the without hold test, as expected:
3
5
6
8
9
But we see no raise info output. This tells us that, in contrast to how it's done for a without hold cursor, here (for a with hold cursor) it's the already-filtered result set that's cached (at the moment that the cursor is declared). If you create the function select_list_fn(i in int) and include it in the cursor's subquery, then you'll see that this, too, is taken into account at cursor declaration time to define the values that are cached.
There seems to be a very strange and arbitrary difference between how caching is done for the with hold cursor and the without hold cursor. There seems to be no rationale for deferring the use of select list and where clause functions at caching time and then applying them after the fact to the cached data. This difference in approach harms the performance of the without hold case relative to the with hold case but makes no useful semantic difference. Apparently, the without hold cursor is more commonly used than the with hold cursor. (Of course, you would see differences in output if the filter() function or the select_list_fn() function were not immutable. But using such a function in a SQL statement is anyway unsafe.)
Finally, because the entire result set for the cursor's subquery is materialized at declaration time, it is bound to be properly immune to the effects of changes made to the contents of the tables that the subquery references during the cursor's lifetime.
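For contrast, the with hold behavior can be modeled the same way (again a sketch under the assumptions stated above, not PostgreSQL internals; the names are invented): the filtered result set is built once, at declaration time, and later fetches just read the materialized list.

```python
# Illustrative model (not PostgreSQL internals) of the WITH HOLD behavior:
# the already-filtered result set is materialized when the cursor is
# declared, so later fetches touch neither the source nor the filter.

calls = {"series": 0, "filter": 0}

def series():
    # Stands in for s.series(), i.e. generate_series(1, 11).
    calls["series"] += 1
    return list(range(1, 12))

def keep(v):
    # Stands in for s.filter(v).
    calls["filter"] += 1
    return v < 10 and v not in (1, 2, 4, 7)

class WithHoldCursor:
    def __init__(self):
        # Materialized at declaration time: the filter runs exactly once.
        self.rows = [v for v in series() if keep(v)]

    def fetch_all(self):
        return list(self.rows)           # no re-evaluation on fetch

cur = WithHoldCursor()   # series() and keep() run here, at "declare" time
print(cur.fetch_all())   # [3, 5, 6, 8, 9]
cur.fetch_all()          # no further calls to series() or keep()
print(calls)             # {'series': 1, 'filter': 11}
```

This matches what the experiment shows: all the raise info output appears at declare time, and none appears at fetch time.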
The attached issue-6514-2.zip contains a self-contained testcase that explores the degrees of freedom suggested by:
the proposal from nocaway in a Slack exchange to use a table thus:
create table s.h_k__v(
h int not null, k int not null, v int not null,
primary key(h, k));
instead of the table (paraphrased here):
create table s.k__v(
k int not null, v int not null,
primary key(k));
that the testcase issue-6514.zip used.
The ideas explained above in the section "What's expected when you scroll forwards and backwards in a cursor? And what's expected when the population of the result set that it defines is changed during the cursor's lifetime?".
This led to a complicated testcase:
This brings the total to thirty-five different tests. Moreover, some of the tests required interleaving table updates from a second session. So (without adding the complication of, say, a Python driver program) this implies following the steps manually in two concurrent sessions.
Here are the high-level conclusions:
You get wrong results for all cursor definitions when the target table is updated between declaring the cursor and first fetching from it. And this happens both when the updates are issued in the same session that does the declare and fetch; and when the updates are issued from a second concurrent session. I made a small testcase that demonstrates the wrong results even when only supported fetch variants are used and I filed a new issue with that testcase:
Issue #16832 Wrong "INSENSITIVE" semantics for "FETCH FROM CURSOR"
You get two kinds of wrong results (on top of what the first bullet states) for this (but only this) cursor-defining subquery:
select ... from h_k__v where h = 1 order by k
Here, you always see the wrong rows when particular move and fetch sequences are used; and you see that updates done from a second session even after move and fetch have been used a few times, affect the values in the fetched rows—but only when you use a without hold cursor.
NOTE: nocaway also proposed testing with cursor subqueries that have no order by. Such cursor declarations are included in the kit but are not used in the tests that are described. You can try them yourself by hand. But you can't mechanically compare the YB results with the PG results because differences in the storage mechanisms bring, with no ordering, different orderings in the two envs. Therefore, when the tests sample just the first few, the middle few, and the last few rows, all the values that you see are different. Even so, it's easy to conclude, by manual inspection:
that these two defining subqueries behave the same as each other:
select ... from k__v order by k
select ... from k__v
and do not show the second kind of wrong results.
and that these two defining subqueries behave the same as each other:
select ... from h_k__v where h = 1 order by k
select ... from h_k__v where h = 1
and do show the second kind of wrong results.
Without tracing what's going on in the underlying implementation, these results seem to be bizarrely random. It's impossible to tell, therefore, if yet more tests would reveal yet more errors.
However, the high-level conclusion is clear. Until the root causes are identified and fixed:
And even this conclusion is subject to the caveat that YB is vulnerable to DML from other sessions in the race condition window between declaring a cursor and first fetching from it—while PG is not vulnerable in this way.
To run the testcase (for the case that no concurrent table changes are made) simply define the psql variables db and u:
\set db < a convenient scratch database >
\set u < a user that has "create" on the database you chose >
and invoke 0.sql at the psql or ysqlsh prompt. You'll have to read this whole comment section to learn how to run the testcase in all of its variants.
issue-6514-2.zip contains these files:
0.sql
1-cr-tables.sql
2-cr-proc-open-cursor.sql
3-cr-fn-rows-from-cursor.sql
4-cr-rule-off.sql
5-run-test.sql
6-run-all-tests.sql
This has the two create table statements for the tables k__v and h_k__v shown above and these create procedure statements to populate and to update them:
create procedure s.populate_tab_k__v(no_of_nows in int)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
truncate table s.k__v;
with s(v) as (select generate_series(1, no_of_nows))
insert into s.k__v(k, v)
select v, (no_of_nows*10 + v) from s;
end;
$body$;
create procedure s.populate_tab_h_k__v(no_of_nows in int)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
-- Symmetrical with the code for table "s.k__v".
...
insert into s.h_k__v(h, k, v)
select 1, v, (no_of_nows*10 + v) from s;
end;
$body$;
create procedure s.update_tables(do_it in boolean)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
if do_it then
update s.k__v set v = v + 90;
update s.h_k__v set v = v + 90;
end if;
end;
$body$;
This creates a procedure to declare the cursor cur in nine different ways:
create procedure s.open_cursor(mode in int)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
prelude constant text not null := 'declare cur scroll cursor without hold for select ';
s_list constant text not null := 's.fmt(k)||s.fmt(v)';
begin
begin
execute 'close cur';
exception when invalid_cursor_name then null;
end;
case mode
...
when 4 then execute
prelude||s_list||
' from s.h_k__v where h = 1 order by k';
...
when 9 then execute
replace(prelude, 'without', 'with')||s_list||
' from s.h_k__v where h = 1 order by k';
end case;
end;
$body$;
It also creates the trivial fmt() helper function:
create function s.fmt(i in int)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
return to_char(i, '9999999');
end;
$body$;
And it creates a trivial function to display the columns is_holdable and statement for the cursor cur from the pg_cursors catalog view:
create function s.defining_subquery()
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
with_hold boolean not null := false;
stmt text not null := '';
begin
select is_holdable, statement
into strict with_hold, stmt
from pg_cursors where name = 'cur';
-- Improve the readability.
declare
holdable constant text not null := case with_hold
...
end;
subq constant text not null := replace(...stmt...);
begin
return holdable||subq;
end;
end;
$body$;
This is used to annotate the spooled testcase output. Here is its output for the seven modes that the mechanical test uses. (Whitespace has been added by hand.)
mode
----
2 not holdable select ... from k__v order by k
4 not holdable select ... from h_k__v where h = 1 order by k
5 not holdable select ... from h_k__v where h = 1 order by v
6 not holdable select ... from h_k__v order by k
7 not holdable select ... from h_k__v order by v
8 holdable select ... from k__v order by k
9 holdable select ... from h_k__v where h = 1 order by k
This creates the table function s.rows_from_cursor(). It performs these operations:
Forwards and backwards at the start
------------------------------------
fetch first from cur
fetch next from cur
fetch prior from cur
-- Check the count.
------------------------------------
move absolute 0 in cur
move forward all in cur
get diagnostics n = row_count
assert n = cnt
Backwards and forwards at the end
------------------------------------
fetch prior from cur
fetch prior from cur
fetch next from cur
Forwards and backwards in the middle
------------------------------------
move absolute 0 in cur
move relative cnt/2 in cur
fetch relative 0 from cur
fetch next from cur
fetch prior from cur
Backwards and forwards in the middle
------------------------------------
fetch relative 0 from cur
fetch prior from cur
fetch next from cur
Forwards and backwards at the start
------------------------------------
fetch first from cur
fetch next from cur
fetch prior from cur
The return value after each is compared with the expected value. The expected value is defined, tautologically, as what the test produces in vanilla PG. There, the set of results that the schedule of fetches produces is the same in all thirty-five cases (each of the seven cursor declarations that the mechanical test uses combined with each of the five concurrent update variants). Sometimes, in YB, the fetch finds no row (because of the bugginess that the testcase reveals). This is detected as a null return value. And this is transformed, using coalesce(), into <no row> for readability. Only when the return value differs from the expected value does the table function return it. This makes the buggy cases very easy to spot.
This creates the trivial function s.rule_off() to add captions like this to the spooled output. Here's an example that shows a buggy outcome:
================================================================================
not holdable: select ... from s.h_k__v where h = 1 order by k
================================================================================
--- First attempt ------------------------------------------
Forwards and backwards at the start
3 100003
Backwards and forwards at the end
<no row>
<no row>
<no row>
Forwards and backwards in the middle
5002 105002
5003 105003
5004 105004
Backwards and forwards in the middle
5006 105006
5007 105007
5008 105008
Forwards and backwards at the start
3 100003
--- Second attempt -----------------------------------------
Forwards and backwards at the start
3 100003
...
Here it is:
\c :db :u
set client_min_messages = error;
call s.populate_tab_k__v(10000);
call s.populate_tab_h_k__v(10000);
\if :without_hold
start transaction;
call s.open_cursor(:open_cursor_mode);
select s.rule_off(s.defining_subquery());
/*
In this session OR in concurrent session.
\c :db :u
*/;
call s.update_tables(:update_before_first_cursor_use);
select s.rule_off('First attempt', 2);
select s.rows_from_cursor();
select s.rule_off('Second attempt', 2);
select s.rows_from_cursor();
/*
In this session OR in concurrent session.
\c :db :u
*/;
call s.update_tables(:update_after_first_cursor_use);
select s.rule_off('Third attempt', 2);
select s.rows_from_cursor();
rollback;
\else
start transaction;
call s.open_cursor(9);
select s.rule_off(s.defining_subquery());
/*
In this session OR in concurrent session:
\c :db :u
*/;
call s.update_tables(:update_before_first_cursor_use);
commit;
select s.rule_off('First attempt', 2);
select s.rows_from_cursor();
/*
In this session OR in concurrent session:
\c :db :u
*/;
call s.update_tables(:update_after_first_cursor_use);
select s.rule_off('Second attempt', 2);
select s.rows_from_cursor();
\endif
Notice the psql variables whose values are accessed using :without_hold, :open_cursor_mode, :update_before_first_cursor_use, and :update_after_first_cursor_use. Three of the concurrent update scenarios can be run mechanically. In these cases for concurrent updates:
the testing can be done simply by invoking 5-run-test.sql from the driver script 6-run-all-tests.sql (below) and by setting update_before_first_cursor_use and update_after_first_cursor_use before each run.
But the remaining two cases for concurrent updates:
must be stepped through manually by setting the psql variable open_cursor_mode by hand, switching to the second session at the right moment to issue call s.update_tables(true), and then switching back to the first session to finish the sequence. Notice that you must choose manually which leg of the \if :without_hold meta-command test to run to match the value that you set for open_cursor_mode. Values 2, 4, 5, 6, and 7 call for setting without_hold to true. And values 8 and 9 call for setting it to false.
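To make the manual procedure concrete, here is a sketch of the sequence for one of the hand-stepped cases (open_cursor_mode 4, so the without_hold leg), assuming that the s.* procedures and functions from the attached scripts are already installed:

```sql
-- Session 1 (assumes the s.* procedures from the attached scripts exist).
start transaction;
call s.open_cursor(4);
select s.rule_off(s.defining_subquery());

-- Now switch to Session 2 (a separate psql connection to the same database)
-- and do the concurrent update there:
--   call s.update_tables(true);

-- Back in Session 1, finish the sequence:
select s.rule_off('First attempt', 2);
select s.rows_from_cursor();
rollback;
```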
Here it is:
\c :db :u
set client_min_messages = error;
-- Set these in turn to "false, false", "true, false", or "false, true".
\set update_before_first_cursor_use false
\set update_after_first_cursor_use false
select (version() like '%YB%')::text as is_yb
\gset
\t on
\if :is_yb
\o output/yb.txt
\else
\o output/pg.txt
\endif
\set without_hold true
\set open_cursor_mode 2
\ir 5-run-test.sql
\set open_cursor_mode 4
\ir 5-run-test.sql
\set open_cursor_mode 5
\ir 5-run-test.sql
\set open_cursor_mode 6
\ir 5-run-test.sql
\set open_cursor_mode 7
\ir 5-run-test.sql
\set without_hold false
\set open_cursor_mode 8
\ir 5-run-test.sql
\set open_cursor_mode 9
\ir 5-run-test.sql
\o
\t off
Here they are:
| Cursor's subquery | Holdable | No updates | Updates in same session before first fetch | Updates in same session after first fetch | Updates in other session before first fetch | Updates in other session after first fetch |
|---|---|---|---|---|---|---|
| select … from k__v order by k | No | OK | Buggy | OK | Buggy | OK |
| select … from h_k__v where h = 1 order by k | No | Buggy | Double buggy | Buggy | Double buggy | Double buggy |
| select … from h_k__v where h = 1 order by v | No | OK | Buggy | OK | Buggy | OK |
| select … from h_k__v order by k | No | OK | Buggy | OK | Buggy | OK |
| select … from h_k__v order by v | No | OK | Buggy | OK | Buggy | OK |
| select … from k__v order by k | Yes | OK | Buggy | OK | Buggy | OK |
| select … from h_k__v where h = 1 order by k | Yes | OK | Buggy | OK | Buggy | OK |
For all tests except for this one:
mode
----
4 not holdable select ... from h_k__v where h = 1 order by k
"Buggy" means that the cursor sees the effect of updates that are made to the table it queries, whether these are made in the same session or in a second session, between the invocation of open_cursor() and the first invocation of rows_from_cursor().
And for the tests for this exceptional holdability and subquery, plain "Buggy" means that any fetch after scrolling backwards sees the wrong row or no row at all; and "Double buggy" means what plain "Buggy" means together with seeing the effect of changes made to the underlying table during the cursor's lifetime. Notice that here, and only here, this departure from the proper behavior (MVCC as of the moment that the cursor is opened) occurs in one extra case: when the updates are done in a second session even after having already invoked rows_from_cursor() a few times.
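As a reminder of what "proper behavior" means here, this minimal, self-contained sketch (using a hypothetical table s.mvcc_demo, not one of the attached test tables) shows what vanilla PG guarantees for a cursor that's already open when the underlying table changes:

```sql
-- Vanilla PG: a cursor's results reflect the state of the database
-- as of the moment that the cursor was opened.
create table s.mvcc_demo(k int primary key, v int not null);
insert into s.mvcc_demo(k, v) values (1, 10), (2, 20);

start transaction;
declare c scroll cursor without hold for
  select k, v from s.mvcc_demo order by k;
fetch first from c;   -- 1 | 10

-- In a second session: update s.mvcc_demo set v = v + 90;

fetch first from c;   -- still 1 | 10 in vanilla PG: the update is invisible
rollback;
```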
It's useful to look at move and fetch outcomes for the problem cursor definition. Before starting, make sure that you're not in an ongoing transaction:
rollback;
(If client_min_messages happens to be set to error, then rollback will always finish silently, whether or not you're in an ongoing transaction, and will leave you with no ongoing transaction.) Now set up like this. (Only basic, top-level, SQL statements, with the exception of one anonymous block, are used in this section.)
truncate table s.h_k__v;
with s(v) as (select generate_series(1, 100))
insert into s.h_k__v(h, k, v)
select 1, v, (1000 + v) from s;
start transaction;
declare cur scroll cursor without hold for
select k, v from s.h_k__v where h = 1 order by k;
First, confirm that none of these statements draws a warning and that each produces the correct results:
set client_min_messages = warning;
fetch from cur;
fetch next from cur;
fetch forward from cur;
fetch 3 from cur;
fetch forward 3 from cur;
Here are the nine results, in order:
1 | 1001
2 | 1002
3 | 1003
4 | 1004
5 | 1005
6 | 1006
7 | 1007
8 | 1008
9 | 1009
All other fetch variants, and all of the move variants, draw the 0A000 warning; for example:
fetch relative 0 from cur;
Moreover, it gets the wrong answer, thus:
11 | 1011
And simply repeating it (which you might think would get the same wrong answer), gets a new wrong answer:
13 | 100013
Now suppress the warnings and try some other examples:
rollback;
set client_min_messages = error;
start transaction;
declare cur scroll cursor without hold for
select k, v from s.h_k__v where h = 1 order by k;
fetch first from cur;
fetch next from cur;
fetch prior from cur;
Here are the three results, in order:
1 | 1001
2 | 1002
3 | 1003
The first two results are correct; but the third is wrong.
This means that the fact that fetch prior draws just a warning is (arguably) itself a bug. It should (arguably) cause a straight error—and produce no result.
You can repeat this sequence of three fetch statements, still in the same ongoing transaction, as many times as you want. The results are always the same. This seems to suggest that fetch first always gets the right result and that it's therefore a bug that it draws the warning:
0A000: FETCH FIRST not supported yet
Now use the popular technique that, in PG, famously gets you the correct count and correct rows, at the moment that the cursor is declared, even in the presence of heavy concurrent changes to the table from other concurrent sessions:
do $body$
declare
cur constant refcursor not null := 'cur';
n int not null := 0;
begin
move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
assert n = 100, 'Bad count';
end;
$body$;
It finishes silently, time and again. This seems to suggest that move absolute 0 always gets the right result and that it's therefore a bug that it draws the (wrongly worded) warning:
0A000: FETCH ABSOLUTE not supported yet
It's then followed by this:
0A000: MOVE not supported yet
Now look at the effect of concurrent table changes from another session:
rollback;
set client_min_messages = error;
drop function if exists s.f(int) cascade;
create function s.f(i in int)
returns text
set search_path = pg_catalog, pg_temp
language sql
as $body$
select i::text;
$body$;
truncate table s.h_k__v;
with s(v) as (select generate_series(1, 100))
insert into s.h_k__v(h, k, v)
select 1, v, (1000 + v) from s;
\t on
start transaction;
declare cur scroll cursor without hold for
select s.f(k), v
from s.h_k__v
where h = 1
order by k
;
fetch first from cur;
fetch next from cur;
fetch prior from cur;
/*
DO BY HAND
/* In second session*/ update s.h_k__v set v = v + 90;
fetch first from cur;
*/;
It produces these results. First:
1 | 1001
2 | 1002
3 | 1003
And then, after updating the table in the second session:
1 | 1091
So this test has shown the unique "double-buggy" outcome with just a few top-level fetch statements and without the possible complication of invoking the move and fetch operations using PL/pgSQL.
Notice, however, that the manifestation of the final wrong value for v (1091 instead of 1001) is sensitive to what seem to be irrelevant changes. It goes away if you make any one of these changes, by itself, with respect to the code as presented:
Notice that you can remove "order by k" altogether. Because of the table's definition and how YB works under the covers, this doesn't change the ordering of the results. This change preserves the buggy behavior!
Notice that all of these tests produce only correct results using vanilla PG.
Cursor functionality in YB, as of this comment's timestamp, is seen to be massively buggy when you suppress warnings and test the unsupported operations. But when you avoid these, you lose almost all of the functionality that makes the feature interesting. The PG community regards the functionality as very useful. For example, PG expert Laurenz Albe has published these two recent blog posts on the topic:
Both of these posts use operations that draw the "unsupported" warning in YB and that, in certain tests, produce wrong results. He also uses the functionality in code that he's published for migrating data from foreign sources into PG. See HERE.
The fact that my tests show many cases of bizarre, but reproducible, wrong behavior makes me suspect that there must be some gross bugs in YB's implementation—so gross that the causes ought to be identifiable by single-stepping through the internal implementation for these tests.
I have a hunch that the reversibility property for the candidate plans for the subquery that defines a cursor might be one of the culprits. And maybe the code, adopted from PG, that intends to determine the reversibility property for a plan hasn't been properly modified to accommodate the fact that YB's storage model is so very different from PG's. This hunch is strengthened by the fact that the result set is always cached for a with hold cursor and that, here, the random navigation that you do with move and positional fetch never gets wrong results. (The fact that the MVCC moment is advanced in YB, w.r.t. PG, from the moment that declare is executed seems to be a separable problem.)
It would be interesting to intervene in the reversibility test so that it always says "not reversible" and then to try all my testcases.
Look at the one row in the table of results above that has cells that say "Double buggy". This is the full select that defines the cursor for that row:
select s.fmt(k)||s.fmt(v)
from s.h_k__v
where h = 1 order by k
I added a new version of the procedure s.open_cursor() to examine this more closely. I put it in this file:
investigating-effect-of-selecting-row-number/2s-cr-proc-open-cursor.sql
This is its content:
drop procedure if exists s.open_cursor(int) cascade;
create procedure s.open_cursor(mode in int)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
prelude constant text not null := 'declare cur scroll cursor without hold for select ';
s_list_1 constant text not null := 's.fmt(0)|| s.fmt(k)||s.fmt(v)';
s_list_2 constant text not null := 's.fmt((row_number() over())::int)||s.fmt(k)||s.fmt(v)';
begin
begin
execute 'close cur';
exception when invalid_cursor_name then null;
end;
case mode
when 4 then execute
prelude||s_list_1||' from s.h_k__v where h = 1 order by k';
when 99 then execute
prelude||s_list_2||' from s.h_k__v where h = 1 order by k';
end case;
end;
$body$;
These, then, are the defining select statements for the two modes:
-- mode => 4
select s.fmt(0)||s.fmt(k)||s.fmt(v)
from s.h_k__v
where h = 1 order by k
and:
-- mode => 99
select s.fmt((row_number() over())::int)||s.fmt(k)||s.fmt(v)
from s.h_k__v
where h = 1 order by k
I added a corresponding new version of the script to run the tests to focus on just the two modes 4 and 99 in this file:
investigating-effect-of-selecting-row-number/6s-run-all-tests.sql
This is its content:
\set db d0
\set u d0$u0
:c
\c :db :u
set client_min_messages = error;
\set VERBOSITY verbose
set search_path = pg_catalog, pg_temp;
--------------------------------------------------------------------------------
\ir 2s-cr-proc-open-cursor.sql
\set update_before_first_cursor_use false
\set update_after_first_cursor_use false
select (version() like '%YB%')::text as is_yb
\gset
\t on
\if :is_yb
\o output/yb.txt
\else
\o output/pg.txt
\endif
\set without_hold true
\set open_cursor_mode 4
\ir ../5-run-test.sql
\set open_cursor_mode 99
\ir ../5-run-test.sql
\o
\t off
I didn't bother with the other combinations for update_before_first_cursor_use and update_after_first_cursor_use.
Simply cd to the investigating-effect-of-selecting-row-number directory (in the directory where you unzipped issue-6514-2.zip) and create the two files 2s-cr-proc-open-cursor.sql and 6s-run-all-tests.sql there using the content shown above. Then create a directory output as a peer to these two scripts and start 6s-run-all-tests.sql first using PG and then using YB. This is what I saw:
This observation strengthens the theory that the YB bugginess stems from this:
Maybe the plan could be executed in the backwards direction, but this is simply implemented wrong. Or maybe it can't be reversed, and so the code that detects reversibility needs to detect such cases.
As I reason it, the plan for a select that has row_number() in its select list can't be reversed because the successive values for row_number() can only emerge as the rows are produced one-by-one. And when the plan starts to produce rows, it has no idea how many there'll be. So it can't count downwards from the total number of rows.
I'm guessing that we still use the code that we inherited from PG that detects when row_number() over() is present and returns "non-reversible". Because this causes the result set (in the forwards direction) to be cached, the net effect is bound to be correct for backwards operations.
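A minimal sketch (hypothetical table, not one of the attached test tables) shows the intended net effect in vanilla PG: even though the row_number() plan can't be run backwards, backward scrolling still works because the forward results are cached and replayed:

```sql
-- Vanilla PG: scrolling backwards over a row_number() query works because
-- the executor materializes the forward results rather than reversing the plan.
create table s.rn_demo(v int primary key);
insert into s.rn_demo(v) select generate_series(1, 5);

start transaction;
declare c scroll cursor without hold for
  select row_number() over(order by v) as pos, v from s.rn_demo order by v;
fetch forward all from c;   -- pos lines up with v, 1 through 5
fetch backward all from c;  -- the same rows replayed in reverse order
rollback;
```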
This account describes the overall design of some tests that explore if/when the result set that a cursor defines is cached and how the possible caching affects the actual values that move and fetch produce when volatile functions are used in the subquery that defines that cursor.
It also draws some conclusions from the results.
The attached issue-6514-3.zip contains these three .sql scripts:
The 0-*.sql script is the master script. It invokes the 1-*.sql and 2-*.sql scripts. The scripts produce both raise info output—from the functions s.f1(), s.f2(), and s.series()—and ordinary .sql output. Commands like this, at the O/S prompt (connecting both to YB and to PG) are used to produce spool files that interleave the raise info output with the SQL output:
psql -h u -p 5433 -d ... -U ... -f 0-result-set-caching-tescase.sql &> yb.txt
Reference copies of the two spool files, yb-0.txt and pg-0.txt, are also included in issue-6514-3.zip.
These tests use three helper functions, thus:
s.f1() (volatile) — This uses a user-defined run-time parameter my.var to hold (the text typecast of) an int value as its state. Each successive invocation increments the state by 1 and returns that new int value. This function is sometimes used in the select list of the subquery that defines a cursor.
s.f2() (volatile) — This is a variant of s.f1(). It holds and increments state in the same way. But it uses a case expression to compute the return value thus:
return
case
when v between 1 and 5 then 1
when v between 6 and 10 then 2
when v between 11 and 15 then 3
when v between 16 and 20 then 4
when v between 21 and 25 then 5
else 42
end;
This function is sometimes used in the where clause of the subquery that defines a cursor.
s.series() (immutable) — This is simply a wrapper for generate_series(1, 5).
Each of these functions uses raise info to report when it's invoked.
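For reference, here is a sketch of how s.f1() and s.series() might be written. This is my reconstruction from the descriptions above (the attached scripts contain the actual code); the state-holding scheme uses current_setting() and set_config() on the user-defined run-time parameter my.var:

```sql
-- A sketch of s.f1(): volatile, holding its int state (as text) in my.var.
create function s.f1()
  returns int
  language plpgsql
  volatile
as $body$
declare
  v constant int :=
    coalesce(nullif(current_setting('my.var', true), ''), '0')::int + 1;
begin
  perform set_config('my.var', v::text, false);
  raise info 'f1() invoked';
  return v;
end;
$body$;

-- A sketch of s.series(): an immutable wrapper for generate_series(1, 5).
create function s.series()
  returns table(v int)
  language plpgsql
  immutable
as $body$
begin
  raise info 'series() invoked';
  return query select g.v from generate_series(1, 5) as g(v);
end;
$body$;
```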
The test declares a cursor in various different ways to explore these degrees of freedom:
Without hold or with hold.
The from list for the cursor's defining subquery uses the table s.t (with an order by) or the function s.series().
Including s.f1() in the select list or not.
Including the restriction where v = s.f2() or not.
Including row_number() over(order by v) in the select list or not.
These five binary degrees of freedom bring 32 different combinations. But, here, only six of these are included in the test to keep it manageably small. This is enough to show the general pattern:
First step through the testcase by hand, observing how the raise info output interleaves with the SQL output. It's enough to do this using just YB.
Then do this, at the O/S prompt, to get both the raise info output and the SQL output into the same spool file:
# YB
psql -h u -p 5433 -d d0 -U d0\$u0 -f 0-result-set-caching-tescase.sql &> yb.txt
# Vanilla PG
psql -h u -p 5432 -d d0 -U d0\$u0 -f 0-result-set-caching-tescase.sql &> pg.txt
And then diff yb.txt and pg.txt. You'll see that they are identical. The attached issue-6514-3.zip contains reference copies of these as yb-0.txt and pg-0.txt.
Read the script to see the detail. This is the outline.
drop schema if exists s cascade;
create schema s;
create function s.caption(t in text, main in boolean = false)
returns table(z text)
...
language plpgsql
as $body$
...
$body$;
create function s.f1()
returns int
...
language plpgsql
as $body$
...
$body$;
create function s.f2()
returns int
...
language plpgsql
as $body$
...
$body$;
create function s.series()
returns table(v int)
...
language plpgsql
as $body$
...
$body$;
create table s.t(k serial primary key, v int not null);
select s.caption('Populating table s.t,', true);
insert into s.t(v) select v from s.series();
/*
Sanity test. In general, you can't rely on the order of evaluation
of the select list items. But the results of these little tests seem
to be repeatable.
set my.var = 0;
select s.f1() as f1, s.f1() as f2, s.f1() as f3, s.f1() as f4, s.f1() as f5;
set my.var = 0;
select s.f2() as f2, s.f2() as f2, s.f2() as f3, s.f2() as f4, s.f2() as f5;
*/;
The s.f1() test gets this result on the first invocation of the select:
1 | 2 | 3 | 4 | 5
And you see "INFO: f1() invoked" five times. Successive repeats of the same select return ever-increasing values from f1() so that repeating the select for the second time produces this:
6 | 7 | 8 | 9 | 10
The s.f2() test gets this result on the first invocation of the select, and you see "INFO: f2() invoked" five times:
1 | 1 | 1 | 1 | 1
The first repeat of the same select returns this:
2 | 2 | 2 | 2 | 2
And so it goes on. The between ranges in f2() were chosen so that it will return the same value for each row when it's called thus:
select s.f2() from s.series();
The script precedes each substantive fetch or move statement with s.caption() to help the readability of the spooled output, thus:
select s.caption('fetch first'); fetch first from cur;
select s.caption('fetch first'); fetch first from cur;
select s.caption('move relative 3'); move relative 3 from cur;
select s.caption('fetch forward all'); fetch forward all from cur;
select s.caption('fetch prior'); fetch prior from cur;
select s.caption('fetch backward all'); fetch backward all from cur;
select s.caption('fetch first'); fetch first from cur;
Notice that the very same \ir 2-fetch-and-move.sql is used in all tests.
set my.var = 0;
start transaction;
select s.caption('declare');
declare cur scroll cursor without hold for
select s.f1(), v from s.t order by v;
\ir 2-fetch-and-move.sql;
rollback;
These are the SQL results:
1 | 1
2 | 1
6 | 5
7 | 5
8 | 4
9 | 3
10 | 2
11 | 1
12 | 1
Look at one of the two identical spool files to see how the raise info output interleaves with the SQL output. It always comes before the SQL output.
The outcomes are consistent with an implementation where the results are delivered simply by executing the plan explicitly in the forward or backward direction as requested, including therefore evaluating s.f1(), for every single fetch or move execution.
NOTE: These observations are also consistent with an implementation that caches the raw result set from scanning the table and that invokes f1() whenever the cache is accessed. The test cannot distinguish between the two alternative hypotheses.
This is identical to Test 1 except for this difference:
set my.var = 0;
start transaction;
select s.caption('declare');
declare cur scroll cursor with hold for
select s.f1(), v from s.t order by v;
select s.caption('commit');
commit;
\ir 2-fetch-and-move.sql;
close cur;
In other words, without hold is replaced by with hold and the explicitly started transaction that declares the cursor is immediately committed. Then each fetch is done in ordinary autocommit mode. And then the cursor is explicitly closed with close cur.
Here, we see INFO: f1() invoked exactly five times, immediately following commit.
After that, we see no more raise info output. These are the SQL results:
1 | 1
1 | 1
5 | 5
5 | 5
4 | 4
3 | 3
2 | 2
1 | 1
1 | 1
The values of v are the same as in Test 1. But the values reported for f1() are different. Now they are neatly paired with the values for v.
These results are consistent with an implementation where the entire result set is cached, when the transaction that creates the with hold cursor issues commit. Then all cursor accesses using fetch or move access the cached result set. Notice that (in contrast to what Test 3 shows) the result of invoking f1() in the select list is also cached.
set my.var = 0;
start transaction;
select s.caption('declare');
declare cur scroll cursor without hold for
select s.f1(), v from s.series();
\ir 2-fetch-and-move.sql;
rollback;
In other words, it is identical to Test 1 except for the single difference that this:
from s.t order by v
is replaced with this:
from s.series()
The thinking is that, because the immutable user-defined function s.series() is opaque, the execution plan cannot be run backwards—and so at least the raw results must be cached. (The order by is dispensed with in Test 3 because the results from s.series() are guaranteed to be produced in the order that they are computed.) The fact that s.series() starts with raise info 'series() invoked' allows this hypothesis to be tested.
The SQL output for Test 3 is identical to that for Test 1 (as it must be for correctness). But the raise info output differs in just one way. Test 3's output starts with "INFO: series() invoked" while Test 1's output, of course, does not. This message is never seen again. (The "INFO: f1() invoked" output is identical in both tests.)
This tells us that, at least in this test, the raw results from executing s.series() are cached while the function f1() is invoked when a result is fetched from the cache.
This is identical to Test 1 except that it extends the select list with row_number(). The idea is that the plan for such a query cannot be reversed.
set my.var = 0;
start transaction;
select s.caption('declare');
declare cur scroll cursor without hold for
select row_number() over(order by v) as pos, s.f1(), v from s.t order by v;
\ir 2-fetch-and-move.sql;
rollback;
These are the SQL results:
1 | 1 | 1
1 | 1 | 1
5 | 5 | 5
5 | 5 | 5
4 | 4 | 4
3 | 3 | 3
2 | 2 | 2
1 | 1 | 1
1 | 1 | 1
These results are, for the select list items s.f1() and v, identical to those for Test 2. And the results for row_number() line up with those for v, just as the order by demands. When you look at the spool file, you'll see that the INFO: f1() invoked output is seen only when a row that has not so far been fetched or moved over is fetched or moved over for the first time. These observations are consistent with the hypothesis that a row is added to the result cache on demand when it is first produced. Moreover, the cached row value includes the effect of the select list function so that it is not re-invoked when the cache is accessed.
This is the same as Test 1 except that it removes s.f1() from the select list and adds the restriction where v = s.f2().
set my.var = 0;
start transaction;
select s.caption('declare');
declare cur scroll cursor without hold for
select v from s.t where v = s.f2() order by v;
\ir 2-fetch-and-move.sql;
rollback;
These are the SQL results:
1
1
1
1
When you look at the spool file, you'll see that the INFO: f2() invoked output is seen five times before the very first fetch first result is produced. Recall that f2() returns the value 1 for each of its first five invocations—so, from one point of view, just a single invocation of f2() would be sufficient to produce the first result. The fact that f2() is invoked five times before the first result is seen is consistent with the hypothesis that analysis of the select statement, or of its plan, produced the decision that the entire, already restricted, result set should be cached when the very first fetch or move is done. Of course, then, the result set has just the single row for v = 1. Any subsequent fetch will either get that row—or get no row at all if the current position is that of the single result.
This is the same as Test 5 except that it uses from s.series() without order by instead of accessing the table s.t using an order by.
start transaction;
select s.caption('declare');
declare cur scroll cursor without hold for
select v from s.series() where v = s.f2();
\ir 2-fetch-and-move.sql;
rollback;
These are the SQL results:
1
1
2
3
Notably, they're different from those from Test 5 because of caching differences.
When you look at the spool file, you'll see that first INFO: series() invoked is seen (and never seen again). Then the INFO: f2() invoked output is seen a single time and then the first SQL result is seen. Then the INFO: f2() invoked output is seen a single time again and then the second SQL result is seen. The second SQL result happens to be the same as the first SQL result because s.f2() produces the same result on its second invocation as on its first invocation. And so it goes on.
The critical point here is that, though the row source delivers the same values for v in Test 6 and Test 5, the final SQL results are different. These results are consistent with the hypothesis that analysis of the select statement or of its plan produced the decision that the raw row source results should be cached when the very first fetch or move is done and that, thereafter, every fetch or move applies the where v = s.f2() restriction to the results that are retrieved from the cache. The deliberately contrived pattern of f2()'s volatile production of return values explains the rest of what the spool file shows.
Because f1() and f2() were designed deliberately to be volatile, the different ways that caching is done lead to genuine semantic differences in the results. The PG doc (in the Cursors subsection of the PL/pgSQL chapter HERE) has this caution:
use NO SCROLL with a query that involves volatile functions. The implementation of SCROLL assumes that re-reading the query's output will give consistent results, which a volatile function might not do.
This caveat, together with the observed results, leads to this conclusion:
The user has no reliable way to predict, in general, when and how result-set caching will be done. For example: the raw values from the row source might be cached; or the results after the use of select list or where clause functions might be cached. The results from a scrollable cursor whose defining subquery involves a volatile function must therefore be regarded as unpredictable and, because of that, unreliable.
This, in turn, suggests that YB need not try to mimic PG's whimsical caching rules. On the other hand, if this outcome always, or even just sometimes, emerges from the direct re-use of the PG code, then this cannot be considered to be harmful.
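The PG doc's caution is easy to demonstrate with a self-contained sketch that uses random() as the volatile function (a hypothetical example, not one of the attached tests):

```sql
-- Re-reading the output of a scrollable cursor whose defining subquery
-- invokes a volatile function may, or may not, reproduce the first pass.
start transaction;
declare c scroll cursor without hold for
  select g.v, random() as r from generate_series(1, 3) as g(v);
fetch forward all from c;
fetch backward all from c;  -- the values of r may differ from the first pass
rollback;
-- Declaring the cursor "no scroll" instead disallows the backward fetch
-- and so removes the hazard.
```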
Try this in top-level SQL, by way of revision:
drop schema if exists s cascade;
create schema s;
create table s.t(k serial primary key, v int not null);
insert into s.t(v) select generate_series(0, 99, 5);
start transaction;
declare cur cursor without hold for
select t.k, t.v
from s.t
where t.k between 6 and 11
order by t.k;
fetch first from cur;
rollback;
It causes the 0A000 error:
FETCH FIRST not supported yet
Replace fetch first from cur; with fetch last from cur; — the outcome is the same error but now with this message:
FETCH LAST not supported yet
This is the hint for both:
See https://github.com/yugabyte/yugabyte-db/issues/6514. React with thumbs up to raise its priority
Now try the same thing using a PL/pgSQL function:
create function s.f(k_lo in int, k_hi in int)
returns table(k int, v int)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor not null := 'cur';
begin
open cur for (
select t.k, t.v
from s.t
where t.k between k_lo and k_hi
order by t.k);
fetch first from cur into k, v; return next;
fetch last from cur into k, v; return next;
close cur;
end;
$body$;
select k, v from s.f(6, 11);
It all works without error and produces this result:
k | v
----+----
6 | 25
11 | 50
This inconsistency in behavior is beyond belief.
Jira Link: DB-4854
The following FETCH options are not yet supported.