pgcentralfoundation / pgrx

Build Postgres Extensions with Rust!
Other
3.66k stars 249 forks source link

Provide example of custom restriction selectivity estimator function implementation for custom operators #1344

Open xwkuang5 opened 1 year ago

xwkuang5 commented 1 year ago

Custom operators can provide a restriction selectivity estimator function via the ::pgrx::pgrx_macros::restrict macro (postgres doc).

After browsing the pgrx code base, I only found example use of ::pgrx::pgrx_macros::restrict with system-provided selectivity estimate function like eqsel, scalarltsel. It would be great if we can provide an example of custom implementation of this function under the pgrx-examples directory.

For context, I am implementing a custom value type (called fsvalue) to model Cloud Firestore's data type. Repo: https://github.com/xwkuang5/pgfirestore. I'd like to experiment with adding selectivity estimator for the different operators (<, >, etc).

Thanks!

eeeebbbbrrrr commented 1 year ago

It's probably best to lean on you or other contributors to provide this. I'm sure one of us could come up with a contrived example, but it's not clear to me it would be applicable to the real world.

xwkuang5 commented 1 year ago

Thanks for the prompt reply!

I can certainly help with that if you can provide some code pointers for adjacent area that I should look at. Right now I'm somewhat confused about what pgrx type the PG_FUNCTION_ARGS parameter, which the estimator function is supposed to take, should map to

eeeebbbbrrrr commented 1 year ago

I assume you're looking at a C example? What is that? I can probably illustrate how to port it to rust/pgrx.

xwkuang5 commented 1 year ago

yes, I'm looking at the postgres source code implementation for scalarltsel

/*
 *      scalarltsel     - Selectivity of "<" for scalars.
 */
Datum
scalarltsel(PG_FUNCTION_ARGS)
{
    return scalarineqsel_wrapper(fcinfo, false, false);
}

PG_FUNCTION_ARGS seems to just expand to FunctionCallInfo fcinfo

#define PG_FUNCTION_ARGS   FunctionCallInfo fcinfo

Link: https://doxygen.postgresql.org/selfuncs_8c_source.html#l01470

eeeebbbbrrrr commented 1 year ago

hmm. Yeah, okay, that one is a bit tricky. You'd want something like this:

unsafe fn extern "C" scalarltsel(fcinfo: pg_sys::FunctionCallInfo) -> Datum {
   use pgrx::fcinfo::pg_getarg;
   let root = pg_getarg::<pg_sys::Datum>(fcinfo, 0).cast_mut_ptr::<pg_sys::PlannerInfo>();
   let operator = pg_getarg::<pg_sys::Oid>(fcinfo, 1);
   let args = pg_getarg::<pg_sys::Datum>(fcinfo, 2).cast_mut_ptr::<pg_sys::List>();
   let var_relid = pg_getarg::<i32>(fcinfo, 3);
  ...
}

This is untested, of course, but I think it starts to point you in the right direction?

I was taking that from src/backend/utils/adt/selfuncs.c's scalarineqsel_wrapper() function:

static Datum
scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
{
    PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
    Oid         operator = PG_GETARG_OID(1);
    List       *args = (List *) PG_GETARG_POINTER(2);
    int         varRelid = PG_GETARG_INT32(3);
    ...
xwkuang5 commented 1 year ago

Thanks for the pointer! I will try this out

xwkuang5 commented 1 year ago

I was able to influence the optimizer's estimate of the number of output tuples via the following stub:

#[pg_operator(immutable, parallel_safe)]
#[restrict(fs_scalarltsel)]
#[opname(#<)]
fn fs_lt(lhs: FsValue, rhs: FsValue) -> bool {
    not_null_nor_nan(&rhs) && is_same_type(&lhs, &rhs) && lhs.lt(&rhs)
}

#[pg_extern(sql = r#"
CREATE OR REPLACE FUNCTION fs_scalarltsel(internal, oid, internal, integer) RETURNS float8
STRICT
LANGUAGE c /* Rust */
AS '@MODULE_PATHNAME@', '@FUNCTION_NAME@';
"#)]
unsafe fn fs_scalarltsel(fcinfo: pg_sys::FunctionCallInfo) -> f64 {
    use pgrx::fcinfo::pg_getarg;
    let root = pg_getarg::<pg_sys::Datum>(fcinfo, 0).unwrap().cast_mut_ptr::<pg_sys::PlannerInfo>();
    let operator = pg_getarg::<pg_sys::Oid>(fcinfo, 1).unwrap();
    let args = pg_getarg::<pg_sys::Datum>(fcinfo, 2).unwrap().cast_mut_ptr::<pg_sys::List>();
    let var_relid = pg_getarg::<i32>(fcinfo, 3).unwrap();
    ::pgrx::info!("operator oid: {:?}", operator);
    // Vary this from 0.0 to 1.0 to influence the estimates
    0.0
}

The thing that was missing was a custom sql script to inform postgres about such an estimator function. While the following examples show this working end-to-end, it's difficult at this point to work with the input parameters (PlannerInfo, List, etc) given a lack of an easier to use API around them. I don't have a good example in my mind to show how such a custom estimator implementation can be useful (postgres' areasel and positionsel for geo data type are also just stub with hardcoded estimate at this point).

Regarding this issue, perhaps it's sufficient to close this as is. At least other people who are interested in such a use case can find the example below from the isslues list? What do you think?

Estimates when the function return 0.0

pgfirestore=# explain select * from fs_documents where properties #< fs_number_from_integer(100);
WARNING:  operator oid: Oid(127902)
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on fs_documents  (cost=0.00..23.20 rows=1 width=64)
   Filter: (properties #< fs_number_from_integer(100))
(2 rows)

Estimates when the function return 1.0

pgfirestore=# explain select * from fs_documents where properties #< fs_number_from_integer(100);
WARNING:  operator oid: Oid(127972)
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on fs_documents  (cost=0.00..23.20 rows=880 width=64)
   Filter: (properties #< fs_number_from_integer(100))
(2 rows)