supabase / wrappers

Postgres Foreign Data Wrapper development framework in Rust.
https://supabase.github.io/wrappers/
Apache License 2.0
469 stars 47 forks source link

Stripe FDW randomly returns no data #237

Open vincenzon opened 4 months ago

vincenzon commented 4 months ago

I am querying stripe FDW tables using an rpc call using supabase-js from my web app. About half the time, the query returns no results. It should return 2 rows, and does so about half the time. I can reproduce it reliably by just refreshing the page. It is not obvious when it will return results and when not. It may be a function of how quickly I refresh, but isn't always. I can refresh after waiting several minutes, get no results, refresh again and get results as well as vice-versa. There isn't any rhyme or reason to it as far as I can tell. When it returns no results, it does not have an error, just an empty result: {data: [], error: null}.

Supabase support is telling me to use pg_cron and to build what is effectively a materialized view of the stripe FDW data within supabase. If that is the recommended workflow, I think it should be documented as such.

Here is the rpc call with the query, it joins the product and price tables. There are two products, and those were populated once a while ago and have not changed.

 create or replace function public.get_subscription_prices()
returns table (
    name text,
    product_id text,
    price_id text, 
    price real
)
language plpgsql security definer as
$$
#variable_conflict use_column
begin
    set statement_timeout to 60000;
    return QUERY
        select p.name, p.id as product_id, q.id as price_id, (q.unit_amount / 100.0)::real as price
        from stripe.products p
        left join stripe.prices q on p.id = q.product
        where p.active and q.active;
end
$$;
imor commented 4 months ago

What happens if you run the following query directly in the database:

select p.name, p.id as product_id, q.id as price_id, (q.unit_amount / 100.0)::real as price
from stripe.products p
left join stripe.prices q on p.id = q.product
where p.active and q.active;

Does it also produce results intermittently? What about even simpler queries like:

select q.id, q.unit_amount, q.product, q.active from stripe.prices;

Or

select p.id, p.name, p.active from stripe.products p; 
vincenzon commented 4 months ago

If I run the full query (or the simple query) in the console, it always works correctly.

If I run the simple query from the client, it also fails about half the time.

It's possible I am not able to run the console queries fast enough so I don't think one can conclude conclusively that that points to a permission issue. When the query runs in the app, it is called twice in quick succession. Sometimes both fail, sometimes one fails, sometimes both succeed.

Matt

On Fri, Feb 9, 2024 at 2:05 AM Raminder Singh @.***> wrote:

What happens if you run the following query directly in the database:

select p.name, p.id as product_id, q.id as price_id, (q.unit_amount / 100.0)::real as pricefrom stripe.products pleft join stripe.prices q on p.id = q.productwhere p.active and q.active;

Does it also produce results intermittently? What about even simpler queries like:

select q.id, q.unit_amount, q.product, q.active from stripe.prices;

Or

select p.id, p.name, p.active from stripe.products p;

— Reply to this email directly, view it on GitHub https://github.com/supabase/wrappers/issues/237#issuecomment-1935429633, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGGVYL2IUSRFSLNOTKYKMADYSXDDVAVCNFSM6AAAAABDAADSUOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZVGQZDSNRTGM . You are receiving this because you authored the thread.Message ID: @.***>

vincenzon commented 4 months ago

This issue looks very similar to mine and is pointing to a rate limit (possibly on the stripe side):

https://github.com/supabase/wrappers/issues/234

Matt

On Fri, Feb 9, 2024 at 7:31 AM Matt Calder @.***> wrote:

If I run the full query (or the simple query) in the console, it always works correctly.

If I run the simple query from the client, it also fails about half the time.

It's possible I am not able to run the console queries fast enough so I don't think one can conclude conclusively that that points to a permission issue. When the query runs in the app, it is called twice in quick succession. Sometimes both fail, sometimes one fails, sometimes both succeed.

Matt

On Fri, Feb 9, 2024 at 2:05 AM Raminder Singh @.***> wrote:

What happens if you run the following query directly in the database:

select p.name, p.id as product_id, q.id as price_id, (q.unit_amount / 100.0)::real as pricefrom stripe.products pleft join stripe.prices q on p.id = q.productwhere p.active and q.active;

Does it also produce results intermittently? What about even simpler queries like:

select q.id, q.unit_amount, q.product, q.active from stripe.prices;

Or

select p.id, p.name, p.active from stripe.products p;

— Reply to this email directly, view it on GitHub https://github.com/supabase/wrappers/issues/237#issuecomment-1935429633, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGGVYL2IUSRFSLNOTKYKMADYSXDDVAVCNFSM6AAAAABDAADSUOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZVGQZDSNRTGM . You are receiving this because you authored the thread.Message ID: @.***>

imor commented 4 months ago

This looks like a bug in wrappers. If I run a query directly, e.g. select * from stripe.products it successfully returns results. But if I wrap the same query in a function, it only returns the result the first time. The difference in these is that the StripeFdw object is created afresh each time select * from stripe.products is run but not when the query is run from inside the function.

imor commented 4 months ago

I analyzed further and this bug is more involved than I thought. If the function with the query is called after select * from stripe.products it fails with this error: ERROR: required option 'object' is not specified.

To reproduce run the following:

create or replace function public.get_products()
returns table (
    product_id text
)
language plpgsql as
$$
begin
    return query
    select p.id as product_id from stripe.products p;
end
$$;

select * from public.get_products();
select * from stripe.products;
select * from public.get_products();--ERROR:  required option 'object' is not specified

The error occurs because the code assumes that a foreign table will only ever have one FdwState (see these lines). But this is not true when we have a foreign table query wrapped in a function and a direct query at the same time. While the select statement query always creates a new FdwState, the query in the function does not. This means the direct query is releasing the memory context in which the wrapped query's FdwState was allocated, resulting in a use after free bug. To fix this we need to ensure that a memory context holding an FdwState is not reset.

burmecia commented 4 months ago

When 1st time calling the function, postgres will cache the execution plan to improve query performance. So for the following calls, the FdwState won't be initialised again and execution plan is reused. This will cause unexpected results as the FdwState is reused without resetting its internal states.

For short-term solution, we can use dynamic query in the function to prevent postgres from using execution plan cache and re-create FdwState each time. For example,

create or replace function public.get_products()
returns table (
    product_id text
)
language plpgsql as
$$
begin
    return query
    execute 'select p.id as product_id from stripe.products p';  -- use dynamic query here
end
$$;

For long-term solution, we might need to find a good way to reuse FdwState during each function call and also don't interfere with other normal queries.

vincenzon commented 4 months ago

The dynamic query is indeed working. Thank you. For me this is closed, but I'll leave it to you to decide whether to keep it open to track the long-term issue.

Thanks for this, looks like it took some serious work.

imor commented 3 months ago

Although we have a good workaround (using a dynamic query), reopening this to track the use-after-free bug that we still need to fix.