supabase / pg_net

A PostgreSQL extension that enables asynchronous (non-blocking) HTTP/HTTPS requests with SQL
https://supabase.github.io/pg_net
Apache License 2.0

Correct implementation of sync requests #154

Open · steve-chavez opened this issue 1 month ago

steve-chavez commented 1 month ago

Problem

Sync requests were previously supported via net.collect_response (now deprecated), which is bad for performance since it runs an infinite loop combined with pg_sleep.

https://github.com/supabase/pg_net/blob/28b26d86cb126fba42bb03de3e3f97cb9c986fe1/sql/pg_net.sql#L50-L76
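
For context, the deprecated helper boiled down to a polling loop along these lines (a simplified sketch with a hypothetical name, not the exact source; see the permalink above for the real definition):

-- poll the response table in a loop with pg_sleep until the row appears
create or replace function poll_response_sketch(p_request_id bigint)
returns net._http_response
language plpgsql as $$
declare
  rec net._http_response;
begin
  loop
    -- under read committed, each statement takes a fresh snapshot,
    -- so rows committed by the pg_net worker eventually show up
    select * into rec from net._http_response r where r.id = p_request_id;
    if found then
      return rec;
    end if;
    perform pg_sleep(0.05);  -- every iteration burns backend time doing nothing
  end loop;
end
$$;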

Some simple use cases require sync requests (https://github.com/orgs/supabase/discussions/28771). While this is bad for performance, it can help some users get going.

Solution

Now that https://github.com/supabase/pg_net/pull/139 is merged, we can implement sync requests efficiently by waiting on the socket file descriptor. No need for loops or querying the response table.

This would be implemented as a PROCEDURE, so users can only invoke it with CALL, which prevents combining it with other queries in the same statement. That makes it clearer that this is a special (and slow) operation.
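
To illustrate the point (the procedure name and signature below are hypothetical, not a final API):

-- a procedure can only be invoked as its own statement, via CALL:
call net.wait_for_response(request_id := 42, timeout := interval '5 seconds');

-- a plain function, by contrast, could be fanned out inside a query,
-- silently turning every row into a blocking HTTP wait:
--   select net.wait_for_response(id) from pending_requests;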

MathNerd28 commented 1 month ago

Would it be possible to implement a version that waits on multiple requests?

I'm using pg_cron jobs to import data from an external service using HTTP GET requests. pgsql-http is not an option for me, because these jobs can potentially execute hundreds of requests (in batches); I need them to run asynchronously.

Here's what I use right now:

CREATE OR REPLACE PROCEDURE sync.await_responses(
  request_ids bigint[],
  timeout interval
)
AS $$
DECLARE
  start_time CONSTANT timestamptz := now();
BEGIN
  -- (re)create a scratch table of ids still awaiting a response;
  -- pg_temp qualification avoids dropping a non-temp table by accident
  DROP TABLE IF EXISTS pg_temp.remaining;
  CREATE TEMP TABLE remaining AS
  SELECT id
  FROM unnest(request_ids) requests(id);

  WHILE TRUE LOOP
    -- commit so requests queued earlier in this transaction reach the
    -- pg_net worker, and newly arrived responses become visible here
    COMMIT;

    -- if a response has an error of any kind, abort
    IF EXISTS (
      SELECT 1
      FROM
        net._http_response response
        JOIN remaining request ON response.id = request.id
      WHERE
        response.timed_out OR
        response.error_msg IS NOT NULL OR
        response.status_code NOT IN (200, 304)
    )
    THEN
      RAISE EXCEPTION 'server responded with error';
    END IF;

    -- if a request returned successfully, remove it from remaining
    DELETE FROM remaining
      USING net._http_response response
    WHERE remaining.id = response.id;

    -- if there are no more requests, return successfully
    IF NOT EXISTS (SELECT 1 FROM remaining) THEN
      RETURN;
    END IF;

    -- timeout if this is taking too long
    IF now() - start_time >= timeout THEN
      RAISE EXCEPTION 'timeout while waiting for responses';
    END IF;

    -- wait for more responses
    PERFORM pg_sleep(0.1);
  END LOOP;
END;
$$ LANGUAGE plpgsql;
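
For reference, a hypothetical way to drive this procedure (the URL and batch size are invented for illustration; a top-level DO block may call a procedure that commits):

do $$
declare
  ids bigint[];
begin
  -- queue a batch of async GETs and collect their request ids
  select array_agg(net.http_get('https://api.example.com/items?page=' || n))
    into ids
  from generate_series(1, 100) as n;

  -- block until every request has a non-error response, or time out
  call sync.await_responses(ids, interval '30 seconds');
end
$$;
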
steve-chavez commented 1 month ago

@MathNerd28 Interesting use case. I guess we could have a function like:

call wait_response(array[1,2,3]) -- where the numbers are the request ids

But I'm thinking of a better way that doesn't involve waiting. Using the idea on https://github.com/supabase/pg_net/issues/62:

select cron.schedule('30 3 * * 6', $_$
    select net.http_get(
      url := 'http://domain.com',
      success_cb := $$
        insert into import_table values ($1, $2, $3) -- $1=status, $2=headers, $3=body
      $$
    );
$_$);

This way we don't have to add a sync function, which could be misused.

MathNerd28 commented 1 month ago

That's definitely a much cleaner solution that would work for most use cases.

I still think it's worth implementing a wait_responses procedure, though, because there may still be times when you need to perform a single action after multiple requests and don't want the hassle of maintaining and cleaning up state. At least give the option, even if you add documentation making it clear that this is not a recommended practice.

An example might be if your post-request action required a lock on a table, and you wanted to minimize the downtime for other write operations.
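
Concretely, that pattern might look like this (import_table and the request ids are hypothetical; the response columns are pg_net's):

-- 1. wait with no locks held, so concurrent writers are unaffected
call sync.await_responses(array[101, 102, 103], interval '1 minute');

-- 2. take the lock only for the final write, keeping the critical section short
begin;
lock table import_table in exclusive mode;
insert into import_table (status, headers, body)
select status_code, headers, content
from net._http_response
where id = any (array[101, 102, 103]);
commit;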