steve-chavez opened 1 month ago
Would it be possible to implement a version that waits on multiple requests?
I'm using pg_cron jobs to import data from an external service using HTTP GET requests. pgsql-http is not an option for me, because these jobs can potentially execute hundreds of requests (in batches); I need them to run asynchronously.
Here's what I use right now:
```sql
CREATE OR REPLACE PROCEDURE sync.await_responses(
  request_ids bigint[],
  timeout interval
)
AS $$
DECLARE
  start_time CONSTANT timestamptz := now();
BEGIN
  DROP TABLE IF EXISTS remaining;
  CREATE TEMP TABLE remaining AS
    SELECT id
    FROM unnest(request_ids) requests(id);

  WHILE TRUE LOOP
    -- commit so that responses written by other transactions become visible
    COMMIT;

    -- if a response has an error of any kind, abort
    IF EXISTS (
      SELECT 1
      FROM net._http_response response
      JOIN remaining request ON response.id = request.id
      WHERE
        response.timed_out OR
        response.error_msg IS NOT NULL OR
        NOT (response.status_code = 200 OR response.status_code = 304)
    )
    THEN
      RAISE EXCEPTION 'server responded with error';
    END IF;

    -- if a request returned successfully, remove it from remaining
    DELETE FROM remaining
    USING net._http_response response
    WHERE remaining.id = response.id;

    -- if there are no more requests, return successfully
    IF (SELECT count(*) FROM remaining) = 0 THEN
      RETURN;
    END IF;

    -- timeout if this is taking too long
    IF now() - start_time >= timeout THEN
      RAISE EXCEPTION 'timeout while waiting for responses';
    END IF;

    -- wait for more responses
    PERFORM pg_sleep(0.1);
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```
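For context, an invocation might look like this (the URLs and batch size are illustrative; `net.http_get` returns immediately with a request id, which is what the procedure waits on):

```sql
-- enqueue a batch of async requests; each call returns a request id right away
SELECT array_agg(net.http_get('https://example.com/items?page=' || n)) AS ids
FROM generate_series(1, 3) AS n;

-- then block until every request has a 200/304 response, or raise on error/timeout
CALL sync.await_responses(ARRAY[1, 2, 3]::bigint[], interval '30 seconds');
```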
@MathNerd28 Interesting use case. I guess we could have a function like:
call wait_response([1,2,3]) -- where the numbers are the request ids
But I'm thinking of a better way that doesn't involve waiting. Using the idea on https://github.com/supabase/pg_net/issues/62:
```sql
select cron.schedule('30 3 * * 6', $_$
  select net.http_get(
    url := 'http://domain.com',
    success_cb := $$
      insert into import_table values ($1, $2, $3) -- $1=status, $2=headers, $3=body
    $$
  );
$_$);
```
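For the callback sketched above, the target table would need columns matching the `$1`/`$2`/`$3` parameters. An illustrative definition (the column types are assumptions inferred from the comment, not a defined API):

```sql
-- hypothetical target table for the success callback above
create table import_table (
  status  int,   -- $1: HTTP status code
  headers jsonb, -- $2: response headers
  body    text   -- $3: response body
);
```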
This way we don't have to add a sync function, which could be misused.
That's definitely a much cleaner solution that would work for most use cases.
I still think it's worth implementing a wait_responses
procedure though, because there may still be times when you need to perform a single action after multiple requests and don't want the hassle of maintaining and cleaning up state. At least give the option, even if you add documentation making it clear this is not a recommended practice.
An example might be if your post-request action required a lock on a table, and you wanted to minimize the downtime for other write operations.
Problem
Sync requests were previously supported using
net.collect_response
(now deprecated), which is bad for performance since it runs an infinite loop plus pg_sleep:
https://github.com/supabase/pg_net/blob/28b26d86cb126fba42bb03de3e3f97cb9c986fe1/sql/pg_net.sql#L50-L76
Some simple use cases require sync requests (https://github.com/orgs/supabase/discussions/28771). While this is bad for performance, it can help some users get going.
Solution
Now that https://github.com/supabase/pg_net/pull/139 is merged, we can implement sync requests efficiently by waiting on the socket file descriptor. No need for loops or querying the response table.
This would be implemented as a PROCEDURE, so users can only invoke it with CALL, which prevents combining it with other queries in the same statement. This way it will be clearer that this is a special (and slow) operation.
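To illustrate the CALL-only restriction (the procedure name and signature here are assumptions, not the final API):

```sql
-- hypothetical sync wait procedure: only invocable as a top-level CALL
CALL net.wait_for_responses(ARRAY[1, 2, 3]::bigint[], interval '5 seconds');

-- procedures cannot appear inside queries, so this fails with an error
-- telling you to use CALL instead:
-- SELECT net.wait_for_responses(ARRAY[1, 2, 3]::bigint[], interval '5 seconds');
```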