Open devdras opened 1 year ago
Thank you for posting this. I'm receiving a similar error.
Surprisingly, when I create 10 records at the same time, it fires the function for each of the 10 records:

```sql
INSERT INTO public.profiles (id, first_name, last_name, stripe_customer_id, email)
SELECT u.id, NULL, NULL, NULL, u.email
FROM auth.users u
LEFT JOIN public.profiles p ON u.id = p.id
WHERE p.id IS NULL
LIMIT 10;
```
However, if I fire the function for 100 records, it fails:

```sql
INSERT INTO public.profiles (id, first_name, last_name, stripe_customer_id, email)
SELECT u.id, NULL, NULL, NULL, u.email
FROM auth.users u
LEFT JOIN public.profiles p ON u.id = p.id
WHERE p.id IS NULL
LIMIT 100;
```
Yup, basically the webhook behaviour is flaky. I haven't tried it recently. I wanted to use this to build an event-driven architecture but can't without it being reliable. I ended up using QStash as a replacement for the webhook. Would love for Supabase to have some kind of full-featured event queue like SQS/EventBridge/Service Bus.
We've tested pg_net sending 1K inserts per second, though that was to a local nginx server.
@devdras @danrasmuson Which webhook service are you using? Likely it's rate limiting the requests.
I'm using Vercel with serverless functions.
We're experiencing the same problem. When inserting records in sequence, all the webhooks are received. But if we insert 30 rows in parallel, none of the webhooks send and the `net._http_response` table is full of `Timeout was reached` errors.
@matt-aitken Are you using Vercel serverless functions too?
@steve-chavez We're able to reliably reproduce it based on the number of new entries that get bulk added to the table—have been in touch with your support team about this as well, but unable to resolve it.
Happy to prepare a screen-recording or do a walk through if helpful.
The Lambda function does not record any invocations if the number of records inserted is greater than ~150.
@lohrmann That might be related to lambdas rate limiting https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html#api-requests
We're able to reliably reproduce it based on the number of new entries that get bulk added to the table
By default there are 200 requests per iteration https://github.com/supabase/pg_net/blob/master/src/worker.c#L479-L483
I'll make this parameter user configurable.
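If the parameter lands as a standard GUC, tuning it might look like the following (the name `pg_net.batch_size` is an assumption here, not a confirmed setting; check the pg_net docs for the released parameter):

```sql
-- Hypothetical: assumes the per-iteration request count is exposed as a GUC
-- named pg_net.batch_size. Smaller batches mean fewer concurrent requests
-- competing for the shared timeout window.
ALTER SYSTEM SET pg_net.batch_size = 100;
SELECT pg_reload_conf();
```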
Curl has:

```
curl --rate 2/s -O https://example.com/[1-100].jpg
```

https://everything.curl.dev/usingcurl/transfers/request-rate#examples

I think that should be the final solution here.

Note: the above is not available in libcurl.
Sorry for the slow response on this. No, we're receiving webhooks to AWS EC2 (a long-running server).
There's nothing in our server logs showing that we receive requests from Supabase (when inserting many rows), and the `net._http_response` table is full of `Timeout was reached` errors.
We (Trigger.dev) make it easy for our customers to subscribe to these Supabase database webhooks and quite a few people are hitting this problem now.
I have the same issue. I'm using Google Cloud Functions v2 (Cloud Run). The table is full of `Timeout was reached` errors, but there are no corresponding logs in Google Cloud Monitoring.
I'm running into this issue as well. I spun up a clean environment to reproduce, and did some testing. Supabase project `qutskxfebtlgwyzsonvd`. The setup is:

- a `users` table
- a `queue` table
- a cron job that inserts a row into `queue` for each user
- a webhook on each `queue` insert to an HTTPS endpoint, an AWS API Gateway that returns a `200` response (typically in <80ms)
- `sleep(5)`
I ran a handful of tests (spreadsheet here, screenshot below). It looks like a pretty linear relationship between the number of requests (in my case, the number of `users` for which the cron job inserts items into the `queue`) and the webhook timeout configuration. By my math, the webhook needs about 33.33ms per item to process. If only 20ms per item is available within the webhook timeout configuration, most or all of the events fail to be sent to the API Gateway.
Unlike the note from @lohrmann above, I did see some partial successes.
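The per-item arithmetic above can be sanity-checked in a few lines (the 33.33 ms figure is reedshea's empirical estimate, not a documented pg_net constant):

```python
# Rough model of the observation above: each queued request appears to consume
# ~33.33 ms of the shared timeout window, so a batch of n items needs roughly
# n * 33.33 ms before requests stop timing out.
PER_ITEM_MS = 100 / 3  # ~33.33 ms per item (empirical estimate)

def min_timeout_ms(n_items: int) -> float:
    """Smallest webhook timeout that should let n_items all complete."""
    return n_items * PER_ITEM_MS

# With the 2000 ms default timeout, only about 60 items fit in the window:
print(round(2000 / PER_ITEM_MS))  # → 60
```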
@reedshea Thanks for the detailed information.
@TheOtherBrian1 Shared a reproducible example: https://gist.github.com/TheOtherBrian1/e479426f0070d5f0879483cf59f85dfb
The setup is basically:
```sql
CREATE TABLE webhook_test (
  id SERIAL,
  sample text
);

CREATE OR REPLACE FUNCTION debugging_example()
RETURNS TRIGGER
AS $$
DECLARE
  -- Store payload data
  row_data_var JSONB;
  row_id INT;
BEGIN
  -- Retrieve row data and convert to JSON
  SELECT TO_JSONB(NEW.*) INTO row_data_var;
  -- Initiate HTTP POST request to URL
  SELECT
    net.http_post(
      url := 'https://postman-echo.com/post',
      -- Use row data as payload
      body := row_data_var
    ) INTO row_id;
  RAISE LOG
    'LOGGING WEBHOOK |
    The Webhook was called after % |
    ID = % |
    Logging an entire row as JSON (%)', (SELECT NOW()), row_id, row_data_var;
  -- Return the row data to inspect
  RETURN NEW;
-- Handle exceptions here if needed
EXCEPTION
  WHEN others THEN
    RAISE EXCEPTION 'LOGGING WEBHOOK FAILED: An error occurred: %', SQLERRM;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER webhook
AFTER INSERT ON webhook_test
FOR EACH ROW
EXECUTE FUNCTION debugging_example();

CREATE OR REPLACE FUNCTION insert_test_data(num_rows INT)
RETURNS VOID
AS $$
DECLARE
  i INT;
BEGIN
  FOR i IN 1..num_rows LOOP
    INSERT INTO webhook_test(sample) VALUES (i || 'Call');
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```
Then execute a single request:

```sql
INSERT INTO webhook_test (sample)
VALUES ('test value');
```
Then a batch of requests:

```sql
SELECT insert_test_data(100);
```
Locally, all requests succeed fast:

```
select count(*) from net._http_response where error_msg is null;
-[ RECORD 1 ]
count | 101

select count(*) from net._http_response where error_msg is not null;
-[ RECORD 1 ]
count | 0
```
But, for some reason, on Supabase cloud only the single request succeeds and the batched ones fail:

```
select count(*) from net._http_response where error_msg is null;
-[ RECORD 1 ]
count | 1

select count(*) from net._http_response where error_msg is not null;
-[ RECORD 1 ]
count | 100
```
Will debug this further.
I should add that Supabase is on 0.7.1, the latest version is 0.7.3. I've tried 0.7.1 locally and the result is the same.
By increasing the default timeout (2 seconds) to 4 seconds, all the above requests succeed on Supabase cloud:

```sql
net.http_post(
  url := 'https://postman-echo.com/post',
  -- Use row data as payload
  body := row_data_var,
  timeout_milliseconds := 4000
) INTO row_id;
```

```sql
SELECT insert_test_data(100);
```

```
select count(*) from net._http_response where error_msg is null;
-[ RECORD 1 ]
count | 100
```
Will increase the default to 5 seconds.
Is that setting the timeout for the whole batch? Based on @reedshea's graph, it seems the whole batch needs enough time per call. Or is he setting some other timeout?
I just ran into this with my staging environment. I updated a table, which caused 121 events to need to be published, all in one transaction (a single UPDATE statement). All 121 `http_post` requests fail with a timeout error after ~2 seconds. I bumped `timeout_milliseconds` to 4000 but that wasn't sufficient; I had to bump it to 6000 to allow all of them to succeed. I'm submitting task requests to Inngest, which responds really fast.
I think the right thing is to multiply the timeout by the number of requests per batch if it is not possible to make it actually be a per-request timeout. Passing the timeout on a per-request basis gives the developer the expectation that it applies to that one request, not the entire batch in which that request happens to be included by the `net` extension's batching algorithm.
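As a sketch of the proposed semantics (illustrative only, not pg_net's actual code), scaling the batch timeout, plus one possible policy for mixed per-request timeouts:

```python
# Sketch: if a timeout is enforced per batch, scale it by the batch size so
# each request effectively keeps the per-request budget the caller asked for.
def effective_batch_timeout_ms(per_request_timeout_ms: int, batch_size: int) -> int:
    return per_request_timeout_ms * batch_size

# One possible answer to "which timeout wins when requests in a batch differ":
# honor the largest, so no request gets less time than it requested.
def pick_batch_timeout_ms(requested_timeouts_ms: list[int]) -> int:
    return max(requested_timeouts_ms)

print(effective_batch_timeout_ms(2000, 200))      # → 400000
print(pick_batch_timeout_ms([2000, 4000, 1000]))  # → 4000
```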
What happens if different requests get passed different timeouts? Which one is used for that batch?
@vickkhera I wasn't actually aware of the cause of the bug - I was only aware of how to reproduce it - but you seem to have noticed that the timeout is caused when a batch (default 500) does not complete in the set timeout period. A later batch would actually succeed if its queue were smaller or the timeout were higher.
I assumed that the timeout setting was for an individual cURL request and not the entire batch. @steve-chavez, is this the intended behavior?
The timeouts are per-request. Will add a test to confirm soon.
There's no way my requests were taking 2s each. The whole batch timed out in ~2-3 seconds.
We're running into this same issue. The `net._http_response` table shows the `Timeout was reached` error, yet the `timed_out` column remains null. This is happening with batches of 8 or fewer, and it seemed like after the initial failure, subsequent inserts into the table with the trigger failed even with a batch of 1.
I'm having this problem as well. I can reproduce it based on the number of new entries that get bulk added to the table. Around 10 work; more than that and a significant % of them error out.

- The webhook trigger is calling an AWS Lambda function (via API Gateway).
- The Lambda function does not record any invocations if the number of records inserted is too large.
- It's not the case that only some of the invocations don't come through.
- It's not the case that the invocation reaches the Lambda function and the webhook times out before the function responds.
- Other times it doesn't time out and just returns a 500 with "Internal Server Error".
- In Supabase, the `_http_response` table logs everything.

All using pg_net and Supabase webhooks. Would love to know if there's a solution. I'm using the Lambda to then load SQS; thinking of skipping this step now...
I am doing something similar, except API Gateway populates SQS, which then calls a Lambda function. It worked fine on small data sets, but when I had to make over 100 calls I saw similar behavior to what you (and others) describe. I set the timeout to 5 seconds and it is working for me now, although that feels very fragile. Who knows if it will work if I have a big burst of calls to make. The UI says 5000 ms is the max timeout, but I believe you can set it higher than that if you create the function via the SQL editor, though I'm not sure if that has any side effects.
Yeah, I have it set to 5000 ms and am still having the problem. Agreed that it seems fragile regardless. Would love to know if doing it via the SQL editor solves it.

Otherwise I'm going to skip the webhook altogether, as I can't risk it in production.
@waylon999 You can set the timeout to greater than 5000ms; that's a limitation imposed by the UI. You can write the webhooks directly in pg_net or use the `supabase_functions.http_request` helper function.
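For example, a trigger created via the SQL editor might look like this (a sketch assuming the `supabase_functions.http_request(url, method, headers, params, timeout_ms)` argument order used by dashboard-generated webhooks; the table name and endpoint are placeholders):

```sql
CREATE TRIGGER my_webhook                 -- hypothetical trigger name
AFTER INSERT ON my_table                  -- placeholder table
FOR EACH ROW
EXECUTE FUNCTION supabase_functions.http_request(
  'https://example.com/webhook',          -- placeholder endpoint
  'POST',
  '{"Content-Type":"application/json"}',
  '{}',
  '10000'                                 -- 10 s, above the 5000 ms UI cap
);
```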
This bug has been open for a year, and it's really frustrating that issues like this are not properly addressed by the Supabase team. They are adding other advanced features like branching, but the core remains unstable, and I don't see a significant effort to fix that. I migrated a relatively small project from Firestore/Firebase to Supabase because I believed in their marketing, and now I really regret this decision. Currently, I'm paying more for significantly worse quality:
Webhooks are unstable. In my previous architecture, I heavily relied on Firestore triggers to perform some operations in the background. Now, they can randomly fail without any option to retry, which is a HUGE problem when something needs to happen in the background of our app.
Realtime functionality is useless; it only works fine when you have an active tab in the browser. Then, you need to figure out the whole mechanism for refetching data and reconnecting after you go from inactive. I don't even want to mention how painful this is on mobile (React Native). With Firebase, I didn't even have to think about it; it just worked out of the box.
Supavisor / pgbouncer - Random failures when connecting to the DB from cloud functions / AWS Lambda ("Connection terminated unexpectedly").
Upgrading your db? Okay, our app will be unavailable for minutes or hours, and then you need to figure out why something doesn't work.
Storage - You can upload from RN, but first, you need to convert the entire file into base64 and then use ArrayBuffer.
But yes, Supabase is open-sourced. I guess I'll tell my clients - our app doesn't work properly, but at least you can read the source code.
Does this solve the problem, though? It seems fragile regardless of the timeout (especially if it's unclear whether the timeout will be 5 seconds or 30).
Any new updates on this?
pg_net has been super inconsistent; it also tends to hang up sporadically. I tried to build an event-driven app but decided to give up and rely on a cron job instead.
Is there any ETA on this? It's blocking me and I would need to do some workaround. Thanks!
Experiencing the same issue. I increased the timeout to 30s and all requests ended up with either "SSL connect error" or "Error in the HTTP2 framing layer".

In my table I have the raw and parsed response from OpenAI. Going to try to rewrite the trigger with plain `net.http_post` instead of `supabase_functions.http_request`. Probably the payload is too big, and I can shrink it by constructing it manually (I do not need the OLD record). Another idea is to write a statement trigger, so I have one request per batch update.
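The statement-trigger idea could be sketched like this (assumes pg_net's `net.http_post` and a placeholder endpoint; with a transition table, a single request carries every row touched by the statement):

```sql
CREATE OR REPLACE FUNCTION notify_batch()
RETURNS TRIGGER AS $$
BEGIN
  -- One HTTP request per statement, with all affected rows aggregated
  -- into a single JSON array payload.
  PERFORM net.http_post(
    url := 'https://example.com/webhook',        -- placeholder endpoint
    body := (SELECT COALESCE(jsonb_agg(to_jsonb(n)), '[]'::jsonb)
             FROM new_rows n)
  );
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER notify_batch_trigger
AFTER UPDATE ON my_table                  -- placeholder table
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT
EXECUTE FUNCTION notify_batch();
```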
I'm unfortunately dealing with this as well. For me it's when there are 100 concurrent events (like bulk deleting 100 rows). I'm curious what alternatives you all are using for listening to postgres events and performing async actions? I saw someone mention QStash but not sure how that plugs in to the event flow since you would still need to make a pg_net request to your redis backend based on the event...right?
EDIT:
Has anyone implemented this as an alternative?
https://supabase.com/docs/guides/realtime/subscribing-to-database-changes
Theoretically I could just subscribe to the same tables in my backend and therefore only need to use 1 (or however many servers I deploy) persistent realtime connection for the same features? I suppose the issue would be concurrency since my two backend servers would receive the same event notification....
I just used a queue. I've used Upstash, BullMQ and Inngest. All are good. Whenever you update/insert the piece of data that you wanted to trigger, send an event to your queue. Your queue then does the thing you want to do. Basically my solution was to not use Supabase Webhooks. Imo they're useless without some kind of queue functionality backing them up.
@devdras Thanks for the quick response! I guess the issue that I'm having with that approach is that I need to listen for events from records that are deleted based on cascade events (say you have deleted an entity that has a fkey to a different table with onDelete CASCADE). Having to keep track of when a secondary entity is also removed when removing the original entity quickly becomes really messy.
That's why listening to the database row level events has been super powerful. A single source of truth that sends updates when a row is removed (whether directly or indirectly)
My guess is that I would need to use a KV store like Upstash or NATS to create a "lock" on an entity (based on the table and id) where you can only write to the queue if that key does not exist in the KV store. Maybe with having the key TTL being a couple minutes to remove from memory
Maybe that's what you're doing and I'm only now understanding it, or not, haha.
Sorry to slightly derail the goal of this ticket but I hope it helps for people arriving at the same issue currently
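The TTL-lock dedup idea above can be sketched with an in-memory stand-in for the KV store (a real setup would use something like Redis `SET key NX EX ttl`; everything here is illustrative):

```python
import time

class TTLLock:
    """In-memory stand-in for a KV store with per-key TTL."""
    def __init__(self):
        self._expiry = {}  # key -> monotonic expiry timestamp

    def acquire(self, key: str, ttl_seconds: float) -> bool:
        now = time.monotonic()
        exp = self._expiry.get(key)
        if exp is not None and exp > now:
            return False  # entity was already queued recently
        self._expiry[key] = now + ttl_seconds
        return True

# Only the first event per (table, id) within the TTL reaches the queue:
locks = TTLLock()
enqueued = []
for table, row_id in [("posts", 1), ("posts", 1), ("posts", 2)]:
    if locks.acquire(f"{table}:{row_id}", ttl_seconds=120):
        enqueued.append((table, row_id))
print(enqueued)  # → [('posts', 1), ('posts', 2)]
```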
@uncvrd I'd probably use a postgres function for that 🙂
My problem comes from reprocessing my internal queue, which feeds Inngest... If I notice a request did not get accepted by Inngest, I retry it, but this is done with a single SQL statement, more or less `SELECT resubmit_to_innngest(data) FROM queue_table WHERE ...`. When there are too many, pg_net seems to just drop them, so I have to retry yet again!

At some point pg_net has to be more reliable so I don't have to break this up into submitting no more than some small number at a time.
@devdras Hmm, but then we're back to using `pg_net` within the postgres function so I can make calls to external APIs (for example, I need to delete Typesense search documents, which are not part of the Postgres database, when certain rows are deleted directly or indirectly).

Anyways, I don't want to derail the ticket any more, but I'll keep searching for a solution.
I was just testing using pg_net to hit the realtime broadcast endpoint. A bulk insert of 20 rows on a table works; 40 rows does not, with timeouts for all. Simple SQL:

```sql
insert into cats (count)
select * from generate_series (2300,2350)
```

The trigger function calls the realtime broadcast endpoint.
There is a second issue, in that it appears pg_net does not guarantee ordering. I'm assuming it sends the HTTP requests asynchronously (rather than merely being async back to the SQL caller). Because of this, rapidly sent HTTP requests can arrive out of order at their destination.
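Since pg_net gives no ordering guarantee, ordering has to be recovered at the receiver. A common pattern (not part of pg_net) is to stamp each payload with a gapless sequence number, e.g. from a Postgres sequence, and buffer out-of-order arrivals:

```python
import heapq

class Reorderer:
    """Releases events in sequence order, buffering any that arrive early.
    Assumes each payload carries a gapless sequence number assigned at send time."""
    def __init__(self):
        self._next = 1
        self._pending = []  # min-heap of (seq, payload)

    def receive(self, seq, payload):
        heapq.heappush(self._pending, (seq, payload))
        ready = []
        while self._pending and self._pending[0][0] == self._next:
            ready.append(heapq.heappop(self._pending)[1])
            self._next += 1
        return ready

r = Reorderer()
delivered = []
for seq, msg in [(2, "b"), (1, "a"), (3, "c")]:  # arrives out of order
    delivered += r.receive(seq, msg)
print(delivered)  # → ['a', 'b', 'c']
```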
Bug report
Describe the bug
I have a webhook configured to watch a table for inserts. On an insert, a POST request should be fired to an API hosted on Vercel. This works when one insert operation is carried out. If multiple inserts are carried out, e.g. `.insert([{message: 1}, {message: 2}, {message: 3}])`, then the webhooks aren't fired at all. If instead we carry out multiple inserts by calling the Supabase API multiple times, then the webhook is fired for only some of the inserts.
To Reproduce
Steps to reproduce the behavior, please provide code snippets or a repository:
Configure a webhook on a table to watch for insert operations and call an external API
Use the Supabase JS library to perform multiple inserts like:

```js
let { data, error } = await supabase
  .from("example_table")
  .insert([{message: 1}, {message: 2}, {message: 3}, {message: 4}, {message: 5}, {message: 6}, etc ])
  .select("message");
```
Check logs for external API to confirm it didn't receive any requests
See error
Something to note is that I was inserting 32 things at once.
Expected behavior
For each insert, I would expect the external API to be called. It would also help if there was a log in Supabase that showed what happens when the webhook is run.