nathanaeng opened 6 days ago
I have encountered a similar issue when calling an edge function many times concurrently. In my case, making a lot of calls resulted in InvalidWorkerCreation errors or 502 errors. It seems that the scaling ability of edge functions is limited, which significantly impacts performance when concurrent requests spike.
I feel like other serverless functions can handle concurrent requests with ease, yet edge functions can't even handle 50? Is Supabase not equipped to handle more than 50 concurrent requests? It seems as if the edge function creates a new worker for every single request rather than queuing requests or using some other mechanism to handle concurrency at scale.
Hello @nathanaeng and @ethan-dinh
I am not a member of the Supabase team that works on Supabase Edge Functions, but as the edge runtime maintainer, I'm sorry I didn't meet your expectations 😞
Based on the user script and bash script you posted in the description, and assuming you're using the default edge runtime policy settings in supabase/cli, I can explain why the edge runtime shows such low request throughput.
The edge runtime has three main scheduling policies for workers (per_worker, per_request, and oneshot), and for developers' convenience, supabase/cli defaults to the one policy that Supabase Edge Functions does not use: the oneshot policy.
Unlike the other policies, the oneshot policy does not reuse workers; it creates a new worker for every request and forwards the request to it, even when requests share the same service path.
supabase/cli uses this policy by default because developers can change the source code at any time, and a fresh worker per request ensures the next request reflects those changes.
For the same reason, it is not used in production (including Supabase Edge Functions): booting a new worker for every request is highly inefficient.
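To make the difference concrete, here is a toy TypeScript model of the two policies described above. It is an illustration under simplifying assumptions, not the edge runtime's code: `ToyScheduler` and `workersBooted` are hypothetical names, workers are plain objects that "boot" instantly, and per_request is left out because its behavior isn't described in this thread. It only counts how many workers each policy boots for a burst of requests to a single service path.

```typescript
// Toy model of two worker scheduling policies. Illustration only:
// this is NOT the edge runtime's implementation.
type Policy = "oneshot" | "per_worker";

interface ToyWorker {
  id: number;
  servicePath: string;
}

class ToyScheduler {
  private readonly policy: Policy;
  // per_worker keeps one warm worker per service path
  private readonly warm = new Map<string, ToyWorker>();
  private nextId = 0;
  created = 0; // number of worker boots so far

  constructor(policy: Policy) {
    this.policy = policy;
  }

  route(servicePath: string): ToyWorker {
    if (this.policy === "per_worker") {
      const existing = this.warm.get(servicePath);
      if (existing) return existing; // reuse the already-booted worker
    }
    // oneshot always lands here; per_worker only on the first request for a path
    const worker: ToyWorker = { id: this.nextId++, servicePath };
    this.created++;
    if (this.policy === "per_worker") this.warm.set(servicePath, worker);
    return worker;
  }
}

function workersBooted(policy: Policy, requests: number, servicePath = "test_concurrency"): number {
  const scheduler = new ToyScheduler(policy);
  for (let i = 0; i < requests; i++) scheduler.route(servicePath);
  return scheduler.created;
}

console.log(workersBooted("oneshot", 200)); // 200
console.log(workersBooted("per_worker", 200)); // 1
```

In the real runtime each boot costs time and memory, so under oneshot a burst of 200 requests means 200 boots, which plausibly explains errors like "worker did not respond in time".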
If you switch to a different policy, you'll probably get different results.
Using your code, I was able to reproduce your issue exactly on my local machine with the oneshot policy, and I also confirmed that the per_worker policy is not affected by it.
Of course, my experience doesn't guarantee that you won't have the same issue with Supabase Edge Functions.
Today I came across a Reddit post on this same topic, and its author seemed to be experiencing these issues with Supabase Edge Functions as well.
I would expect the per_worker policy to handle this load well, but it looks like it sometimes fails to forward heavy request traffic to the workers properly and just gives up. (Forgive me; I have very limited visibility into Edge Functions because I am not a member of the Supabase team.)
I have opened PR-382 to better handle this situation. Once it is merged, the Supabase team will be able to implement more specific request scheduling on top of the per_worker policy, which I believe will mitigate these issues.
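For readers wondering what "queuing" could look like in place of booting a worker per request, here is a generic bounded-concurrency queue in TypeScript. It is a hypothetical sketch of one common scheduling technique, not what PR-382 implements and not an edge runtime API: `RequestQueue` lets at most `limit` tasks run at once and makes the rest wait in FIFO order. The demo pushes 50 simulated requests through a queue of width 5 and records the peak number in flight.

```typescript
// Hypothetical bounded-concurrency queue: at most `limit` tasks run at once,
// later tasks wait in FIFO order instead of being rejected.
class RequestQueue {
  private readonly limit: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be a positive integer");
    this.limit = limit;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active < this.limit) {
      this.active++; // take a free slot
    } else {
      // No free slot: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next(); // pass the slot on; `active` is unchanged
      else this.active--; // nobody waiting: release the slot
    }
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Demo: 50 simulated requests through a queue of width 5.
async function demo(): Promise<{ peak: number; done: number }> {
  const queue = new RequestQueue(5);
  let inFlight = 0;
  let peak = 0;
  let done = 0;
  await Promise.all(
    Array.from({ length: 50 }, () =>
      queue.run(async () => {
        inFlight++;
        peak = Math.max(peak, inFlight);
        await sleep(5); // stand-in for real request handling
        inFlight--;
        done++;
      }),
    ),
  );
  return { peak, done };
}

demo().then((result) => console.log(result)); // { peak: 5, done: 50 }
```

Handing a freed slot directly to the next waiter, instead of decrementing and re-incrementing the counter, keeps newly arriving tasks from jumping ahead of ones already queued.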
I will put this on my watchlist and will let you guys know if there are any updates on this issue in the future.
Have a great day!
Thanks for the detailed response! Yep, I have looked into the per_worker policy, and while it might work fine for the simple edge function I provided above, it was failing for a more complex edge function that performs a read, a text embedding, and a write. I can't recall exactly how many concurrent requests it handled; it might have been a few more than with oneshot, but it was still underwhelming, unfortunately. Additionally, I was able to replicate this error on my remote DB (Supabase hosted), which makes me think it's not just a local hosting issue. Thanks for helping though!
Hello @nyannyacha, thanks for your detailed response. I self-host edge functions separately (not as part of the Supabase docker compose setup), so where should I change the policies you mentioned? I suspect it's controlled in the main function's index.ts via forceCreate = true or false, but I'm not sure, and I still get those 502 errors after 30-50 concurrent requests even with forceCreate = false. Can you help me identify other configuration options in the main function that would improve scaling performance? I am running multiple replicas in my K8s deployment, but they still fail the load test: after a few concurrent requests, the edge runtime container stops responding and returns 502 with the error above.
Bug report
Describe the bug
Making concurrent requests to a Supabase edge function will result in InvalidWorkerCreation errors or 502 errors.
To Reproduce
Steps to reproduce the behavior, please provide code snippets or a repository:
Create a new function with supabase functions new test_concurrency.
Here is an example of a function I have (I realize the createClient is not used):

import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

console.log("Hello from Functions!")

Deno.serve(async (req) => {
  const supabaseClient = createClient(
    Deno.env.get('SUPABASE_URL') ?? '',
    Deno.env.get('SUPABASE_SERVICE_ROLE_KEY') ?? '',
  )
  const { name } = await req.json()
  const data = {
    message: `Hello ${name}!`,
  }
  return new Response(
    JSON.stringify(data),
    { headers: { "Content-Type": "application/json" } },
  )
})
#!/bin/bash
seq 1 200 | xargs -n1 -P0 -I{} curl -L -X POST 'http://localhost:54321/functions/v1/test_concurrency' -H 'Authorization: Bearer SERVICE_ROLE_KEY' --data '{"name":"Example"}'
InvalidWorkerCreation: worker did not respond in time
    at async UserWorker.create (ext:sb_user_workers/user_workers.js:145:15)
    at async Object.handler (file:///root/index.ts:154:22)
    at async respond (ext:sb_core_main_js/js/http.js:163:14) {
  name: "InvalidWorkerCreation"
}
{"code":"BOOT_ERROR","message":"Worker failed to boot (please check logs)"}
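A TypeScript version of the bash load test can make failures easier to count, because it tallies responses by HTTP status instead of printing every body. This is a sketch: `loadTest` and `sendOnce` are hypothetical helpers, the URL and `SERVICE_ROLE_KEY` are the same placeholders as in the script, it assumes a runtime with a global `fetch` (Deno or Node 18+), and transport failures are counted under `network_error`.

```typescript
type Outcome = number | "network_error";

// Fires `total` requests concurrently and counts the outcomes by HTTP status.
async function loadTest(total: number, send: () => Promise<number>): Promise<Map<Outcome, number>> {
  const results = await Promise.allSettled(Array.from({ length: total }, () => send()));
  const tally = new Map<Outcome, number>();
  for (const r of results) {
    const key: Outcome = r.status === "fulfilled" ? r.value : "network_error";
    tally.set(key, (tally.get(key) ?? 0) + 1);
  }
  return tally;
}

// Placeholders copied from the bash script; replace before running.
const FUNCTION_URL = "http://localhost:54321/functions/v1/test_concurrency";
const SERVICE_ROLE_KEY = "SERVICE_ROLE_KEY";

// One request equivalent to the curl call; resolves to the HTTP status.
async function sendOnce(): Promise<number> {
  const res = await fetch(FUNCTION_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${SERVICE_ROLE_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ name: "Example" }),
  });
  await res.text(); // drain the body so the connection can be released
  return res.status;
}

// Usage, with the local stack running:
// loadTest(200, sendOnce).then((t) => console.log(Object.fromEntries(t)));
```

`Promise.allSettled` is used so that a failed request is counted rather than aborting the whole run, and `send` is a parameter so the tally logic can be exercised without a live server.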