supabase / edge-runtime

A server based on Deno runtime, capable of running JavaScript, TypeScript, and WASM services.
MIT License

FunctionsFetchError: Failed to send a request to the Edge Function - affecting 1% of invocations #263

Open williamlmao opened 9 months ago

williamlmao commented 9 months ago

Bug report

Describe the bug

We are getting intermittent "Failed to send a request to the Edge Function" errors. This bug is quite hard to reproduce. I have never been able to reproduce it locally, and 98.5% of function invocations go through just fine. For context, we've seen this error 1.3k times out of 72,474 invocations. I've had users report that they are getting this error, but then when they retry it goes through fine the next time.
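As a quick sanity check on the figures above, the failure rate can be computed directly. This is only a sketch: it assumes "1.3k" means roughly 1,300 failures, and `failureRatePercent` is a hypothetical helper name. Under that assumption the rate works out to roughly 1.8%.

```typescript
// Sketch: failure rate from the counts quoted in the report.
// ASSUMPTION: "1.3k" is treated as 1,300; the exact count is not given.
function failureRatePercent(failures: number, total: number): number {
  if (total <= 0) {
    throw new RangeError("total must be positive");
  }
  return (failures / total) * 100;
}

const rate = failureRatePercent(1300, 72474);
console.log(`failure rate ~ ${rate.toFixed(2)}%`);
console.log(`success rate ~ ${(100 - rate).toFixed(2)}%`);
```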

My big questions are:

  1. Is it normal to see this error intermittently in supabase edge runtime? Is this just a fact of life, is this a bug, or is something wrong on our end? It's very hard to tell from my end.
  2. If it's likely that something is wrong on our end, I'd love any suggestions of how to debug. The request body we send for this function is very simple (just a couple IDs and a text input) so there is not much variability. Having a hard time identifying a pattern especially since the error is not very descriptive.

The stack trace we see in sentry is this:

 // 2. client-level headers
                    // 3. default Content-Type header
                    headers: Object.assign(Object.assign(Object.assign({}, _headers), this.headers), headers),
                    body,
                }).catch((fetchError) => {
                    throw new FunctionsFetchError(fetchError);
                });
                const isRelayError = response.headers.get('x-relay-error');
                if (isRelayError && isRelayError === 'true') {
                    throw new FunctionsRelayError(response);
                }

To Reproduce

I can't even reproduce this locally, so I can't provide reproduction steps.

Additional Context

We are running Supabase CLI 1.142.1 and "@supabase/supabase-js": "2.39.1".

Please let me know if there's anything else I can provide that would be helpful.

Mykyta-Chernenko commented 9 months ago

I have exactly the same issue, around 2% of the requests fail with this error. I've been talking with the support for a month but they haven't managed to resolve the issue

sebestindragos commented 8 months ago

There's another related thread here, and I don't understand why it was closed.

I'm also facing this error and wondering what a possible solution would be. I recently started using backoff on the client side to retry failed calls, but it only helped to some degree. I'm still facing errors after all retries have been consumed.

Later edit: I was able to reproduce this locally by shutting down the Docker container for the edge runtime:

image

Although the error message is a bit different, I've been seeing both of them in production which I think are related:

Also seeing errors when fetching DB object (so not via edge functions, but via the normal DB REST API): TypeError: fetch failed - TypeError: fetch failed.

Seriously considering moving away from Supabase. I really love the product, but if it's this unreliable in a production environment then it's not really usable.

laktek commented 7 months ago

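We try to figure out the causes for these intermittent failures. But these request failures can happen due to multiple reasons:

  • Your client's firewall blocking requests
  • Cloudflare CDN (which Supabase uses in production) blocking production traffic
  • Edge Runtime failing to respond due to an internal issue (this is the part we can focus on and try to reduce from happening)
  • Your edge function server implementation having an issue (usually these would be logged in Function logs)

I think it's best to have some error handling and retrying mechanism implemented in the client calling the edge function to reduce the chances of them erroring out (we'll consider building some of this logic into supabase-js itself)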

Mykyta-Chernenko commented 7 months ago

> We try to figure out the causes for these intermittent failures. But these request failures can happen due to multiple reasons:
>
>   • Your client's firewall blocking requests
>   • Cloudflare CDN (which Supabase uses in production) blocking production traffic
>   • Edge Runtime failing to respond due to an internal issue (this is the part we can focus on and try to reduce from happening)
>   • Your edge function server implementation having an issue (usually these would be logged in Function logs)
>
> I think it's best to have some error handling and retrying mechanism implemented in the client calling the edge function to reduce the chances of them erroring out (we'll consider building some of this logic into supabase-js itself)

I retry 3 times with a backoff of 2, 4, and 8 seconds. Most of the time the issue goes away, but around 0.5% of the requests still fail.
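For anyone wanting to try the same approach, here is a minimal sketch of a retry wrapper with exponential backoff (2, 4, 8 seconds by default). It is not the code used by anyone in this thread: `retryWithBackoff`, `backoffDelays`, and the injectable `sleep` parameter are hypothetical names, and `invoke` stands in for whatever performs the request and throws on failure.

```typescript
// Sketch only: retry an async call with exponential backoff.
// ASSUMPTION: `invoke` throws (or rejects) when the request fails.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before each retry: base, 2*base, 4*base, ...
function backoffDelays(retries: number, baseMs = 2000): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

async function retryWithBackoff<T>(
  invoke: () => Promise<T>,
  retries = 3,
  baseMs = 2000,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to `retries` retries.
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await invoke();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Wait 2s, 4s, 8s, ... before the next attempt.
        await sleep(baseMs * 2 ** attempt);
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

As reported above, a wrapper like this reduces but does not eliminate failures, and it only makes sense for requests that are safe to repeat.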

sebestindragos commented 7 months ago

@laktek as mentioned, I'm already using backoff retries but still seeing issues. And IMO that's a terrible solution to suggest to users (hey, you should just retry requests), because failures can still happen and products are losing customers because of them.

sebestindragos commented 7 months ago

@laktek here is another type of error I was able to catch recently. It's the HTML of a Cloudflare error page.

I copy-pasted it into an HTML file and it looks like this:

image image
<html class=\"no-js ie7 oldie\" lang=\"en-US\"> <![endif]-->\n<!--[if IE 8]>    <html class=\"no-js ie8 oldie\" lang=\"en-US\"> <![endif]-->\n<!--[if gt IE 8]><!--> <html class=\"no-js\" lang=\"en-US\"> <!--<![endif]-->\n<head>\n\n\n<title>vnawaforiamopaudfefi.supabase.co | 520: Web server is returning an unknown error</title>\n<meta charset=\"UTF-8\" />\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n<meta http-equiv=\"X-UA-Compatible\" content=\"IE=Edge\" />\n<meta name=\"robots\" content=\"noindex, nofollow\" />\n<meta name=\"viewport\" content=\"width=device-width,initial-scale=1\" />\n<link rel=\"stylesheet\" id=\"cf_styles-css\" href=\"/cdn-cgi/styles/main.css\" />\n\n\n</head>\n<body>\n<div id=\"cf-wrapper\">\n    <div id=\"cf-error-details\" class=\"p-0\">\n        <header class=\"mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-8\">\n            <h1 class=\"inline-block sm:block sm:mb-2 font-light text-60 lg:text-4xl text-black-dark leading-tight mr-2\">\n              <span class=\"inline-block\">Web server is returning an unknown error</span>\n              <span class=\"code-label\">Error code 520</span>\n            </h1>\n            <div>\n               Visit <a href=\"https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_520&utm_campaign=vnawaforiamopaudfefi.supabase.co\" target=\"_blank\" rel=\"noopener noreferrer\">cloudflare.com</a> for more information.\n            </div>\n            <div class=\"mt-3\">2024-04-10 04:02:07 UTC</div>\n        </header>\n        <div class=\"my-8 bg-gradient-gray\">\n            <div class=\"w-240 lg:w-full mx-auto\">\n                <div class=\"clearfix md:px-8\">\n                  \n<div id=\"cf-browser-status\" class=\" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center\">\n  <div class=\"relative mb-10 md:m-0\">\n    \n    <span 
class=\"cf-icon-browser block md:hidden h-20 bg-center bg-no-repeat\"></span>\n    <span class=\"cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4\"></span>\n    \n  </div>\n  <span class=\"md:block w-full truncate\">You</span>\n  <h3 class=\"md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3\">\n    \n    Browser\n    \n  </h3>\n  <span class=\"leading-1.3 text-2xl text-green-success\">Working</span>\n</div>\n\n<div id=\"cf-cloudflare-status\" class=\" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center\">\n  <div class=\"relative mb-10 md:m-0\">\n    <a href=\"https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_520&utm_campaign=vnawaforiamopaudfefi.supabase.co\" target=\"_blank\" rel=\"noopener noreferrer\">\n    <span class=\"cf-icon-cloud block md:hidden h-20 bg-center bg-no-repeat\"></span>\n    <span class=\"cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4\"></span>\n    </a>\n  </div>\n  <span class=\"md:block w-full truncate\">Seattle</span>\n  <h3 class=\"md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3\">\n    <a href=\"https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_520&utm_campaign=vnawaforiamopaudfefi.supabase.co\" target=\"_blank\" rel=\"noopener noreferrer\">\n    Cloudflare\n    </a>\n  </h3>\n  <span class=\"leading-1.3 text-2xl text-green-success\">Working</span>\n</div>\n\n<div id=\"cf-host-status\" class=\"cf-error-source relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center\">\n  <div class=\"relative mb-10 md:m-0\">\n    \n    <span class=\"cf-icon-server block md:hidden h-20 bg-center bg-no-repeat\"></span>\n    <span class=\"cf-icon-error w-12 h-12 
absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4\"></span>\n    \n  </div>\n  <span class=\"md:block w-full truncate\">vnawaforiamopaudfefi.supabase.co</span>\n  <h3 class=\"md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3\">\n    \n    Host\n    \n  </h3>\n  <span class=\"leading-1.3 text-2xl text-red-error\">Error</span>\n</div>\n\n                </div>\n            </div>\n        </div>\n\n        <div class=\"w-240 lg:w-full mx-auto mb-8 lg:px-8\">\n            <div class=\"clearfix\">\n                <div class=\"w-1/2 md:w-full float-left pr-6 md:pb-10 md:pr-0 leading-relaxed\">\n                    <h2 class=\"text-3xl font-normal leading-1.3 mb-4\">What happened?</h2>\n                    <p>There is an unknown connection issue between Cloudflare and the origin web server. As a result, the web page can not be displayed.</p>\n                </div>\n                <div class=\"w-1/2 md:w-full float-left leading-relaxed\">\n                    <h2 class=\"text-3xl font-normal leading-1.3 mb-4\">What can I do?</h2>\n                          <h3 class=\"text-15 font-semibold mb-2\">If you are a visitor of this website:</h3>\n      <p class=\"mb-6\">Please try again in a few minutes.</p>\n\n      <h3 class=\"text-15 font-semibold mb-2\">If you are the owner of this website:</h3>\n      <p><span>There is an issue between Cloudflare's cache and your origin web server. Cloudflare monitors for these errors and automatically investigates the cause. To help support the investigation, you can pull the corresponding error log from your web server and submit it our support team.  
Please include the Ray ID (which is at the bottom of this error page).</span> <a rel=\"noopener noreferrer\" href=\"https://support.cloudflare.com/hc/en-us/articles/200171936-Error-520\">Additional troubleshooting resources</a>.</p>\n                </div>\n            </div>\n        </div>\n\n        <div class=\"cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300\">\n  <p class=\"text-13\">\n    <span class=\"cf-footer-item sm:block sm:mb-1\">Cloudflare Ray ID: <strong class=\"font-semibold\">871fd6a14523c74d</strong></span>\n    <span class=\"cf-footer-separator sm:hidden\">&bull;</span>\n    <span id=\"cf-footer-item-ip\" class=\"cf-footer-item hidden sm:block sm:mb-1\">\n      Your IP:\n      <button type=\"button\" id=\"cf-footer-ip-reveal\" class=\"cf-footer-ip-reveal-btn\">Click to reveal</button>\n      <span class=\"hidden\" id=\"cf-footer-ip\">35.167.165.194</span>\n      <span class=\"cf-footer-separator sm:hidden\">&bull;</span>\n    </span>\n    <span class=\"cf-footer-item sm:block sm:mb-1\"><span>Performance &amp; security by</span> <a rel=\"noopener noreferrer\" href=\"https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_520&utm_campaign=vnawaforiamopaudfefi.supabase.co\" id=\"brand_link\" target=\"_blank\">Cloudflare</a></span>\n    \n  </p>\n  <script>(function(){function d(){var b=a.getElementById(\"cf-footer-item-ip\"),c=a.getElementById(\"cf-footer-ip-reveal\");b&&\"classList\"in b&&(b.classList.remove(\"hidden\"),c.addEventListener(\"click\",function(){c.classList.add(\"hidden\");a.getElementById(\"cf-footer-ip\").classList.remove(\"hidden\")}))}var a=document;document.addEventListener&&a.addEventListener(\"DOMContentLoaded\",d)})();</script>\n</div><!-- /.error-footer -->\n\n\n    </div>\n</div>\n</body>\n</html>

I think it's pretty clear from this that the issue is on Supabase's end. Whatever server you have running the edge functions runtime is crashing.

williamlmao commented 7 months ago

@laktek, thanks for your response.

The biggest issue here is that the error codes don't give any information on which one of the options you listed was the reason for the error.

From our perspective

We already have retries and error handling built into our site, but that doesn't fix the problem, and it doesn't seem like there is a path forward on my end to solve the problem.

I hope you can see how incredibly frustrating this is on our end. We really love Supabase and would really love to stay on Supabase edge functions, but I'm starting to think we have to move off unless you are able to indicate to us that this is something you can solve within the next month or so.

williamlmao commented 7 months ago

Another note to add: it definitely has something to do with the Cloudflare CDN. Every 502 invocation error we have has Cloudflare listed in the response metadata.

{
  "headers": [
    {
      "content_length": "524",
      "content_type": "text/html",
      "date": "Tue, 16 Apr 2024 18:48:07 GMT",
      "server": "cloudflare",
      "vary": null,
      "x_sb_edge_region": null,
      "x_served_by": null
    }
  ],
  "status_code": 502
}
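A rough way to act on this from the client is to look at the response headers and guess where a failure came from. The sketch below is a heuristic, not anything Supabase documents: `classifyFailure` and the category names are hypothetical, and it assumes that a response which actually reached the edge runtime carries an `x-sb-edge-region` header (inferred only from the `x_sb_edge_region: null` field in the metadata above). The `x-relay-error` check mirrors the client code quoted in the stack trace.

```typescript
// Minimal header accessor; the standard Headers class satisfies it.
interface HeaderLookup {
  get(name: string): string | null;
}

type FailureSource = "none" | "relay" | "before-runtime" | "runtime-or-function";

function classifyFailure(status: number, headers: HeaderLookup): FailureSource {
  // Same header the supabase-js client checks before throwing FunctionsRelayError.
  if (headers.get("x-relay-error") === "true") {
    return "relay";
  }
  if (status < 400) {
    return "none";
  }
  const server = (headers.get("server") ?? "").toLowerCase();
  const edgeRegion = headers.get("x-sb-edge-region");
  // ASSUMPTION: a missing x-sb-edge-region means the response was generated
  // before the request reached the edge runtime (e.g. a Cloudflare error page).
  if (server === "cloudflare" && !edgeRegion) {
    return "before-runtime";
  }
  return "runtime-or-function";
}
```

Tagging each captured error with this category in Sentry might make it easier to see which of the listed causes dominates, though the mapping itself is a guess.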

hkrutzer commented 7 months ago

Perhaps some sort of request ID, or OpenTracing header, could be added, to make it easier to find logs corresponding to the failing requests. Similar to e.g. the CF-Ray-Id header from Cloudflare.
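Until something like that exists on the platform side, a client could attach its own ID to each call and log it next to any error. This is a sketch under assumptions: the header name `x-request-id` is hypothetical, nothing in this thread confirms that Supabase logs or forwards it, and `randomHexId` uses `Math.random`, so the IDs are for log correlation only, not security.

```typescript
// ASSUMPTION: hypothetical header name; not confirmed to be logged by the platform.
const REQUEST_ID_HEADER = "x-request-id";

// Random hex string for correlating logs (not cryptographically strong).
function randomHexId(bytes = 16): string {
  let id = "";
  for (let i = 0; i < bytes; i++) {
    id += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return id;
}

// Returns a copy of the headers with a request ID added, plus the ID itself
// so the caller can record it alongside any error it captures.
function withRequestId(
  headers: Record<string, string> = {},
  id: string = randomHexId(),
): { id: string; headers: Record<string, string> } {
  return { id, headers: { ...headers, [REQUEST_ID_HEADER]: id } };
}

const { id, headers } = withRequestId({ "Content-Type": "application/json" });
console.log(id, headers);
```

The returned headers could be passed as the `headers` option of `supabase.functions.invoke`, and the edge function itself could log the same header value so both sides share one identifier.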

evelant commented 6 months ago

I had the same issue. 2-4% of all requests just failed for no apparent reason at all. I ended up switching to Bun hosted on fly.io. It was really easy and the experience is way better. It just works, no hassles with broken Deno tooling (monorepos are practically impossible), bugs in Deno, missing features in Deno, random failures, etc.

laktek commented 5 months ago

Are y'all still experiencing random errors? We've made some stability improvements in the platform which we believe should help reduce the random 502 errors.

cspace001 commented 5 months ago

Hi, I'm getting this (similar) error when doing a password reset on my web app.

laktek commented 5 months ago

@cspace001 password reset using Edge Functions?

alexbriannaughton commented 2 months ago

@laktek I still seem to get this error more often than I'd like in my pg_cron --> pg_net --> edge function invocation flow.

mansueli commented 2 months ago

I've improved the pg_cron example for calling edge functions so that you can raise the timeout value:

select
  cron.schedule(
    'invoke-function-every-half-minute',
    '30 seconds',
    $$
    select
      net.http_post(
          url:='https://project-ref.supabase.co/functions/v1/function-name',
          headers:=jsonb_build_object('Content-Type','application/json', 'Authorization', 'Bearer ' || 'YOUR_ANON_KEY'),
          body:=jsonb_build_object('time', now() ),
          timeout_milliseconds:=5000
      ) as request_id;
    $$
  );

https://supabase.com/docs/guides/database/extensions/pg_cron#invoke-supabase-edge-function-every-30-seconds

laktek commented 2 months ago

@alexbriannaughton Can you try increasing the timeouts in pg_net requests, as mentioned in the example above from @mansueli?

alexbriannaughton commented 2 months ago

@laktek I had actually already set the timeout_milliseconds to 4500 to troubleshoot before posting here and opening a support request.

I will note that I haven't had the issue for the last 24 hours, but I'm not sure what I did on my end to make it stop!

Edit: had another one the following day.

RedChops commented 2 months ago

We've been dealing with this for about a year now. It's hard to tell the exact percentage of failing requests but it seems like 2% is probably accurate.

We've built in backoff on the client side all over the place but that's more annoying to do on function-to-function calls and is also a terrible end user experience.

These days the errors seem to be mostly 502, 503, and 520. Just today we got this one in our payment processing code:

image

There are no other logs I can look at. This all seems to be squarely a Supabase issue, and it hasn't really seemed to improve much over time.

laktek commented 2 months ago

@RedChops, can you open a support ticket (via https://supabase.help) with the name of the project ID and functions that are receiving these errors? I'll investigate internally to find causes for these errors.

JeongJuhyeon commented 2 months ago

> @RedChops, can you open a support ticket (via https://supabase.help) with the name of the project ID and functions that are receiving these errors? I'll investigate internally to find causes for these errors.

We've been seeing the same for multiple weeks (about 5% of responses are 502s) and opened a support ticket there, ticket ID 3663152744. It includes the details of both a 200 and a 502 from the same edge function.

We only get 502, no other 5XX codes.